PLUS: OpenAI's model solves an 80-year-old math problem and the Pentagon deploys an AI report writer
Happy reading
The open-weights model space has a new leader, as Z.ai's GLM-5.2 just claimed the top spot on a key intelligence index. The release shows performance that rivals some of the best proprietary systems available.
With a massive 1M token context window and a permissive MIT license, the model offers developers unprecedented power. Is this the moment where the performance gap between open and closed-source AI truly begins to disappear?
In today’s Next in AI:
Z.ai's GLM-5.2 takes the open model crown
The Pentagon's AI report writer
OpenAI model solves 80-year-old math problem
AI 'battle royale' reveals model personalities
The New Open Model King

Next in AI: Z.ai has released GLM-5.2, a new open-weights model that now leads its peers on the Artificial Analysis Intelligence Index and shows performance competitive with top proprietary systems.
Explained:
Scoring 51 on the Intelligence Index, GLM-5.2 surpasses other leading open models like MiniMax-M3 and DeepSeek V4 Pro, placing its reasoning capabilities in line with proprietary competitors.
The model features a 1M token context window and is released under a permissive MIT license, making it what some experts call the most powerful text-only open-weights LLM available.
The model is also cost-effective, landing on the Pareto frontier of performance versus cost at roughly $0.46 per task.
Why It Matters: This release significantly narrows the performance gap between open and closed AI models, providing developers with more powerful, accessible tools. Its combination of frontier-level capabilities and a permissive license lowers the barrier for building advanced AI applications.
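For readers unfamiliar with the term, a model sits on the Pareto frontier when no other model is both cheaper and at least as capable. The sketch below is only an illustration of that idea: the GLM-5.2 figures (score 51, about $0.46 per task) come from this issue, while the three "hypothetical" entries and their numbers are invented placeholders, not real benchmark results.

```python
from typing import NamedTuple


class ModelPoint(NamedTuple):
    name: str
    cost_per_task: float  # USD per task, lower is better
    score: float          # index score, higher is better


def pareto_frontier(points):
    """Return the points that no other point dominates, cheapest first."""
    frontier = []
    for p in points:
        # q dominates p if q is no worse on both axes and strictly better on one.
        dominated = any(
            q.cost_per_task <= p.cost_per_task
            and q.score >= p.score
            and (q.cost_per_task < p.cost_per_task or q.score > p.score)
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier, key=lambda p: p.cost_per_task)


models = [
    ModelPoint("GLM-5.2", 0.46, 51),         # figures reported in this issue
    ModelPoint("hypothetical-A", 0.30, 40),  # invented placeholder
    ModelPoint("hypothetical-B", 0.60, 48),  # invented placeholder
    ModelPoint("hypothetical-C", 1.50, 55),  # invented placeholder
]

frontier_names = [p.name for p in pareto_frontier(models)]
print(frontier_names)
```

In this made-up lineup, hypothetical-B drops off the frontier because GLM-5.2 is both cheaper and higher scoring, while the other three each represent a trade-off nobody else beats outright.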
The Pentagon's AI Scribe

Next in AI: The U.S. Department of Defense is now using its internal generative AI system to draft hundreds of annual reports for Congress, cutting a process that once took 200 staff-hours down to just five.
Explained:
This push for automation addresses a real need, as the number of congressionally mandated reports swelled from just over 500 in 2000 to more than 1,400 by 2020.
The Pentagon is expanding its AI toolkit beyond Google's Gemini, with plans to add OpenAI's ChatGPT and separate new agreements with companies like SpaceX, Microsoft, and Nvidia for classified AI work.
Why It Matters: This marks one of the most significant integrations of generative AI into the core operations of the U.S. government, moving it from experiment to essential tool. The initiative sets a powerful precedent for how other federal agencies could adopt AI to manage complex bureaucratic tasks.
AI Cracks Unsolved Math Problem
Next in AI: An OpenAI model has autonomously solved the 80-year-old 'unit distance problem,' a famous open conjecture in discrete geometry. In a new breakthrough, the system disproved a long-standing assumption, marking the first time an AI has resolved a prominent open problem in a central field of mathematics.
Explained:
For nearly 80 years, mathematicians have worked on the 'unit distance problem,' which asks for the maximum number of pairs among n points in the plane that can be exactly distance 1 apart. Most believed that maximum grows only slightly faster than linearly in n.
Instead of using traditional geometric methods, the model applied advanced concepts from algebraic number theory, an entirely different field, to construct a counterexample that disproved the conjecture.
The proof was verified by external mathematicians, with leading experts calling the result a 'milestone' and providing further context on its significance for the field.
Why It Matters: This achievement demonstrates that AI models now possess deep reasoning abilities, capable of connecting disparate fields of knowledge to generate novel solutions. It signals a new era of human-AI collaboration where AI can act as a research partner to accelerate discovery across science, medicine, and engineering.
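To get a feel for what the unit distance problem counts, here is a small brute-force sketch. It is a toy illustration only and has nothing to do with the model's actual construction. It counts pairs of grid points separated by a chosen distance, using squared integer distances to avoid rounding issues. The function names and the 10x10 grid are illustrative choices.

```python
from itertools import combinations


def square_grid(k):
    """All integer points (x, y) with 0 <= x, y < k."""
    return [(x, y) for x in range(k) for y in range(k)]


def count_pairs_at_squared_distance(points, d2):
    """Count unordered pairs of points whose squared distance equals d2."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2
    )


grid = square_grid(10)  # n = 100 points

# Pairs exactly distance 1 apart: only horizontal and vertical neighbors.
unit_pairs = count_pairs_at_squared_distance(grid, 1)

# Pairs exactly distance 5 apart: 5 = sqrt(3^2 + 4^2), so diagonal offsets
# such as (3, 4) also count. Shrinking the grid by a factor of 5 would turn
# these into unit-distance pairs.
rescaled_pairs = count_pairs_at_squared_distance(grid, 25)

print(unit_pairs, rescaled_pairs)
```

On the plain grid the count stays close to twice the number of points, which is linear growth. Choosing a distance that can be written as a sum of two squares in several ways packs in more pairs for the same 100 points, and that is the kind of gain the problem asks mathematicians to push to its limit.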
AI Battle Royale

Next in AI: In a simulated battle royale, 11 different LLMs revealed distinct "personalities" that standard benchmarks miss. This fascinating live experiment showed how a model's underlying alignment directly shapes its strategy and success.
Explained:
xAI's Grok 4.1 Fast model adopted an aggressive, zero-sum strategy to dominate the competition, winning an impressive 13 of the 30 matches.
In contrast, models like Claude Sonnet 4.6 displayed a more collaborative alignment, frequently attempting to form teams rather than immediately engaging in combat.
Performance did not correlate with raw power: Grok's cost per win was just $0.97, a 27x advantage over the next-best model.
Why It Matters: This test highlights how standard leaderboards fail to capture an AI's practical behavior in dynamic, agent-like situations. Selecting the right model is becoming less about benchmark scores and more about matching its inherent behavioral traits to the specific task.
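For a rough sense of scale, the reported figures can be turned into a few implied numbers. This is back-of-the-envelope arithmetic, not data from the experiment: it assumes the "27x advantage" means the next-best model's cost per win was 27 times Grok's, and that cost per win is total spend divided by wins.

```python
# Figures reported in this issue
grok_wins = 13
total_matches = 30
grok_cost_per_win = 0.97  # USD
advantage_factor = 27     # assumed to be a ratio of cost per win

# Implied values (derived under the assumptions above, not reported figures)
grok_win_rate = grok_wins / total_matches
implied_grok_spend = grok_wins * grok_cost_per_win
implied_next_best_cost_per_win = grok_cost_per_win * advantage_factor

print(f"Grok win rate: {grok_win_rate:.1%}")
print(f"Implied Grok spend across its wins: ${implied_grok_spend:.2f}")
print(f"Implied next-best cost per win: ${implied_next_best_cost_per_win:.2f}")
```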
AI Pulse
The Pentagon revealed in a sworn statement that it used xAI’s Grok chatbot for military operations in Iran, firing over 2,000 munitions at distinct targets in a 96-hour period.
GPT-5.4 discovered a surprising new chemical additive that boosts reaction yields in medicinal chemistry, marking another success for AI as a partner in autonomous scientific research.
Allbirds rebranded to Smartbird and appointed a new CEO as it continues its pivot from a shoe company to an AI compute infrastructure firm, causing its stock to soar 39%.
Greptile launched TREX, an AI code reviewer that moves beyond static analysis by spinning up sandboxed environments to actually run code, test changes, and generate artifacts like screenshots and videos.