PLUS: Mistral’s trustworthy code agent, Pokémon-powered robots, and AI’s hardest test yet

Happy reading

NVIDIA is expanding far beyond its GPU dominance, unveiling its first-ever purpose-built processor for AI agents at its GTC conference. The new Vera CPU is designed specifically to handle the complex reasoning and planning tasks at the heart of autonomous AI systems.

The new chip is engineered for massive scale, aiming to power entire AI factories running thousands of agentic tasks at once. With foundational hardware like this now available, are we about to witness the transition from experimental AI agents to widespread, real-world deployment?

In today’s Next in AI:

NVIDIA's new purpose-built CPU for AI agents
Mistral’s trustworthy code agent
Pokémon Go data powers delivery robots
A new exam to test AI's true limits

NVIDIA’s Agentic CPU

Next in AI: At its GTC conference, NVIDIA unveiled the Vera CPU, its first processor designed specifically for the complex reasoning and orchestration tasks of AI agents. This move signals a major expansion beyond GPUs to power the next wave of autonomous AI.

Explained:

This marks a strategic shift toward purpose-built CPUs for coordinating the planning, reasoning, and tool-use functions central to AI agents. According to NVIDIA, Vera delivers twice the efficiency and is 50% faster than traditional CPUs for these workloads.
NVIDIA built Vera for massive scale, pairing it with a new rack system that integrates 256 liquid-cooled Vera CPUs. This setup can sustain over 22,500 concurrent AI environments, allowing AI factories to run tens of thousands of agentic tasks at once.
Major tech players are already on board, signaling wide industry adoption. Partners include Oracle and Meta, as well as hardware manufacturers such as Dell, HPE, and Supermicro, with systems expected in the second half of this year.
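As a rough sanity check on the rack figures above, here is a minimal sketch of the per-CPU arithmetic. The average it prints is derived from the two quoted numbers and assumes an even split across CPUs; it is not a figure NVIDIA has published.

```python
# Figures quoted above for the Vera rack system.
CPUS_PER_RACK = 256
CONCURRENT_ENVIRONMENTS = 22_500  # quoted as "over 22,500", so a lower bound

# Derived average; an even split across CPUs is an assumption made here.
envs_per_cpu = CONCURRENT_ENVIRONMENTS / CPUS_PER_RACK

print(f"At least ~{envs_per_cpu:.1f} environments per CPU on average")
```

That works out to roughly 88 concurrent environments per CPU, which gives a sense of how many lightweight agent sandboxes each chip is expected to juggle.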

Why It Matters: NVIDIA is building the foundational hardware for a future where AI agents autonomously perform complex tasks, moving far beyond simple chatbots. This positions the company to dominate not just AI model training with GPUs, but also the large-scale deployment of agentic systems with specialized CPUs.

Mistral’s Prover AI

Next in AI: Mistral AI has released Leanstral, the first open-source code agent designed to automatically generate and formally prove code for high-stakes domains like mathematics and mission-critical software.

Explained:

Leanstral is built for efficiency and access, using a sparse architecture with 6B active parameters. It’s available under an Apache 2.0 license with free API access and downloadable weights for developers.
The model offers competitive performance at a fraction of the cost of its rivals. In benchmarks, it achieves a higher score than Claude Sonnet while costing just $36 versus Sonnet's $549 for the same evaluation.
Unlike typical code generators, Leanstral works with the Lean 4 proof assistant. This allows it not just to write code but also to formally prove its implementations against strict specifications, ensuring a high degree of trust and correctness.
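To make the idea of proving code against a specification concrete, here is a minimal Lean 4 sketch. It is a hand-written illustration, not Leanstral output, and the names double and double_spec are hypothetical.

```lean
-- Hypothetical implementation: doubles a natural number.
def double (n : Nat) : Nat := n + n

-- Specification: for every input, `double n` equals `2 * n`.
-- Lean accepts the file only if this proof actually checks.
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

If the implementation were changed in a way that broke the specification, the proof would fail to compile, which is what lets a proof assistant stand in for part of a human code review.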

Why It Matters: This approach directly addresses one of the biggest bottlenecks in AI development: the time and expertise required for human review. It marks a significant step toward a future where AI agents can build complex, verifiably correct systems with much less manual oversight.

Pokémon Powers Robots

Next in AI: Niantic, the company behind Pokémon Go, announced a partnership with Coco Robotics to help delivery robots navigate city streets. The system is powered by a massive dataset of 30 billion images collected by Pokémon Go players over the last decade.

Explained:

The data comes from players who, often unknowingly, mapped real-world locations by scanning landmarks and statues for in-game rewards, capturing them from countless angles and in various lighting and weather conditions.
Niantic uses this data to power its Visual Positioning System (VPS), which allows robots to pinpoint their location with centimeter-level accuracy by recognizing buildings and objects, overcoming the signal issues that affect GPS in dense urban areas.
This technology is already being deployed at scale, with Coco Robotics operating around 1,000 robots for partners like DoorDash and Uber Eats and having completed over half a million deliveries to date.

Why It Matters: This creatively repurposes a massive crowdsourced entertainment dataset to solve a complex real-world logistics problem. It also provides a blueprint for building a continuously updated “living map” of the world, improved in real time by autonomous agents moving through it.

AI's Hardest Test

Next in AI: A global team of nearly 1,000 experts developed Humanity's Last Exam, an exceptionally difficult new benchmark created to measure the true limits of today's most powerful AI models.

Explained:

The exam was created because standard benchmarks like MMLU no longer challenge top AI models, which had begun to ace them.
Researchers ensured the exam's difficulty by removing any question that a current AI model could answer correctly, so the remaining questions test knowledge beyond those models' reach.
Early results show even top models struggle, with GPT-4o scoring just 2.7% and the most capable systems reaching only 40-50% accuracy.
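The filtering step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the researchers' actual pipeline: the question format, the exact-match grading, and the toy models are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List

Question = Dict[str, str]     # hypothetical shape: {"prompt": ..., "answer": ...}
Model = Callable[[str], str]  # hypothetical: maps a prompt to an answer string


def keep_unsolved(questions: List[Question], models: List[Model]) -> List[Question]:
    """Keep only the questions that none of the given models answers correctly."""
    kept = []
    for q in questions:
        expected = q["answer"].strip().lower()
        # Exact string match is an assumed grading rule for this sketch.
        solved = any(m(q["prompt"]).strip().lower() == expected for m in models)
        if not solved:
            kept.append(q)
    return kept


# Toy stand-ins for real models, used only to exercise the filter.
def toy_model_a(prompt: str) -> str:
    return "4" if prompt == "2 + 2?" else "unknown"


def toy_model_b(prompt: str) -> str:
    return "Paris" if "France" in prompt else "unknown"


questions = [
    {"prompt": "2 + 2?", "answer": "4"},
    {"prompt": "Capital of France?", "answer": "Paris"},
    {"prompt": "Hypothetical expert-level question?", "answer": "42"},
]

hard = keep_unsolved(questions, [toy_model_a, toy_model_b])
print([q["prompt"] for q in hard])
```

Only the question that stumps every model survives the filter, which is why scores on the resulting exam start out so low.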

Why It Matters: This new benchmark provides a much clearer picture of what advanced AI can and cannot do, guiding us toward building safer and more reliable systems. It also underscores that deep, specialized human knowledge remains distinct from AI's current capabilities, revealing a wide gap that still needs to be closed.

AI Pulse

Meta created a new applied AI engineering team with a 50-to-1 employee-to-manager ratio, a radically flat structure designed to accelerate its superintelligence efforts.

Roche expanded its collaboration with NVIDIA, deploying over 3,500 Blackwell GPUs to build AI factories for accelerating drug discovery, diagnostics, and manufacturing.

Teenagers sued Elon Musk's xAI after its Grok chatbot was allegedly used to create and share pornographic deepfake images of them without their consent.

Voygr launched a new YC-backed maps API that provides real-world place intelligence for AI agents, aiming to solve the problem of stale location data in existing services.
