PLUS: An AI agent finds 21 zero-days and Google is now liable for its AI's answers

Happy reading

The Trump administration is now treating advanced AI like a national security asset, placing strict export controls on Anthropic’s Mythos 5 model. This action has forced the company to immediately cut off access for all its global customers.

The decision, spurred by national security concerns, marks a major escalation in government oversight of private AI development. How will this new precedent affect the global AI landscape and the balance between innovation and regulation?

In today’s Next in AI:

US government places export controls on Anthropic's AI
An AI agent autonomously finds 21 zero-days
A German court holds Google liable for AI Overviews
NVIDIA sets a new standard for AI agent benchmarks

AI Models Under Lock and Key

Next in AI: The Trump administration has placed Anthropic's most advanced AI models, Mythos 5 and Fable 5, under strict export controls, forcing the company to cut off access for all global customers to ensure compliance.

Explained:

The move came after another company claimed it could jailbreak Mythos, prompting the Commerce Department to cite national security risks.
Commerce Secretary Howard Lutnick issued a letter subjecting the models to export controls, requiring a license for any transfer to foreign entities or locations.
This action marks a significant escalation in treating cutting-edge AI as a national security asset, placing Anthropic's models in a restrictive new licensing regime.

Why It Matters: This sudden directive creates significant uncertainty for AI labs trying to navigate U.S. government oversight while serving a global user base. It also sets a major precedent for how the government may control powerful AI technologies developed by private companies.

The AI Bug Hunter

Next in AI: An autonomous AI security agent from depthfirst has discovered 21 zero-day vulnerabilities in FFmpeg, a media library used in countless applications, showcasing AI's growing power in proactive cybersecurity.

Explained:

Unlike theoretical tools, the AI agent generated concrete, reproducible proofs-of-concept for each vulnerability it found, completing its deep analysis for about $1,000 in compute costs.
The findings are serious, including several critical buffer overflows and demonstrating a path to remote code execution from a single malicious network packet, with some bugs having been latent in the code for over 20 years.
This achievement builds on recent AI security research from teams at Google and Anthropic, but it marks a significant step forward by finding novel, critical bugs that previous advanced models completely missed.

Why It Matters: This demonstrates AI's shift from a code analysis assistant to a fully autonomous bug hunter capable of securing complex, real-world codebases. For security and development teams, this points to a future where AI agents continuously audit software supply chains at a scale and speed humans cannot match.

Google on the Hook

Next in AI: A German court delivered a landmark ruling that Google is directly liable for false statements made in its AI Overviews. The decision sets a major legal precedent that could reshape accountability for AI search tools worldwide.

Explained:

The court determined that unlike traditional search results, AI Overviews create “independent, new, and substantive statements,” making them a form of original content for which Google is responsible.
Google’s argument that users shouldn’t “blindly trust” AI-generated answers was rejected, with the court noting the tool’s utility would be significantly diminished if users had to verify every result.
In response to the preliminary injunction, Google stated it is carefully reviewing the decision and plans to appeal the ruling, which is not yet final.

Why It Matters: This ruling challenges the notion that tech companies can use disclaimers to avoid liability for their AI's outputs. It may also open the floodgates for similar lawsuits, forcing AI developers to take greater responsibility for the accuracy of AI-generated information.

Benchmarking the Agents

Next in AI: NVIDIA's new Blackwell platform is setting the pace on AgentPerf, the industry's first agentic AI benchmark, establishing a new standard for measuring the infrastructure that will power AI agents.

Explained:

Unlike simple chat requests, agentic AI workloads are complex relays, chaining together hundreds of model and tool calls to complete a single goal.
The new AgentPerf benchmark from Artificial Analysis measures how many concurrent agents a system can support, shifting focus from single-response speed to sustained, multi-step task capacity.
In its debut, NVIDIA's Blackwell platform delivered a standout performance, running 20x more agents per megawatt than the previous generation on the demanding DeepSeek V4 Pro model.

Why It Matters:
This benchmark provides a clear standard for evaluating hardware on its ability to handle complex, multi-step AI tasks, not just single queries. This focus on agent capacity and efficiency will directly influence how next-generation AI infrastructure is designed and deployed.

AI Pulse

Moonshot AI released Kimi K2.7 Code, an agentic model focused on complex software engineering tasks that improves token efficiency by 30% over its predecessor.

Niantic Spatial used billions of images from Pokémon Go players to train a 3D world model now being adapted for navigation in GPS-denied environments through a partnership with defense contractor Vantor.

Anthropic partnered with Tata Consultancy Services (TCS) to bring its Claude models to enterprise clients in regulated industries like finance and healthcare, with TCS deploying Claude to 50,000 of its own employees.

Researchers found that accusations of posting "AI slop" have increased more than tenfold on Reddit and Hacker News since 2023, functioning more as social gatekeeping than as genuine attempts to identify AI-generated content.

Trump admin blocks foreign access to Mythos 5

AI Models Under Lock and Key

The AI Bug Hunter

Google on the Hook

Benchmarking the Agents

AI Pulse

Keep Reading

Next in AI

Next in AI