PLUS: Mistral's prover AI, Safari's new agent gateway, and the 3% AI productivity reality check

Happy reading

The release of AI models that can autonomously find software bugs has kicked off an arms race in cybersecurity. The result is a massive 3.5x spike in disclosed vulnerabilities as companies race to patch flaws before they can be exploited.

While this initial flood of disclosures is making software safer in the short term, it also signals a permanent shift in the industry. What does the future of cybersecurity look like when AI is continuously auditing and defending our digital infrastructure?

In today's Next in AI:

  • The Mythos Effect and the AI bug-finding boom

  • Mistral's bug-finding prover AI

  • Safari's new gateway for AI agents

  • The reality of AI's 3% productivity boost

The Mythos Effect

Next in AI: The release of frontier AI models that can autonomously find software bugs has triggered a massive 3.5x spike in disclosed cybersecurity vulnerabilities. Companies are now in an arms race to patch flaws before they can be exploited.

Explained:

  • Anthropic’s Project Glasswing used Claude Mythos to help partners such as Google and Microsoft harden their software, and Anthropic claims the effort found over 10,000 high- or critical-severity vulnerabilities before the model's public release.

  • This trend isn't limited to one company: OpenAI has launched a similar initiative, called Daybreak, to proactively harden critical software.

  • The surge in reported vulnerabilities, or CVEs, comes from major tech firms racing to disclose and fix the bugs discovered by these new AI systems.

Why It Matters: This new capability creates a temporary surge of disclosures that makes software safer in the short term. Looking ahead, it signals a permanent shift in cybersecurity, where AI will continuously audit and defend our digital infrastructure.

Mistral's Prover AI

Next in AI: Mistral released Leanstral 1.5, a powerful, open-source model designed for formal mathematics that is already proving its real-world value by automatically discovering previously unknown bugs in code.

Explained:

  • Leanstral 1.5 achieves new state-of-the-art performance on advanced mathematical benchmarks, solving 587 out of 672 problems on PutnamBench and achieving 87% on the graduate-level FATE-H benchmark.

  • Using an automated pipeline that translates Rust code into the Lean proof language, the model found 5 previously unknown bugs across 57 open-source repositories by attempting to prove their correctness.

  • The model is completely open-source under an Apache 2.0 license and is extremely cost-effective, solving complex problems at an estimated cost of $4 versus competitor models that can cost over $300 for the same task.
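
Mistral's pipeline translates Rust into Lean and attempts formal proofs, which we can't reproduce here. As a much simpler stand-in, the Python sketch below shows the underlying idea: state a correctness property, then look for an input that violates it. The functions `midpoint_u8_buggy` and `midpoint_u8_fixed` are hypothetical examples (the buggy one mimics Rust's `a.wrapping_add(b) / 2` on `u8` values), and because the input domain is tiny we check it exhaustively instead of proving anything. A real prover has to reason symbolically about domains far too large to enumerate.

```python
"""Exhaustive spec check: a toy stand-in for proof-based bug finding.

Hypothetical example, not Mistral's pipeline. We emulate Rust u8 arithmetic
in Python and test a correctness property on every possible input pair.
"""
from typing import Callable, Optional, Tuple

U8_MAX = 0xFF


def midpoint_u8_buggy(a: int, b: int) -> int:
    # Mimics Rust's `a.wrapping_add(b) / 2` on u8: the sum wraps past 255.
    return ((a + b) & U8_MAX) // 2


def midpoint_u8_fixed(a: int, b: int) -> int:
    # Overflow-free floor((a + b) / 2): halve each input, add back the shared low bit.
    return a // 2 + b // 2 + (a & b & 1)


def find_counterexample(
    f: Callable[[int, int], int],
) -> Optional[Tuple[int, int, int]]:
    """Return the first (a, b, f(a, b)) violating min(a, b) <= f(a, b) <= max(a, b)."""
    for a in range(U8_MAX + 1):
        for b in range(U8_MAX + 1):
            m = f(a, b)
            if not (min(a, b) <= m <= max(a, b)):
                return a, b, m
    return None  # the property holds on the entire u8 x u8 domain


print(find_counterexample(midpoint_u8_buggy))  # (1, 255, 0)
print(find_counterexample(midpoint_u8_fixed))  # None
```

The buggy version fails at `(1, 255)`, where the sum wraps to 0; the fixed version satisfies the property for all 65,536 input pairs.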

Why It Matters: This pushes formal code verification from a specialized, academic field toward a practical tool that developers can use to increase software reliability. AI that can not only write code but also rigorously prove its correctness marks a significant step toward building more secure and robust applications.

Safari's Agent Gateway

Next in AI: Apple's WebKit team just launched the Safari MCP server, a new tool that lets AI agents connect directly to a Safari browser window over the Model Context Protocol (MCP). This feature, introduced in the latest Safari Technology Preview, gives agents the ability to see and interact with web content just as a user would, streamlining development workflows.

Explained:

  • The server aims to end the tedious “debugging dance” developers face when switching between their code, the browser, and AI prompts. Instead of describing a visual bug to an agent, you can now let the agent directly access the DOM, console logs, and network requests to identify the problem itself.

  • Developers can use a suite of tools to automate tasks like analyzing site performance, checking for accessibility issues, verifying user states in a checkout flow, and ensuring cross-browser compatibility.

  • It runs entirely on your local machine and makes no network calls of its own. When an agent captures page content or screenshots, the data goes directly to the agent, not to Apple, which helps protect developer privacy.
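
MCP messages use JSON-RPC 2.0 framing, and an agent invokes a server's tool with a `tools/call` request. The Python sketch below builds such requests. The tool name `get_console_logs` and its `tab` argument are hypothetical placeholders, not the Safari MCP server's documented tool set; a client would discover the real names by sending `tools/list` first.

```python
"""Sketch of MCP requests using JSON-RPC 2.0 framing.

The tool name and arguments are hypothetical placeholders, not the
documented Safari MCP server tools.
"""
import json
from itertools import count
from typing import Any, Dict

_ids = count(1)  # each request id must be unique within a session


def make_request(method: str, params: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request as an MCP client would send it."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params,
    })


# Ask the server which tools it exposes (a standard MCP method).
list_req = make_request("tools/list", {})

# Invoke one tool; "get_console_logs" and {"tab": "current"} are hypothetical.
call_req = make_request(
    "tools/call",
    {"name": "get_console_logs", "arguments": {"tab": "current"}},
)

print(list_req)
print(call_req)
```

A real client also performs MCP's `initialize` handshake over the server's transport before sending these; which transport Safari's server uses is not stated here.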

Why It Matters: This tool gives developers a much faster way to debug and test web applications in Safari by letting AI agents do the heavy lifting. It also signals a broader shift where browsers are becoming active partners for AI, moving beyond just rendering pages to becoming intelligent development platforms.

The 3% Reality Check

Next in AI: While AI demos promise massive productivity boosts, new economic data reveals the reality: AI saves the average professional about one hour per week, a mere 3% gain. More importantly, almost none of that saved time translates into higher pay.

Explained:

  • Lab studies show AI boosting task speed by 40% or more, but this effect gets diluted in a real job filled with meetings and tasks AI can't touch. For example, AI lifted productivity by 14% for customer-support agents, but averaged across every task in an entire job, the overall gain shrinks dramatically.

  • AI performs exceptionally well on creative and structured tasks but poorly on work requiring nuanced business logic. Using AI outside its strengths produces confidently wrong answers that take more time to fix than was initially saved, eating into any potential gains.

  • Despite billions in enterprise spending, the value is proving elusive. One MIT report found 95% of organizations are getting zero financial return from their AI initiatives, as saved time evaporates before it impacts the bottom line.
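
The dilution described above is simple arithmetic: a task-level gain only applies to the share of the week spent on tasks AI can help with, and only the part of the saved time that gets redirected to real work counts. The Python sketch below makes that explicit. Every input (a 40-hour week, a 6.25% task share, treating the 40% figure as a cut in task time, a 50% capture rate) is an illustrative assumption, not data from the underlying studies. On a 40-hour week, one hour saved is 2.5%, in the same ballpark as the reported ~3%.

```python
"""Back-of-the-envelope model of how task-level AI gains dilute.

All numbers below are illustrative assumptions, not data from the studies.
For simplicity, the 40% figure is treated as a 40% cut in task time.
"""


def hours_saved(week_hours: float, ai_share: float, time_reduction: float) -> float:
    """Hours saved per week when AI cuts `time_reduction` of the time spent
    on the `ai_share` fraction of the week that AI can actually help with."""
    return week_hours * ai_share * time_reduction


def job_level_gain(week_hours: float, ai_share: float, time_reduction: float) -> float:
    """Saved hours as a fraction of the whole work week."""
    return hours_saved(week_hours, ai_share, time_reduction) / week_hours


def captured_hours(saved: float, capture_rate: float) -> float:
    """Only the fraction of saved time deliberately turned into output counts."""
    return saved * capture_rate


WEEK = 40.0
# A 40% time cut on tasks that fill only 6.25% of the week saves one hour...
saved = hours_saved(WEEK, ai_share=0.0625, time_reduction=0.40)
print(f"{saved:.2f} h/week saved, {job_level_gain(WEEK, 0.0625, 0.40):.1%} of the week")
# ...and if only half of that hour is redirected to real work, the gain halves.
print(f"{captured_hours(saved, capture_rate=0.5):.2f} h/week captured")
```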

Why It Matters: The productivity gains from AI are real but leaky, and they don't automatically convert to profit or a raise. To benefit, you must deliberately capture that saved time and convert it into more output, new clients, or lower costs.

AI Pulse

A new report revealed that AI data centers' indirect water consumption, at the power plants that supply their electricity, can be up to 12 times greater than their direct on-site usage, a figure often excluded from tech giants' sustainability reports.

A report details how the U.S. federal government is actively facilitating the AI boom through a national strategy that includes government procurement, weakened environmental protections for data center construction, and significant tax breaks for top tech companies.

Yann LeCun argued that current large language models are "not a path towards human-like intelligence" because they cannot reason or deal with real-world data. To address these limitations, he is developing an alternative approach called Joint Embedding Predictive Architecture (JEPA).

Research shows that AI visibility tools which track brand mentions and search rankings often provide false precision, as the same prompts can produce highly inconsistent results across repeated runs.
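
One way to see the false-precision problem: treat each repeated run of a prompt as a yes/no trial (was the brand mentioned?) and put a confidence interval around the mention rate. The Python sketch below uses the Wilson score interval, a standard choice for proportions; the run counts are made-up illustrative numbers, not data from the research.

```python
"""Wilson score interval for a brand-mention rate across repeated prompt runs.

The run counts used below are hypothetical, for illustration only.
"""
import math
from typing import Tuple


def wilson_interval(mentions: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% (z = 1.96) interval for the true mention rate."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = mentions / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return center - half, center + half


# A single "60% visibility" reading from 10 runs...
print(wilson_interval(6, 10))     # roughly 0.31 to 0.83
# ...versus the same rate measured over 200 runs.
print(wilson_interval(120, 200))  # roughly 0.53 to 0.67
```

The point is the width: a 10-run measurement can't distinguish a 40% mention rate from a 75% one, so a single-number score implies more precision than the data supports.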
