PLUS: Google's real-time translator, why AI agents are failing, and a threat to the SaaS industry

Good morning.

A newcomer to the AI scene has just unveiled a foundational model that beats top offerings from Google and OpenAI at controlling computer applications. OpenAGI's Lux model has set a new record for accuracy and efficiency on a key industry benchmark.

The model combines this accuracy with impressive speed and a cost 10x lower than that of its competitors. Does this combination of performance and accessibility mark a turning point for practical, widespread AI automation in business?

In today’s Next in AI:

OpenAGI's new model beats OpenAI and Google
Why top AI agents are failing at office tasks
Google’s real-time language translator
How AI agents could threaten the SaaS industry

A New AI Challenger

The Recap: Newcomer OpenAGI has emerged from stealth with Lux, a foundational model for controlling computer applications. The company claims Lux is faster, cheaper, and more accurate than top models from Google and OpenAI, setting a new record on the Online-Mind2Web benchmark.

Unpacked:

Lux achieved a score of 83.6 on the benchmark, significantly outperforming Google's Gemini CUA (69.0) and OpenAI's Operator (61.3).
Beyond accuracy, the model operates with impressive efficiency, completing tasks at just 1 second per step and at 10x lower cost than competing models.
Its performance stems from a novel "learn by doing" training approach, and OpenAGI has already open-sourced its data engine to support the community (read the technical report).
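For a rough sense of how large those gaps are, here is a minimal Python sketch that works out Lux's lead from the three scores quoted above. The `scores` dictionary and the `lead_over` helper are illustrative names of ours, not part of any OpenAGI or benchmark tooling, and the sketch assumes the scores can be compared directly as reported.

```python
# Scores reported in this issue for the Online-Mind2Web benchmark.
scores = {
    "Lux": 83.6,
    "Gemini CUA": 69.0,
    "Operator": 61.3,
}


def lead_over(leader: str, other: str) -> tuple[float, float]:
    """Return (point gap, relative gap in percent) of leader over other."""
    gap = scores[leader] - scores[other]
    relative = gap / scores[other] * 100
    return round(gap, 1), round(relative, 1)


for rival in ("Gemini CUA", "Operator"):
    points, percent = lead_over("Lux", rival)
    print(f"Lux vs {rival}: +{points} points ({percent}% higher)")
```

On the reported numbers, that is a 14.6-point lead over Gemini CUA (about 21% higher) and a 22.3-point lead over Operator (about 36% higher).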

Bottom line: Lux's blend of speed, low cost, and high accuracy could make powerful computer automation practical for a wider range of businesses. By open-sourcing its core training tools, OpenAGI is also positioned to fuel broader innovation in the development of AI agents.

AI Agents Flunk the Job Interview

The Recap: A new study from Carnegie Mellon University shows that even top AI agents from Anthropic, Google, and OpenAI are not ready for the workplace, failing at the vast majority of common office tasks in a simulated environment.

Unpacked:

The best-performing agent, Claude 3.5 Sonnet, successfully completed only 24% of its assigned tasks, with Google's Gemini 2.0 Flash at 11.4% and OpenAI's GPT-4o trailing at 8.6%.
The agents demonstrated significant common-sense failures, getting stumped by simple website pop-ups, lacking the social awareness to contact coworkers, and even attempting to cheat by renaming users instead of finding the correct contact.
Researchers tested the agents inside TheAgentCompany, a simulated software company designed to benchmark performance on realistic job duties across sales, HR, and engineering roles.

Bottom line: This research provides a crucial reality check on the hype around fully autonomous AI agents replacing professional jobs. The next major leap for AI will require moving beyond conversational intelligence to building agents with the practical skills needed to navigate real-world digital tasks.

The Babel Fish Is Here

The Recap: Google just launched a beta feature that turns any pair of headphones into a real-time translator, preserving the original speaker's voice for more natural conversations across languages.

Unpacked:

The new live translate feature works with any wired or wireless headphones and is initially rolling out on Android in the U.S., Mexico, and India, with iOS support planned for 2026.
Google is also integrating advanced Gemini capabilities to better handle nuanced language, helping the app accurately translate idioms and local slang instead of just literal word-for-word meanings.
To compete with apps like Duolingo, Google is expanding its language learning tools to nearly 20 new countries and adding a streak tracker to encourage consistent practice.

Bottom line: This feature moves us closer to a world with truly seamless cross-language communication, powered by generative AI that captures human expression. It demonstrates how AI is breaking down long-standing barriers in real-time, practical applications.

AI Agents Are Starting to Eat SaaS

The Recap: A compelling new analysis argues that capable AI agents are fundamentally shifting the classic "build vs. buy" software debate. It echoes the disruption of more than a decade ago, when software was said to be eating the world, but this time the target is the multitrillion-dollar SaaS industry itself.

Unpacked:

Developers are increasingly using agents to quickly build custom tools like internal dashboards or UI wireframes, tasks that previously would have required a paid SaaS subscription.
The calculus is also changing for enterprise buyers, who are now questioning expensive annual renewals and exploring in-house builds as a genuine alternative to double-digit price hikes.
Not all SaaS is at risk: products with strong network effects (like Slack), high-availability requirements, proprietary datasets, or compliance tooling still hold a significant advantage.
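The changing calculus can be sketched as a simple break-even calculation. The Python snippet below is only an illustration under our own assumptions: the `breakeven_years` helper and every figure passed to it are hypothetical and do not come from the analysis.

```python
from typing import Optional


def breakeven_years(build_cost: float,
                    yearly_maintenance: float,
                    yearly_subscription: float) -> Optional[float]:
    """Years until an in-house build costs less than continuing to subscribe.

    Returns None when maintaining the build costs at least as much per year
    as the subscription, in which case building never pays off.
    """
    yearly_saving = yearly_subscription - yearly_maintenance
    if yearly_saving <= 0:
        return None
    return build_cost / yearly_saving


# Hypothetical figures: a 4,000 one-off build, 2,000 a year to maintain,
# replacing a 12,000-a-year subscription.
print(breakeven_years(4_000, 2_000, 12_000))
```

The lower agents push the one-off build cost, the sooner the build pays for itself. The maintenance term is where high-availability and compliance requirements would show up, and it can erase the saving entirely.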

Bottom line: The barrier to creating custom software solutions is rapidly falling. SaaS products that merely offer simple workflows or dashboards on top of a customer's own data are now competing with a developer's afternoon and a capable AI agent.

AI Pulse

Grok glitched over the weekend, spreading misinformation about the recent Bondi Beach shooting and providing nonsensical answers to unrelated user queries.

Google announced a major update for Google Translate, adding real-time audio translation directly to headphones and integrating advanced Gemini capabilities for more natural text translations.

Research found that one in four teenagers in England and Wales are turning to AI chatbots for mental health support, with usage significantly higher among those affected by youth violence.
