PLUS: Apple taps Google for Gemini and OpenAI files for IPO
Happy reading
A new speed record has just been set by Xiaomi and TileRT, pushing a massive one-trillion-parameter AI model to over 1,000 tokens per second. This breakthrough enables real-time performance for giant AI models on standard, off-the-shelf hardware.
This new level of performance could turn giant models from sluggish research tools into instant collaborators for complex tasks. And because the result was achieved on commodity GPUs, it raises a question: is elite AI about to stop being gated by expensive, custom-built silicon?
In today’s Next in AI:
Xiaomi's 1T model speed record
Apple's new AI strategy with Gemini
OpenAI confidentially files for IPO
Developers building personal AI tools
Xiaomi's Trillion-Parameter Speed Run

Next in AI: Xiaomi and TileRT just set a new speed record, achieving over 1,000 tokens per second on a 1-trillion-parameter model. This breakthrough pushes massive AI into real-time performance territory using standard hardware.
Explained:
Instead of relying on specialized hardware, the team used model-system codesign on a standard 8-GPU node, combining FP4 quantization with an advanced speculative decoding method called DFlash.
The core innovation involves selectively quantizing the model’s Mixture-of-Experts (MoE) layers and using TileRT’s ultra-low-latency inference kernels to eliminate system bottlenecks.
For developers wanting to experiment, Xiaomi has open-sourced the checkpoint with the core architectural changes, while the high-speed API is available in a limited trial from June 9-23.
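For readers new to speculative decoding, the sketch below shows the general idea in Python. It is not DFlash, whose internals are not described here, and it uses the simplest greedy variant: a cheap draft model proposes a few tokens, and the large target model checks them in what would be a single batched pass on real hardware. The names `toy_target`, `toy_draft`, `speculative_decode`, and `greedy_decode` are hypothetical, and the "models" are stand-in functions over integer tokens.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def greedy_decode(target_next: NextTokenFn, prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(
    target_next: NextTokenFn,
    draft_next: NextTokenFn,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, number of target passes)."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        ctx = list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)

        # 2) The target model verifies the proposal. A real system scores all
        #    k positions in one batched forward pass, so we count one pass here
        #    even though this toy version calls target_next per position.
        target_passes += 1
        accepted: List[Token] = []
        ctx = list(tokens)
        for t in proposal:
            expected = target_next(ctx)
            if expected == t:
                accepted.append(t)         # draft guessed right, keep it
                ctx.append(t)
            else:
                accepted.append(expected)  # first mismatch: use the target's token
                break
        else:
            # Every draft token matched, so the same pass yields one bonus token.
            accepted.append(target_next(ctx))

        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], target_passes


def toy_target(ctx: List[Token]) -> Token:
    """Stand-in for the large model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Stand-in for the small model: agrees with the target except after a 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


spec_tokens, passes = speculative_decode(toy_target, toy_draft, [0], 12, k=3)
print("speculative:", spec_tokens, "target passes:", passes)
print("baseline:   ", greedy_decode(toy_target, [0], 12))
```

Because every kept token is either confirmed or supplied by the target model, the output matches plain greedy decoding. The speedup comes from needing fewer target passes than generated tokens whenever the draft guesses well.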
Why It Matters: This speed transforms 1T models from slow tools into real-time collaborators for tasks like coding, trading, and fraud detection. By achieving this on commodity GPUs, the breakthrough makes elite AI performance more accessible and challenges the need for expensive, custom-built silicon.
Apple Intelligence Enters the Arena

Next in AI: At its annual WWDC event, Apple finally pulled back the curtain on its long-awaited AI strategy. The company unveiled Apple Intelligence, a suite of new features deeply integrated into its operating systems and powered by a landmark partnership with Google.
Explained:
The new architecture is built on foundation models co-developed with Google, leveraging Gemini technology for state-of-the-art reasoning and multimodal support, including advanced image generation and understanding.
Apple is wooing developers by making its Foundation Models available with no cloud API cost for those with under 2 million App Store downloads, directly addressing the high infrastructure costs that can stifle AI experimentation.
The company is emphasizing its privacy-first approach by processing requests on-device or through its secure Private Cloud Compute infrastructure, but the most powerful on-device models will require new hardware like the iPhone 17 Pro.
Why It Matters: By weaving AI directly into the native experience of its massive device ecosystem, Apple is poised to make powerful AI a practical, daily tool for millions of users. This strategy not only helps Apple catch up in the AI race but also creates a compelling new reason for its vast user base to upgrade their hardware.
OpenAI Files to Go Public

Next in AI: OpenAI has confidentially filed for an IPO, kicking off what could be one of the most anticipated market debuts in tech history and signaling a new phase of maturation for the AI industry.
Explained:
This move places OpenAI in a high-stakes race to the public markets against rivals like Anthropic and SpaceX, which have also recently filed IPO paperwork.
The company was last valued at $852 billion after a massive fundraising round. Going public would give outsiders a much clearer view of its finances, including its heavy spending on computing resources.
OpenAI has not committed to a specific timeline, stating that it wants the flexibility to go public sooner if needed but may wait, as some goals are easier to achieve as a private company.
Why It Matters: An OpenAI IPO would be a landmark event, legitimizing the massive valuations in the AI space and providing a new public barometer for the industry's health. This move also unlocks immense capital, which will fuel the next wave of AI development and intensify the competition for talent and resources.
What Developers Are Building With AI
Next in AI: A recent Hacker News thread shows a clear trend: developers are building hyper-specific, personal AI tools to solve their own unique problems, moving beyond generic applications to create custom-tailored solutions.
Explained:
The projects range from practical to creative, including personal data trackers for pool chemistry, genealogy managers, and TUI-based interfaces to manage custom coding agents.
This developer-led movement is mirrored inside major AI labs, where Anthropic reports that over 80% of its merged code is now written by its own AI, Claude, signaling a fundamental shift in development workflows.
This focus on narrow, high-leverage tools reflects AI's uneven impact across the software development lifecycle, with coding assistance advancing faster than higher-level planning and governance.
Why It Matters: This groundswell of custom tool-building marks a shift where developers are becoming architects of their own AI-powered workflows, not just users of them. These personal projects are early indicators of a future where more powerful, autonomous agents handle increasingly complex development tasks.
AI Pulse
Anthropic found its Mythos Preview model can turn newly disclosed software vulnerabilities into working exploits in a matter of hours, significantly shrinking the window defenders have to patch.
McDonald's is testing a new voice-activated AI system named ArchIQ in several drive-thrus as part of its McDonald's > NEXT growth strategy.
ArXiv announced a new policy to ban authors for one year if a submission contains incontrovertible evidence that the authors failed to check the output of LLM generation.
Moonshot AI seeks up to $2B in new funding at a reported $30B valuation, highlighting the intensifying AI race in China.