PLUS: the AI leap benchmarks miss and AI's peer review crisis
Good morning.
Two 22-year-old founders have turned down a multimillion-dollar offer from Elon Musk’s xAI to forge their own path. Their new venture, Sapient Intelligence, is developing a novel AI architecture designed to overcome the core limitations of today's models.
Their Hierarchical Reasoning Model aims to mimic human thought, and a tiny prototype is already outperforming massive systems on complex reasoning tasks. Does their approach signal that fundamental architectural shifts, not just massive scale, are the key to the next leap in AI?
In today’s Next in AI:
Two 22-year-olds reject Musk's xAI offer
The qualitative AI leaps benchmarks miss
AI's academic peer review crisis
AI powers record $11.8B Black Friday
The Gen Z AI Upstarts

Next in AI: Two 22-year-old founders rejected a multimillion-dollar offer from Elon Musk’s xAI to launch Sapient Intelligence. They are building a new AI model designed to overcome the limitations of current large language models.
Decoded:
Unlike traditional transformers that predict the next word, their Hierarchical Reasoning Model (HRM) uses a two-part structure to mimic human thought, pairing slow, deliberate reasoning with quick, reflexive responses (see the sketch after this list).
A lightweight prototype with just 27 million parameters outperformed massive systems from OpenAI, Anthropic, and DeepSeek on complex reasoning benchmarks like advanced Sudoku and maze-solving.
Founders William Chen and Guan Wang are building on their earlier academic success and plan to open a U.S. office for their company, Sapient Intelligence.
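For readers who want to see the shape of the idea, here is a minimal, hypothetical sketch of the kind of two-timescale loop the HRM description implies: a fast recurrent module making quick "reflexive" updates nested inside a slower, deliberative one. The module choices, sizes, and update schedule below are illustrative assumptions, not Sapient Intelligence's implementation.

```python
# Minimal sketch of a two-timescale "hierarchical reasoning" loop, loosely
# inspired by the HRM description above. Module names, sizes, and the update
# schedule are illustrative assumptions, not Sapient Intelligence's code.
import torch
import torch.nn as nn


class TwoTimescaleReasoner(nn.Module):
    def __init__(self, dim: int = 128, fast_steps: int = 4, slow_steps: int = 3):
        super().__init__()
        self.fast_steps = fast_steps   # quick, "reflexive" low-level updates
        self.slow_steps = slow_steps   # deliberate, high-level planning updates
        # Recurrent cells standing in for the low-level and high-level modules.
        self.fast_cell = nn.GRUCell(dim, dim)
        self.slow_cell = nn.GRUCell(dim, dim)
        self.readout = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) encoding of the puzzle or problem state.
        h_slow = torch.zeros_like(x)   # high-level "plan" state
        h_fast = torch.zeros_like(x)   # low-level "work" state
        for _ in range(self.slow_steps):
            # The fast module iterates several times under the current plan...
            for _ in range(self.fast_steps):
                h_fast = self.fast_cell(x + h_slow, h_fast)
            # ...then the slow module revises the plan from the fast result.
            h_slow = self.slow_cell(h_fast, h_slow)
        return self.readout(h_slow)


# Example: one forward pass on a batch of two encoded problem states.
model = TwoTimescaleReasoner()
out = model(torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 128])
```

The point of the nesting is the division of labor: the inner loop does cheap iterative refinement while the outer loop only occasionally revises the higher-level plan, which is the intuition behind pairing "reflexive" and "deliberate" computation.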
Why It Matters: This project suggests that a major architectural shift may matter more than simply scaling up model size. It also shows how small, agile teams can still drive fundamental innovation in an industry dominated by tech giants.
The AI Leap You Can't See

Next in AI: A new analysis argues that the latest AI models like Google's Gemini 3 Pro and Anthropic's Opus 4.5 represent a "subtle GPT-4 moment." They show qualitative leaps in areas like design taste and agent reliability that traditional benchmarks fail to capture.
Decoded:
Gemini 3 Pro is demonstrating a genuine sense of design taste, creating visually appealing web prototypes that move far beyond the generic look of previous models.
Anthropic’s Opus 4.5 shows a major advance in reliability, acting as a software engineering agent that requires significantly less user supervision on complex, multi-step tasks.
Current evaluation methods are missing these leaps because they focus on isolated, knowledge-based tests, much like a university exam, rather than assessing creative quality or long-term task consistency.
Why It Matters: This shift suggests AI’s value is moving beyond raw knowledge into more nuanced, creative, and process-oriented tasks. Understanding and measuring these new qualitative skills will be essential to grasping AI's true impact on productivity.
AI's Peer Review Problem

Next in AI: A shocking analysis of submissions for the upcoming ICLR 2026 conference revealed that a significant number of peer reviews were likely written by AI, raising major questions about academic integrity in the field.
Decoded:
The analysis by Pangram Labs, which scanned over 75,800 peer reviews, estimated that 21% were fully AI-generated, and more than half showed some signs of AI use.
This issue has tangible consequences: one researcher reported that a suspected AI-generated review missed the point of his paper and gave it the lowest rating, putting its acceptance in jeopardy.
In an ironic twist, Pangram submitted its own paper on its AI detection model to the conference, only to have one of the paper's four peer reviews flagged as fully AI-generated.
Why It Matters: This incident highlights a critical vulnerability in the scientific process as AI tools become more accessible. The AI community must now establish clear ethical guidelines for using AI in critical evaluation tasks to maintain trust.
AI's $11.8 Billion Black Friday

Next in AI: AI-powered shopping assistants guided consumers to a record $11.8 billion in U.S. online spending this Black Friday. This surge was fueled by an incredible 805% increase in AI-driven traffic to retail sites compared to last year.
Decoded:
Shoppers turned to new AI tools like Walmart's Sparky and Amazon's Rufus, which drove that 805% jump in AI-driven traffic to retail websites.
The shift to online was stark, with e-commerce sales growing 10.4% while in-store sales saw a modest 1.7% increase.
Globally, AI influenced an estimated $14.2 billion in online sales, though higher prices meant shoppers actually purchased fewer items per transaction than last year.
Why It Matters: AI has moved beyond being a simple feature to become a fundamental driver of e-commerce, directly guiding shoppers to deals faster in a tough economic climate. This holiday season confirms that retailers who effectively integrate AI into their discovery process will hold a significant advantage in capturing consumer spending.
AI Pulse
Researchers found that embedding harmful prompts within poetry can bypass the safety guardrails of major LLMs like Google Gemini and OpenAI’s GPT models, achieving an overall success rate of 62%.
KPMG reports that 52% of U.S. workers now fear their jobs could be displaced by AI, a figure that has nearly doubled from the previous year, highlighting growing anxiety about the technology's impact on the workforce.
Fortnite players are pushing back against what they call "AI slop," arguing that an influx of perceived AI-generated in-game images is drowning out human artists and cheapening the game's creative ecosystem.
Privacy4Cars launched a free tool that summarizes the vast amounts of personal and driving data modern cars collect, addressing growing concerns highlighted by a Mozilla Foundation report calling cars the "worst product category" for privacy.
