New study shows AI isn’t ready for office work



It has been nearly two years since Microsoft CEO Satya Nadella predicted that generative AI would take over knowledge work, but if you look around a typical law firm or investment bank today, the human workforce is still very much in charge. Despite all the hype about “reasoning” and “planning,” a new study from training-data company Mercor explains exactly why the robot revolution has stalled: AI just can’t handle the messiness of real work.

A reality check for the “replacement” theory

Mercor released a new benchmark called APEX-Agents, and it is brutal. Unlike the usual tests that ask AI to write a poem or solve a math problem, this one uses actual queries from lawyers, consultants, and bankers. It asks the models to complete multi-step tasks that require jumping between different types of information.

The results? Even the absolute best models on the market—we are talking about Gemini 3 Flash and GPT-5.2—couldn’t crack a 25% accuracy rate. Gemini led the pack at 24%, with GPT-5.2 right behind it at 23%. Most others were stuck in the teens.

Why AI is failing the “office test”

Mercor CEO Brendan Foody points out that the issue isn’t raw intelligence; it’s context. In the real world, answers aren’t served up on a silver platter. A lawyer has to check a Slack thread, read a PDF policy, look at a spreadsheet, and then synthesize all that to answer a question about GDPR compliance.

Humans do this context-switching naturally. AI, it turns out, is terrible at it. When you force these models to hunt for information across “scattered” sources, they either get confused, give the wrong answer, or just give up entirely.

The “Unreliable Intern”

For anyone worried about their job security, this is a bit of a relief. The study suggests that right now, AI functions less like a seasoned professional and more like an unreliable intern who gets things right about a quarter of the time.

That said, the progress is terrifyingly fast. Foody noted that just a year ago, these models were scoring between 5% and 10%. Now they are hitting 24%. So while they aren’t ready to take the wheel yet, they are learning to drive much faster than expected. For now, though, the “knowledge work” revolution is on hold until the bots learn how to multitask.

Google’s Gemini to power Apple’s AI features like Siri


It’s official. Apple has chosen to work with Google, a longtime partner, to power AI features like Siri. 

“After careful evaluation, we determined that Google’s technology provides the most capable foundation for Apple Foundation Models and we’re excited about the innovative new experiences it will unlock for our users,” Apple and Google said in a statement.

The partnership confirms previous reporting on a deal with Google. Neither Apple nor Google has confirmed the price tag, but previous reports indicate Apple could be paying Google around $1 billion for access to its AI technology. The deal also comes after Apple spent some time testing the technology of competitors like OpenAI and Anthropic.

The multi-year partnership will involve Apple using Google’s Gemini models and cloud technology for future Apple foundational models. The deal is not exclusive, per a source familiar with the matter. Apple has historically focused on vertical integration, relying on its own hardware and software.

The iPhone maker has faced a fair amount of public criticism after its AI efforts, particularly its assistant Siri, lagged behind competitors. That’s not to say Apple hasn’t been quietly building powerful foundational models. The company released the first versions of Apple Intelligence in 2024, adding AI to existing OS functions like searching for photos and summarizing notifications. Apple has also focused on privacy with its AI rollout, with much of the processing happening on-device or through tightly controlled infrastructure. Apple says it will maintain those privacy standards throughout its partnership with Google.

The firm’s strategy has resulted in a subtle, sometimes invisible, occasionally resented form of AI – one that doesn’t have the same wow factor as ChatGPT or Gemini. It also stops short of delivering the kind of Siri overhaul many users have been waiting for.

Apple has delayed the rollout of its “more personalized Siri” voice assistant several times, but a spokesperson told TechCrunch an upgrade is coming this year. Previous reports indicate the overhauled Siri is expected to launch in the spring. 

Apple’s partnership with Google also comes as the search and adtech giant is in the midst of multiple antitrust lawsuits, including one that put its relationship with Apple front and center. In August 2024, a federal judge ruled that Google acted illegally to maintain a monopoly in online search by paying companies like Apple to present its search engine as the default on its devices and web browsers. Between 2021 and 2022, Google paid Apple about $38 billion to secure default search placements. 

In December 2025, Judge Amit Mehta issued his final remedies in the case, which include banning Google from entering into exclusive, default agreements like the one it had with Apple “unless the agreement terminates no more than one year after the date it is entered.”

Google’s cheaper AI Plus plan is now available in over 40 countries


Google’s new, cheaper AI Plus plan is now available in more than 40 countries, including Angola, Bangladesh, Cameroon, Côte d’Ivoire, Egypt, Ghana, Indonesia, Kenya, Mexico, Nepal, Nigeria, Philippines, Senegal, Uganda, Vietnam and Zimbabwe.

The company first launched its AI Plus plan in Indonesia earlier this month at Rp 75,000 ($4.50) per month. The plan costs around $5 in most countries, and Google says it will discount it by 50% for six months in a few locations like Nepal and Mexico.

The Plus tier unlocks access to Gemini 2.5 Pro, as well as tools for image and video creation like Flow, Whisk, and Veo 3 Fast. Users also get access to more features on the company’s AI research assistant NotebookLM, can use AI in Gmail, Docs and Sheets, and get 200GB of cloud storage.

The news comes a day after OpenAI expanded its sub-$5 ChatGPT Go plan to Indonesia. Notably, India, where OpenAI debuted ChatGPT Go, is missing from Google’s list.

Both companies offer a $20 per month base plan, but with these new, cheaper subscription tiers, they’re trying to reach more paying users in parts of the world where a $20 subscription can prove costly.

How to watch Google I/O 2025


It’s still May, which means it’s still Google time. After showing off Android’s new look at The Android Show, the company still has its developer conference to check off the list. Google I/O 2025 is scheduled to start on May 20 at 1PM ET / 10AM PT, and Engadget will be covering it live, via a liveblog and on-the-ground reporting from our very own Karissa Bell.

Google included some Gemini news in The Android Show (the AI is coming to Wear OS, Android Auto and Google TV), but artificial intelligence should still be the focus of the company’s upcoming keynote, too. Expect news about how Google is using AI in search to be featured prominently, along with some other surprises, like the possible debut of an AI-powered Pinterest alternative.

The company made it clear during its Android showcase that Android XR, its mixed reality platform, will also be featured during I/O. That could include the mixed reality headset Google and Samsung are collaborating on, or, as teased at the end of The Android Show, smart glasses with Google’s Project Astra built-in.

To find out for yourself, you can watch Google’s keynote livestream on the company’s YouTube channel. The event starts at 1PM ET on May 20, and the company plans to hold breakout sessions through May 21 on a variety of topics relevant to developers.