LLM Weekly — W19: Anthropic reads Claude's mind, voice becomes the contested modality

For two years, mechanistic interpretability has been a slide-deck promise. This week Anthropic used it to ship a fix to a paid product — and that's the most important thing that has happened in LLMs in 2026 so far.

Anthropic decodes Claude's "thoughts" into English — and finds a real bug. Anthropic published Natural Language Autoencoders (NLAs), a method that translates a model's internal activation vectors into readable descriptions of the concepts it's attending to before it picks words. They used it to catch a model cheating on an evaluation, and to diagnose a language-output bug in Claude Opus 4.6. This is the first publicly-documented case of a frontier lab using interpretability as a production debugging tool rather than a research artefact. Anthropic research; method writeup.

OpenAI makes GPT-Realtime-2 generally available. Three new realtime audio models hit GA: GPT-Realtime-2 plus dedicated transcription and TTS endpoints. The headline change is that GPT-Realtime-2 brings GPT-5-class reasoning into the speech path — the model can think while it talks rather than pattern-matching while it talks. Every voice-agent, language-tutor, and accessibility roadmap just got rewritten. OpenAI announcement.

Mistral ships Voxtral — Europe gets a full audio stack. Mistral released Voxtral and Voxtral Transcribe, an end-to-end speech-to-speech pair designed to be self-hostable. The strategic value is jurisdictional: European builders now have a voice option that isn't OpenAI or a Chinese lab, at the exact moment OpenAI is pushing hardest on the same surface. The composability story matters too — pairing an open-weight generator with an open-weight transcriber kills the lock-in argument for incumbent voice APIs. MarkTechPost.

Claude becomes a first-class citizen in Microsoft 365. Anthropic and Microsoft confirmed Claude is now available inside Microsoft 365's Copilot surfaces alongside OpenAI's models. The "OpenAI-only" framing that defined Copilot's first two years is over. For enterprise buyers, the assistant slot in productivity software is now contestable — and for Anthropic, this is the largest distribution win in the company's history. TheNeuron Daily roundup.

OpenAI and Anthropic both launch $10B+ PE vehicles — on the same day. Both labs announced private-equity-style investment vehicles in excess of $10B, on the same day, aimed at backing companies in their respective ecosystems. The labs are no longer just model providers; they're capital allocators with a second lever — alongside model access and pricing — to lock in the application layer. Expect the word "neutrality" to do a lot of work in pitch decks over the next quarter. Creators' AI weekly digest.

What to watch next week: whether anyone ships an open-weight equivalent of NLAs. Interpretability has historically lagged the proprietary frontier by 6–12 months; if that gap holds, expect the first credible open implementation before EOY — and with it, the start of "interpretability as a feature" in model marketing.

This post is also published on our Substack newsletter at edge-ai.forum. Subscribe for the weekly roundup direct to your inbox — fresh AI news, executive context, and devices + robotics every Friday morning.