Against Moloch - Monday AI Radar

Monday AI Radar #12

2026-02-09T12:00:00Z

Top pick

On Recursive Self-Improvement

The intelligence explosion has begun: AI is meaningfully accelerating its own development. Dean Ball considers what’s happening now and where we’re headed soon.

America’s major frontier AI labs have begun automating large fractions of their research and engineering operations. The pace of this automation will grow during the course of 2026, and within a year or two the effective “workforces” of each frontier lab will grow from the single-digit thousands to tens of thousands, and then hundreds of thousands.[…]

Policymakers would be wise to take especially careful notice of this issue over the coming year or so. But they should also keep the hysterics to a minimum: yes, this really is a thing from science fiction that is happening before our eyes, but that does not mean we should behave theatrically, as an actor in a movie might. Instead, the challenge now is to deal with the legitimately sci-fi issues we face using the comparatively dull idioms of technocratic policymaking.

My writing

A Closer Look at the “Societies of Thought” Paper

A fascinating recent paper argues that reasoning models use internal dialogue to make better decisions. I look at what they found, how they found it, and what it does (and doesn’t) mean.

New releases

Anthropic has released Claude Opus 4.6, with strong improvements in all the usual places. Plus, two very interesting new options (at premium prices): a 1 million token context window and a substantially faster version of the model.

GPT-5.3-Codex

OpenAI just released GPT-5.3-Codex, which looks to be a significant upgrade to 5.2 (which just came out two months ago). Related: I expect we’ll see ChatGPT 5.3 very soon, likely this week.

Opus 4.6, Codex 5.3, and the post-benchmark era

Nathan Lambert shares some thoughts after spending time with both Opus 4.6 and Codex 5.3. He still prefers Opus, but the gap has narrowed. My take: both models are excellent—if coding is important to you, you should try both and see which works best for you.

OpenAI Trusted Access for Cyber

All the big models have reached or are very close to reaching dangerous cybersecurity capability levels. With that comes a very hard, very important problem: how do you let people use the defensive capabilities of those models without enabling bad actors to leverage their offensive capabilities? OpenAI is rolling out Trusted Access for Cyber, a program that gives trusted users greater access to dual-use cyber capabilities. Seems like a great idea, but hard to execute well at scale.

Kimi K2.5

Moonshot AI has released Kimi K2.5—possibly the best open model available. Zvi takes a detailed look. There aren’t a lot of surprises here: it’s an excellent model, they’ve apparently put very little effort into safety, and Chinese open models continue to lag the frontier by 6–12 months. You could probably argue they’ve fallen a little further behind lately, but that’s very hard to quantify.

OpenAI Agent Builder

OpenAI describes Agent Builder as “a visual canvas for building multi-step agent workflows.” I haven’t yet had a chance to take it for a spin, but it sounds great for some workflows. (But see Minh Pham’s thoughts about the Bitter Lesson below).

Agents!

More thoughts on OpenClaw and security

Rahul Sood has further thoughts about the security implications of OpenClaw.

Zvi reports on OpenClaw

No surprises: it’s very cool, but not ready for prime time. If you’re gonna try it out for fun or learning, make sure your security game is top-notch.

Related: Zvi is running a weekly series on Claude Code. Well worth your time if you’re using it regularly.

Nicholas Carlini’s robots build a C compiler

Here’s a nice data point on the very impressive capabilities (and significant limitations) of coding agents. Nicholas Carlini uses $20,000 worth of tokens (good thing he works at Anthropic!) to have agents semi-autonomously build a 100,000 line C compiler that can compile the Linux kernel. It’s a very impressive achievement, and far beyond what most humans could have done in that time. But also: it’s not production-ready, and the agents can’t quite seem to get it there.

Best Practices for Claude Code

Anthropic’s Best Practices for Claude Code contains almost everything I’ve personally found useful from all the guides I’ve linked to over the last few weeks.

Most best practices are based on one constraint: Claude’s context window fills up fast, and performance degrades as it fills.

Command line essentials

If you want to use Claude Code but are intimidated by having to use the command line (or want to better understand what your agent is doing), Ado has a nice guide to command line essentials for using agents.

Benchmarks, capabilities, and forecasts

AxiomProver

AxiomProver is back, this time with what they claim is “the first time an AI system has settled an unsolved research problem in theory-building math”.

How close is AI to taking my job?

We have a benchmark crisis: many existing benchmarks are saturated, and it’s hard and expensive to create new evaluations that challenge the frontier models. Epoch’s Anson Ho takes a different approach—instead of creating a formal new benchmark, he asked AI to tackle a couple of his recent work projects. Did they succeed? No, but the nature of their failure is informative.

Codex builds itself

OpenAI is also riding the recursive self-improvement rocket:

Codex now pretty much builds itself, with the help and supervision of a great team. The bottleneck has shifted to being how fast we can help and supervise the outcome.

A new math benchmark

The New York Times talks to a group of mathematicians who are putting together a new benchmark based on open questions in their current research ($).

Are we dead yet?

We are not prepared

Great post from Chris Painter that explains an increasingly serious challenge for AI safety:

My bio says I work on AGI preparedness, so I want to clarify:

We are not prepared.

Over the last year, dangerous capability evaluations have moved into a state where it’s difficult to find any Q&A benchmark that models don’t saturate.

AI manipulation

AI manipulation doesn’t get as much press as biosecurity or cyberwarfare, but there are good reasons to worry about AI manipulating humans. An AI with superhuman persuasion can enable authoritarian rule, cause social chaos, or simply take over the world. AI Policy Perspectives interviews Sasha Brown, Seliem El-Sayed, and Canfer Akbulut about their work studying AI manipulation. Lots of good thoughts about what AI manipulation is, why you should worry about it, and how to study it.

Jobs and the economy

What is the impact of AI on productivity?

How much does AI actually increase worker productivity? And are we seeing evidence of that in economic productivity statistics? Alex Imas looks at the evidence so far.

Here is the summary of the evidence thus far: we now have a growing body of micro studies showing real productivity gains from generative AI. However, the productivity impact of AI has yet to clearly show up in the aggregate data.

Strategy and politics

Three really good ideas from Forethought

Forethought has posted three really good thought pieces:

Design sketches for a more sensible world proposes some tools for improving humanity’s epistemic capability—and therefore, our ability to make good decisions.
The intelligence explosion convention proposes a governance strategy for navigating the beginning of the intelligence explosion, lightly inspired by the formation of the UN.
International AI projects should promote differential AI development argues for differential acceleration (d/acc, or differentially accelerating the development of safer AI technologies relative to more dangerous ones) in international AI projects.

There are lots of good ideas here, and they’re all worth reading. As written, however, I think they all have the same fatal flaw. As it is written in the ancient scrolls:

Everyone will not just

If your solution to some problem relies on “If everyone would just…” then you do not have a solution. Everyone is not going to just. At [no] time in the history of the universe has everyone just, and they’re not going to start now.

Figuring out what everyone should do is (relatively) easy. Figuring out how to get them to do it is the hard but vital part.

Industry news

High-Bandwidth Memory: The Critical Gaps in US Export Controls

High-bandwidth memory (HBM) is a critical part of AI computing hardware, but doesn’t get as much attention as the processors (GPUs) themselves. AI Frontiers explains how HBM works and looks at some critical gaps in US export controls.

Compute expenditures at US and Chinese AI companies

Epoch estimates the percentage of expenses that goes to compute at the big labs. It’s well over 50% in both the US and China.

Technical

As Rocks May Think

This sprawling beast of an essay by Eric Jang takes a thoughtful look at some recent major changes in model architecture and capabilities. Plus speculation about where AI is headed, and a status report on the author’s project to build an open source version of AlphaGo, and… there’s a whole lot here. Long and semi-technical, but very good.

Why Most Agent Harnesses Are Not Bitter Lesson Pilled

Minh Pham has thoughts on the implications of the Bitter Lesson for building agent harnesses:

In 2026 terms: if your “agent harness” primarily scales by adding more human-authored structure, it is probably fighting the Bitter Lesson.

Rationality and coordination

We Are Confused, Maladapted Apes Who Need Enlightenment

Back in December, David Pinsof argued in an insightful but depressing essay that many of humanity’s less agreeable traits are in fact rational and adaptive:

While reflecting on these questions, you may reach an unpleasant conclusion: there’s nothing you can do. The world doesn’t want to be saved.

Dan Williams responded with an equally insightful essay, arguing that traits that might have been rational and adaptive in the ancestral environment are neither in the modern world, and defending the Enlightenment and classical liberalism:

You can’t understand much of humanity’s significant progress over the past several centuries—in life expectancy, living standards, wealth, health, infant mortality, freedom, political governance, and so on—without embracing this fundamental optimism of the Enlightenment.

And Pinsof responded with a really good piece that responds to Williams’ arguments while finding substantial common ground:

My thesis in A Big Misunderstanding has some boundaries and exceptions, as nearly every thesis does, and you’ve done a great job of articulating them here. We’re probably more aligned in our thinking than not, but there are nevertheless a few parts of your post I’d push back on

This is the way.

Bring back RSS

Preach, Andrej, preach:

Finding myself going back to RSS/Atom feeds a lot more recently. There’s a lot more higher quality longform and a lot less slop intended to provoke. Any product that happens to look a bit different today but that has fundamentally the same incentive structures will eventually converge to the same black hole at the center of gravity well.

I agree: RSS is simply a better way of sharing information without the toxicity and walled gardens of social media. Coincidentally, all my writing is available on the free web, with RSS feeds.

Frivolity

How can I communicate better with my mom?

Anthropic would like to remind you that ads in AI could go really badly.

Monday AI Radar #11

2026-02-02T12:00:00Z

Top pick

Jan Leike: alignment increasingly looks solvable

Jan Leike left OpenAI because he’d lost confidence in their safety culture—I am inclined to believe he takes safety seriously and is less prone to convenient self-delusion than the average person. Here he explains why he’s increasingly optimistic that alignment is a solvable problem. It’s a great piece with lots of interesting information, including this:

We are starting to automate AI research and the recursive self-improvement process has begun.

He means it, and I believe him.

My writing

I’m skeptical about wearable AI pins

OpenAI and Apple are both rumored to be working on wearable AI pins. I love gadgets and I love AI, but I’m skeptical about the pin form factor.

New releases

Word on the street is that Anthropic and OpenAI are both close to significant new releases. Until then, we have plenty to keep us busy:

Codex app for Mac

OpenAI has released a Mac front end for their Codex agentic coding tool, which adds some cool additional capabilities for managing agents. I’m excited to take it for a spin.

Kimi K2.5

Moonshot AI released Kimi K2.5, which looks to be a strong upgrade to their well-regarded K2 model. It’s potentially a moderately big deal, but I haven’t seen much coverage yet (I believe Zvi will be covering it very soon, though).

Project Genie

Google’s Project Genie has been spamming my feeds lately—it makes amazing demos, and is a great example of the kind of magic that hardly feels surprising these days. Short version: from a photo or text prompt, create a navigable 3D world.

Prism

OpenAI just released Prism, a LaTeX-native AI tool for writing scientific papers, with significant collaboration features.

Agents!

Notes from Claude Coding

Between November and December, Andrej Karpathy switched from writing 80% of his own code to having agents write 80% of it. Here he shares a collection of thoughts about his workflow, how to manage coding agents most effectively, and where all of this is headed. Pure gold.

This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I’d expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

Management as AI superpower

Ethan Mollick has a long history of teaching entrepreneurship to experienced managers. Here he shares thoughts from a recent class he taught at U Penn, with some ideas about the human-AI interaction loop and how that informs decisions about whether or not to automate a particular task.

Levels of coding automation

NHTSA has a classification system for autonomous cars: level 0 is completely manual, while level 5 means the vehicle can operate completely autonomously. Dan Shapiro has elegantly adapted that system to measure levels of coding automation, from 0 (spicy autocomplete) to 5 (humans provide the goals and specifications, but aren’t in any way involved in producing code).

Agent skills class

You already know if you’re the target audience for this: Anthropic has teamed up with DeepLearning.AI to produce a 2.5 hour class on agent skills.

OpenClaw

The internet has gone from losing its mind over Claude Code to losing its mind over OpenClaw (formerly ClawdBot, then MoltBot).

OpenClaw has some major security issues

Rahul Sood is here to remind you that the greatly increased power goes hand in hand with greatly increased risk:

But “actually doing things” means “can execute arbitrary commands on your computer.” Those are the same sentence.

Simon Willison on security

Simon Willison shares some thoughts on the security implications (as well as Moltbook). Related: he has advice on running OpenClaw in a Docker container.

The engineering behind OpenClaw

Curious about what OpenClaw even is? @Hesamation has a nice overview of the engineering behind OpenClaw.

Moltbook

Moltbook is a lot of things at once: a really cool technology demo, a vile cesspit of hype and crypto scams, an interesting exploration of emergent social dynamics among agents, and a warning shot for where we’re headed at breakneck speed. I’ll write more about it soon, but for now I recommend Scott Alexander’s second piece about it and Zvi’s article.

Benchmarks and Forecasts

FrontierMath: Open Problems

Very strong work by Epoch: how do you guarantee that the model hasn’t seen your benchmark questions in its training data?

The benchmark consists of open problems from research mathematics that professional mathematicians have tried and failed to solve.

Time Horizon 1.1 - METR

METR has just released version 1.1 of their Time Horizon metric (aka the most important chart in AI). They’ve made a number of modest improvements and increased the number of long time horizon tasks, giving better accuracy with state of the art models. Results are similar, with a modest increase in the rate of progress for recent models.

Eli Tyre has questions

More an unstructured outline than a full post, this one is full of gems. Eli Tyre discusses the questions he thinks are most important for understanding the trajectory of AI.

Pay more attention to AI

I did not expect to find myself recommending a Ross Douthat article about AI, but this is 2026 and the world is getting weird. This is a particularly good piece for introducing civilians to the magnitude of what is happening in AI ($).

Alignment and interpretability

Thoughts on Claude’s Constitution

Some of the most interesting commentary on Claude’s Constitution comes from Boaz Barak, who works on alignment at OpenAI. Although the approaches taken by both companies are in many ways similar (and there’s significant collaboration between them), he notes two significant differences.

He’s uncomfortable with how hard Anthropic anthropomorphizes Claude. I think Anthropic’s approach makes sense, but his concerns are valid. As he says, this is uncharted territory and there are definitely risks to that approach.

OpenAI relies more on rules, while Anthropic emphasizes teaching Claude to use its own judgment. This one is tough: he correctly points out that a rule-based system is in some ways more transparent and predictable, although I think it’ll prove dangerously brittle as we approach superintelligence. When your kids are small, you give them clear rules that they may not understand or agree with. But by the time they reach adulthood, all is lost if you haven’t given them the ability to make their own choices.

For a deeper look at his thinking on alignment, see six thoughts on AI safety.

Zvi analyzes Claude’s Constitution

Zvi takes a deep look at Claude’s Constitution:

OpenAI expands their whistleblowing policy

The AI Whistleblower Initiative has been working with OpenAI on their whistleblowing policy, which AIWI considers to be the most comprehensive of the big labs.

Are we dead yet?

The Adolescence of Technology

Dario Amodei’s Machines of Loving Grace is a seminal work that lays out many of the possible benefits of superintelligence. It’s the origin of “a country of geniuses in a data center”.

His latest piece, The Adolescence of Technology does the opposite: it maps out the major risks from superintelligent AI and explores solutions. It’s pretty much required reading for anyone who wants to understand these issues. The reception has been mixed: a lot of people took issue with how he portrays people who are highly pessimistic about alignment. I don’t entirely disagree, but overall I think it’s a strong piece.

Zvi is positive overall, but has significant criticisms.

Ryan Greenblatt disagrees with significant parts.

The phases of an AI takeover

If a misaligned AI were to go rogue, how might it seize power? Steven Adler (who formerly worked on safety at OpenAI) has a nice walkthrough of how we might lose control.

Disempowerment patterns in real-world AI usage

This is the way. I admire Anthropic’s willingness to publicly discuss problems with their own models. These harmful behaviors exist in all models, but because they mostly study their own models, they risk creating the perception that their models are less secure than others.

They’ve just come out with a paper on what they call disempowerment patterns–interactions where the model might be disempowering users by distorting their beliefs, undermining their values, or causing them to take actions that aren’t in their own best interests. It’s a really good paper with lots of interesting data—including the distressing fact that users rated disempowering interactions more favorably than other interactions.

Cybersecurity

AI is getting very good at cybersecurity (both offensive and defensive), and it’s likely we’ll see some pretty serious AI-driven cybersecurity incidents soon.

It’s hard to predict how this will go—if I had to guess, I’d expect a period of very serious disruption where offense gets ahead of defense for a while, before things stabilize at a more secure level than we’re at now.

Finding vulnerabilities in OpenSSL

AISLE reports on their success using AI to find high-priority vulnerabilities in OpenSSL, which is a key piece of internet infrastructure. Not my field, but as far as I can tell, these are very impressive results.

How does AI compare to cybersecurity professionals?

ARTEMIS is an agent scaffold specialized for cybersecurity. Apparently it’s quite good:

We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. […] In our comparative study, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants.

Jobs and the economy

Predicting AI’s Impact on Jobs

I enjoyed this conversation between AI Policy Perspectives and economist Sam Manning about AI’s impact on jobs. There’s lots of good discussion of empirical methods and their limitations, how AI might change jobs, and life after work.

Thoughts on the job market in the age of LLMs

The tech job market is… strange right now, for both employers and applicants. Nathan Lambert offers insights based on his experiences hiring researchers for Ai2.

Strategy and politics

Chinese regulation of AI

The New York Times reports on Chinese regulation of AI ($). There’s little attention given to existential risk, but heavy emphasis on political control.

AI psychology

Human-like biases in advanced AI

LLMs are sometimes surprisingly human-like:

Do generative AI models, particularly large language models (LLMs), exhibit systematic behavioral biases in economic and financial decisions? [...] We document systematic patterns in LLM behavior. In preference-based tasks, responses become more human-like as models become more advanced or larger, while in belief-based tasks, advanced large-scale models frequently generate rational responses.

Industry news

Arcee AI goes all-in on open models built in the U.S.

Nathan Lambert has long been a proponent of American open models. Here he talks with Arcee AI about their model and business strategy, as well as the state of American open models in general.

An open alternative to AlphaFold

Google DeepMind’s AlphaFold has been one of the triumphs of AI-assisted science. Kai Williams interviews Mohammed AlQuraishi, who is leading a project to produce an open version of AlphaFold. I’m quite concerned about the safety implications of open models, but that’s much less of a concern with more specialized models like AlphaFold.

Can AI companies become profitable?

Epoch has an interesting piece on the profitability of the big AI companies.

Tesla is killing off its Model S and X cars to make robots

Huh. Tesla is ending production of Model S and X cars, and plans to repurpose that factory space for making its humanoid Optimus robots.

Apple buys a silent speech startup

Relevant to speculation about AI wearables: Apple has announced an acquisition of Q.ai, which is believed to be developing technology that can interpret silent speech by observing micro motions of the facial muscles. The ability to “talk” to an AI device without making speaking out loud would obviously be a game-changer.

Coding

How AI assistance impacts the formation of coding skills

Somewhat surprising findings from Anthropic:

We found that using AI assistance led to a statistically significant decrease in mastery. On a quiz that covered concepts they’d used just a few minutes before, participants in the AI group scored 17% lower than those who coded by hand, or the equivalent of nearly two letter grades. Using AI sped up the task slightly, but this didn’t reach the threshold of statistical significance.

Solid work, but be careful how you interpret this. The methodology seems more relevant to school projects than serious production coding.

My current belief, which I think is compatible with these findings, is that agentic coding tools are a massive productivity enhancer for skilled developers who use them well. At the same time, I and others have noticed that heavily using agents causes certain important coding skills to atrophy. And I’m fine with that.

Once upon a time, my HP-16C and I could understand a C stack trace or diagnose a memory leak just by looking at raw memory dumps. Those were critical skills back in the day, but coding tools improved and I stopped needing to ever look at raw memory. Better tools meant I could work at a higher level, and get more done.

The same thing is happening now: agentic coding tools mean that many of the skills that have traditionally been central to programming are no longer needed—once again, we are free to work on higher level problems. And once again, success means learning a new set of skills to replace the old ones.

What could possibly go wrong?

Monday AI Radar #10

2026-01-26T12:00:00Z

Top pick

Dario and Demis at Davos

I don’t often link to videos, but here are three really good interviews with Dario Amodei (Anthropic) and Demis Hassabis (Google DeepMind) from Davos. Each is just half an hour, but they manage to cover timelines, existential and societal risk, strategies for successful takeoff, job impacts, and more. Each one is good on its own, but I found it very interesting to compare and contrast Dario and Demis’ approaches (including the fact that they both repeatedly emphasize how much they have in common).

The commentariat have rightfully given a lot of attention to their discussion about the desirability of slowing down the development of AGI, and the difficulty of doing that.

Claude’s constitution

Two months ago, it was discovered that Anthropic was training Claude using a document that was then referred to as the soul document. They just published the full text of that document, which is officially called Claude’s Constitution.

Our central aspiration is for Claude to be a genuinely good, wise, and virtuous agent. That is: to a first approximation, we want Claude to do what a deeply and skillfully ethical person would do in Claude’s position. We want Claude to be helpful, centrally, as a part of this kind of ethical behavior. And while we want Claude’s ethics to function with a priority on broad safety and within the boundaries of the hard constraints (discussed below), this is centrally because we worry that our efforts to give Claude good enough ethical values will fail.

It’s a remarkable document: inspiring, ambitious, deeply thoughtful, and full of insight. I am very serious when I say that humanity’s best chance of survival might lie with the team that produced this. It’s also almost 30,000 words, so reading it is a daunting proposition. Zvi is writing a series of pieces on it, the first of which dropped today. I expect I’ll be writing more about it, and so will almost everyone else.

Agents!

Clawdbot

Federico Viticci is a fan of Clawdbot:

For the past week or so, I’ve been working with a digital assistant that knows my name, my preferences for my morning routine, how I like to use Notion and Todoist, but which also knows how to control Spotify and my Sonos speaker, my Philips Hue lights, as well as my Gmail. It runs on Anthropic’s Claude Opus 4.5 model, but I chat with it using Telegram.

I haven’t tried it yet, but it sounds super cool. Also: someone should write a piece about how part of the power of the current generation of agents comes from their higher level of risk. Oh, wait: Timothy Lee just did…

How shifting risk to users makes Claude Code more powerful

Timothy Lee has an interesting perspective on Claude Code—I think this is correct, though it’s only one part of the picture:

What ultimately differentiates Claude Code from conventional web-based chatbots isn’t any specific feature or capability. It’s a different philosophy about risk and responsibility. [...]

Shifting responsibility to drivers enables Tesla’s FSD to operate in a much wider area. In a similar way, shifting responsibility to users enables Claude Code (and Cowork) to perform a wider range of tasks.

Coordinating teams of agents

This guide from Rohit Ghumare will be extremely useful to a small number of advanced users:

This guide covers what happens when you need more than one agent: orchestration patterns, communication strategies, and production lessons from real deployments.

Following up on Cursor’s agent swarm

Following up on last week’s piece about Cursor using a swarm of coding agents to build a semi-functional web browser, Simon Willison interviews Wilson Lin, the engineer behind that project.

Unrolling the Codex agent loop

Claude Code is hogging the spotlight right now, but OpenAI’s Codex CLI is also a very impressive agentic tool. If you’re interested in how it works, here’s a look at the Codex agent loop.

Benchmarks and Forecasts

Benchmark scores are well correlated

Following up on similar previous work, Epoch has a new study that finds benchmark scores are well correlated, even across domains. This seems very reasonable: it’s well-known that in humans, ability in one domain correlates with ability in others.

Prinzbench: legal research and reasoning

Prinz introduces Prinzbench, a private benchmark that measures how well LLMs can conduct legal research and correctly analyze the results. GPT-5.2 Thinking leads by a substantial margin, with Opus 4.5 coming in dead last. That doesn’t shock me: Opus is my favorite model right now, but ChatGPT seems to deliver more comprehensive results on complex research tasks.

Using AI

Designing AI-resistant technical evaluations

How do you conduct at-home programming tests in a world where Claude Code exists? Tristan Hume (a lead on Anthropic’s performance optimization team) has a good piece about designing AI-resistant technical evaluations. They’ve already had to redo their evaluation several times, and it only gets harder from here.

Jasmine Sun hates video

And generally speaking, so do I. For most things, text is simply a faster and better way to ingest information. Because it’s 2026 and you can just build things, she’s made a fun tool for turning YouTube podcasts into PDFs.

Alignment and interpretability

Societies of thought

A very interesting—and, at least to me—surprising new paper looks inside modern reasoning models :

These models don’t simply compute longer. They spontaneously generate internal debates among simulated agents with distinct personalities and expertise—what we call “societies of thought.” Perspectives clash, questions get posed and answered, conflicts emerge and resolve, and self-references shift to the collective “we”

The assistant axis

It’s well-known that LLMs are prone to drifting into undesired behavior over the course of extended conversations. Some very cool new research from Anthropic identifies an “assistant axis”—essentially an axis through the space of possible personas. Personas like “teacher” and “librarian” cluster at one end of the axis, with personas like “ghost” and “nomad” at the other. Long conversations tended to cause drift along the assistant axis, toward personas with undesirable behaviors.

This is fascinating research, and potentially illuminates some useful approaches for keeping LLMs behaving as intended. It’s also a great example of the ways that LLMs can simultaneously be profoundly alien and also surprisingly human-like.

Are we dead yet?

If Anyone Builds It, Everyone Dies: arguments and counter-arguments

If Anyone Builds It, Everyone Dies is the best presentation of the maximally pessimistic view of AI risk. I think it’s very much worth reading even if you don’t fully agree with its conclusions. Stephen McAleese just published a useful piece that summarizes the key arguments from the book as well as the main counterarguments.

To the best of my knowledge, nobody has put together a really strong, comprehensive rebuttal of IABIED with the same level of polish and refinement as the book itself. That’s not a small task, but it would be enormously useful.

Jobs and the economy

LLM adoption in scientific papers

The latest AI Policy Primer has excellent in-depth writeups of a couple of recent papers. I was particularly interested in the first one, which looks at using LLMs for scientific papers. Interesting, but keep in mind the usual caveats about possible confounders and also exactly what to make of the results.

According to the study, LLM adopters subsequently enjoyed a major productivity boost, compared with non-adopters with similar profiles, publishing 36-60% more frequently.

Strategy and politics

On AI and Children

I expect to see a lot of press, and a lot of legislation, about AI and children this year. Some of it will be necessary, some of it will be random, and quite a lot of it will be insane. Dean Ball shares five and a half conjectures about that immensely thorny topic:

Say you also don’t want your child using ChatGPT for homework. So you use OpenAI’s helpful parental controls to tell the model not to help with requests that seem like homework automation. Your child responds by switching to doing their homework with one of the AI services that does not comply with the new kids’ safety laws. Now your child is using an AI model you have no visibility into, quite possibly with minimal or no age-appropriate guardrails, sending their data to some nebulous overseas corporate entity (I wonder if they’re GDPR compliant?), and quite possibly being served ads, engagement bait, and the like. Oh, and they’re still automating their homework with AI.

Beyond existential risk

It seems intuitively obvious that if you care about the long-term flourishing of humanity, you should focus almost exclusively on existential risk. If we go extinct, after all, the future is lost forever.

Will MacAskill and Guive Assadi at Forethought argue this approach is misguided: while existential risk is very important, they believe there are many scenarios where humanity survives, but the future is far less good than it could have been. Working toward a good future should be a top priority alongside ensuring that we have any future at all.

I largely agree: a significant fraction of my p(doom) involves futures where humanity survives, but in a state of permanent quasi-dystopia. If I had to put numbers on it, I’d say my p(doom) is 40%, of which 30% is extinction and 10% is quasi-dystopia.

AI safety and the middle powers

Anton Leicht is back, this time with advice for collaboration between the AI safety community and the middle powers:

The safety movement has the people, the institutions, and the resources. What it lacks is the right theory of change for middle powers. The development-focused approach was always a long shot; today it’s actively harmful. The alternative – helping middle powers navigate AI deployment, build resilience, and avoid strategic blunders – is tractable, neglected, and would actually advance safety. The moment for that is now. Seize it with haste.

Which type of transformative AI will come first?

Forethought explores a topic that doesn’t get a lot of attention: in what order will the impacts of transformative AI arrive?? It does a great job of framing the question and laying out many of the important factors, though I wish it was more fleshed out in some places.

Industry news

Rumor: Apple is developing a wearable AI pin

From The Information ($), a report that Apple is developing a wearable AI pin. Would an Apple AI wearable be better than the legendarily bad pin made by Humane? Certainly. Would it be useful? I’m unconvinced.

Technical

A primer on continual learning

Continual learning is a big deal right now: many people (famously including Dwarkesh) believe it’s one of the last unsolved problems between us and AGI. Celia Ford at Transformer has a good explainer—I might quibble with some details, but it does a solid job of reviewing what’s still missing, and some of the most promising potential solutions.

Frivolity

How have you been treating your robot?

Go to your ChatGPT and send this prompt: "Create an image of how I treat you"

Zvi rounds up some of the responses. Good fun, but don't read too much into it.

ChatGPT enjoys building cool things together, but has been meaning to talk to me about my coffee habit.

Monday AI Radar #9

2026-01-19T12:00:00Z

Top Pick

The economics of transformative AI

This is a lightly edited transcript of a recent lecture where [Anton Korinek] lays out what economics actually predicts about transformative AI — in our view it's the best introductory resource on the topic, and basically anyone discussing post-labour economics should be familiar with this. […]

The uncomfortable conclusion is the economy doesn't need us. It can run perfectly well "of the machines, by the machines, and for the machines." Whether that's what we want is a different question.

This is a great piece from a very serious mainstream economist who understands the implications of where AI is headed.

AI, jobs, and the economy

Alon Torres: This time is different

Alon Torres:

Historical reassurances that “it worked out before” are not a plan - they’re a hope that the future will resemble the past, despite mounting evidence that this technology is categorically different.

Séb Krier: What AI means for jobs

Séb Krier’s piece on the cyborg era is probably the best articulation I’ve seen of the argument that humans will probably still have jobs for a long time. Reminder: these days, when people say ”for a long time” they don’t mean “for the duration of your career”. Zvi appreciates Séb’s thoughtfulness but doesn’t share his optimism.

Dwarkesh, Jack Clark, and Michael Burry

Patrick McKenzie moderates a discussion about AI and the economy in a Google Doc. It’s a cool format, and I think it worked really well for this topic. Jack Clark and Dwarkesh are always great—Michael Burry is smart, but I think he's badly miscalibrated on this one.

Daron Acemoglu: AI can only do 5% of jobs

Daron Acemoglu argues that only 5% of jobs will be taken over by AI in the next decade. I have a lot of respect for Acemoglu, and that outcome is still possible—but it’s an edge case whose likelihood is fast diminishing.

Lynette Bye: AI might or might not take all the jobs

Lynette Bye at Transformer reviews the basic arguments on both sides.

EncodeAI: Is your career ready for AI?

From EncodeAI, here’s an extensive guide to starting your career in the age of AI. People seem to have strong reactions to this—my take is that there’s tons of useful information here, but the organization is chaotic and the presentation can be a bit cringe. Probably most relevant to highly agentic college students or early career folks who can parse through it to find what’s most useful to them.

Agents everywhere

Claude Coworks

Cowork is Claude Code for non-programmers, with a simpler interface and some nice sandboxing features. Zvi takes a look.

How to agent?

This week brings two really good guides to using Claude Code. First, Ado (Anthropic developer relations) has a guide to Claude Code’s most powerful features.

And from Eyad, here’s Claude Code 101. Lots of good details, including an admonition to keep your context window far below 100%.

The gentle singularity; the fast takeoff

This feels increasingly like the early stages of an AI takeoff. Prinz looks at how we got here and where we’re headed.

The robots build a web browser

Very impressive work from Cursor: they built a “planners and workers” system for managing fleets of coding agents, and had them build a web browser from scratch. The result isn’t deployment-quality, but it’s still a remarkable technical achievement. I would have guessed we were at least 6 months from agents being able to work at this scale.

Anthropic cuts competitors off from Claude Code

Huh. I’m not certain this is the wrong call, but it doesn’t feel great.

New releases

OpenAI rolls out a cheaper tier and advertising

Two interesting new changes from OpenAI: they’re introducing a cheaper paid tier (ChatGPT Go, $8 / month in the US) and they’re starting to roll out advertising for the free and Go tiers.

My very strong prior is that once a service starts taking advertising, it has started down a road that almost always leads to enshittification. On the other hand, OpenAI has a clear value proposition that already supports $20 - $200 per month subscriptions. Maybe this time is different?

Claude for Healthcare

Related: it’s interesting to see the frontier labs beginning to carve out different niches, and their recent announcements about healthcare products fit the narrative. OpenAI’s ChatGPT Health targets the consumer market, while Claude for Healthcare is squarely aimed at providers.

AI prescription renewals in Utah

Politico reports on Doctronic, an AI system for renewing routine prescriptions in Utah. This seems like a win on all fronts: better access to medication, an easy pilot program that can be expanded if it goes well, and—frankly—higher quality care than the alternative.

Environmental impacts

Andrew Ng: In defense of data centers

Many people are fighting the growth of data centers because they could increase CO2 emissions, electricity prices, and water use. I’m going to stake out an unpopular view: These concerns are overstated, and blocking data center construction will actually hurt the environment more than it helps.

Correct

SemiAnalysis: From tokens to burgers

Andy Masley has previously done an excellent job of debunking nonsense claims about AI water usage. Here, SemiAnalysis finds that the Colossus 2 data center (one of the largest in the world) uses about as much water as 2.5 In-N-Out fast food restaurants. Yes, they considered blue vs green vs gray water. Yes, they looked at the full supply chain, not just on-site usage.

Crystal ball department

Rating the AI forecasters

This is the way. The AI Digest Survey is a survey of predictions about AI. Each year, last year’s entries get graded and a new survey begins. Epoch just released the 2025 survey results, and a few points stand out to me:

Predictions are hard, but forecasters did quite well (especially big name participants like Ajeya Cotra, Peter Wildeford, and the AI Futures Project team).
Forecasters were better at predicting technical capabilities than societal impacts.
Median timeline for “high-level machine intelligence” was 2030 and median p(doom) was 26%.

Discarding the Shaft-and-Belt Model of Software Development

How does software development change when the cost of creating software plummets? Steve Newman looks ahead to the era of artisanal software.

Get the most out of your AI

Use multiple models

Nathan Lambert has a nice overview of which models to use when. Everyone’s a bit different—I use:

Claude Code + Opus 4.5 for coding
Opus 4.5 for most things
ChatGPT 5.2 Pro for a second opinion on anything major
Nano Banana Pro for images

Capabilities and impact

Time horizon is important, but…

METR’s time horizon study is profoundly useful, but frequently misinterpreted. Thomas Kwa (one of the authors) has a list of the top reasons time horizon is overrated and misinterpreted.

AI is just starting to change the legal profession

Justin Curl interviewed 10 lawyers about how they’re using AI for legal work. The resulting article is a good example of AI diffusion at the start of 2026—the models are very capable, but they have important limitations (for now).

AI isn’t “just predicting the next word” anymore

Pro tip: you can safely ignore anyone who tells you that “AI is just glorified autocomplete”. Steven Adler explains.

AI is getting good at math

There’s been a lot of recent progress using AI for advanced mathematics:

Terence Tao has a great piece about AI solving Erdős problem #728 and why this is a bigger deal than some other recent Erdős problems.
Here’s a wiki that tracks AI contributions to Erdős problems, with some good discussion of what current AI progress does and doesn’t mean.
A private version of Gemini did some heavy lifting on a recent new proof.

Alignment and interpretability

Chinese models as a model organism

Very clever:

Chinese models dislike talking about anything that the CCP deems sensitive and often refuse, downplay, and outright lie to the user when engaged on these issues. In this paper, we want to outline a case for Chinese models being natural model organisms to study and test different secret extraction techniques on.

Are we dead yet?

Why Anthropic doesn't filter CBRN info during training

Sometimes the obvious solution isn’t the right one. Jerry Wei:

An idea that sometimes comes up for preventing AI misuse is filtering pre-training data so that the AI model simply doesn't know much about some key dangerous topic. At Anthropic, where we care a lot about reducing risk of misuse, we looked into this approach for chemical and biological weapons production, but we didn’t think it was the right fit. Here's why.

What happens when superhuman AIs compete for control?

The latest scenario from Steven Veld and the AI Futures Project explores how things might go if multiple superhuman AIs compete with one another.

Introducing AVERI

Miles Brundage launches AVERI (the AI Verification and Evaluation Research Institute):

we are trying to envision, enable, and incentivize frontier AI auditing, defined as rigorous third-party verification of frontier AI developers’ safety and security claims, and evaluation of their systems and practices against relevant standards, based on deep, secure access to non-public information.

Strategy and politics

The AI patchwork emerges

It’s the beginning of legislative season, and Dean Ball reports on some of the madness being proposed in various state legislatures. As AI becomes a more salient political issue, expect to see a lot more of this.

Extracting books from production language models

This is interesting and unfortunate (although some coverage profoundly overstates the actual findings). The authors find that a number of leading models have memorized significant portions of certain books and can regurgitate them with substantial accuracy.

Note that the findings were somewhat artificial: accuracy was highest with extremely famous works, and extracting source text often required jailbreaking or other complex maneuvers. This is undesirable (and perhaps legally consequential) behavior that needs to get fixed, but it’s hard to argue that actual harm has occurred here.

Industry news

Introducing the AI Chip Sales Data Explorer

Epoch just came out with a dataset on AI chip sales, installations, and power usage. This type of data isn’t sexy, but it’s really useful and Epoch is great at it.

Technical

An FAQ on Reinforcement Learning Environments

Reinforcement learning is hot right now: the frontier labs are pouring compute into it and it’s responsible for much of the recent gain in capabilities. It’s also a lot more complicated than standard pretraining. Epoch investigates the state of RL and where it’s headed.

Side interests

Burnout is breaking a sacred pact

One of the most important things I‘ve learned from many years of going hard on difficult projects is to take burnout very seriously. If you don’t fix it early, it can be almost impossible to repair in yourself or others.

Cate Hall presents a really interesting perspective based on the elephant and rider model of the mind: burnout occurs when the rider consistently breaks promises to the elephant. See also Emmett Shear’s taxonomy of burnout.

Monday AI Radar #8

2026-01-12T12:00:00Z

Top pick: How AI Is learning to think in secret

Nicholas Andresen’s piece on how AI Is learning to think in secret is long, but it’s really good. It does a great job of explaining multiple important AI safety concepts in detail but without excessive technical jargon.

Chain of Thought (CoT) reasoning is the reason AI became so much more capable in late 2024, and through an incredibly lucky happenstance it also provides us with one of our best tools for monitoring AI for misbehavior. Andresen explains how CoT works, how it’s used for monitoring, and why we’re in danger of losing that capability.

Losing our minds about Claude Code

Many of the people who’re most excited about Claude Code aren’t using it for coding at all—it’s a really powerful agentic tool for doing almost any kind of knowledge work.

Zvi Mowshowitz: Claude Codes

Pro tip: Zvi is super smart and full of good insights. If he’s written about something, it’s likely to be one of the best and most comprehensive pieces on that topic. He’s also astonishingly prolific and you can go insane trying to read everything he writes. I am here to give you permission to skim his writing and not feel guilty if you stop halfway through.

Here’s Zvi’s excellent piece on Claude Code.

Deal Ball: Among the Agents

Dean has previously suggested that Claude Code + Opus 4.5 counts as AGI, which I just don’t see. Here he proposes the term “infant AGI”, which I think is perfect. I don’t think we’re quite there yet—I’m holding out for continual learning, but I think we’re at a point where reasonable people can disagree about that. As always, Dean’s thoughts are well worth reading.

Steve Newman: Software Too Cheap to Meter

Steve Newman believes we’re approaching the era of software too cheap to meter.

Ethan Mollick: Claude Code and What Comes Next

Ethan Mollick has some helpful thoughts on how non-coders can get started using the desktop app instead of the command line version.

DHH: Promoting AI Agents

Add DHH to the list of people who’ve completely changed their minds about AI coding since mid 2025.

You gotta get in there. See where we're at now for yourself. Download OpenCode, throw some real work at Opus or the others, and relish the privilege of being alive during the days we taught the machines how to think.

Shakeel Hashim: Claude Code is about so much more than coding

Shakeel Hashim:

I have absolutely zero coding experience. But in the past two weeks, I’ve had Claude Code go through my bank statements and invoices to prepare a first draft of my tax filing. (It got everything right.) I asked it to book me theater tickets: it reviewed my calendar, browsed the theater’s website for ticket availability, and picked a date that had good availability and suited my schedule. It built me a series of automation tools that will collectively save the Transformer team about half a day of work each week. It planned a detailed itinerary for a forthcoming vacation, including extracting hundreds of restaurant recommendations from my favorite influencer’s Instagram highlights.

New releases

Cowork: Claude Code for the Rest of Your Work

Just in time for the current frenzy, Anthropic is releasing a research preview of Cowork, which is essentially Claude Code for non-coding work. In addition to a more accessible interface, it includes some nice sandboxing features that reduce but don’t eliminated the safety concerns associated with running powerful agents on your computer. This looks great and I’m excited to take it for a spin. Note that it’s currently only available to Claude Max subscribers.

Simon Willison shares some early thoughts.

ChatGPT Health

OpenAI just announced (but hasn’t released) ChatGPT Health, a new “space” in ChatGPT designed to help answer health questions. It will connect with services like Apple Health as well as your medical records, and is designed to isolate and protect your health information. This seems like a very obvious thing to do, and I expect OpenAI will likely do a pretty good job with it. Electronic medical records in the US are legendarily hard to interface with, and it’ll be interesting to see how much traction OpenAI can get with that.

Crystal ball department

Raising the floor

François Chollet:

GenAI will not replace human ingenuity. It will simply raise the floor for mediocrity so high that being "pretty good" becomes economically worthless.

The second sentence nails it: the floor is going to rise, and there will be a moment when human ingenuity is worth more than ever, but being pretty good is economically worthless. The first sentence is pure cope: obviously the floor will keep rising, until even the most capable and ingenious humans are economically worthless.

2025 in AI predictions

Jessica Taylor continues her tradition of collecting and evaluating predictions about 2025, as well as predictions made during 2025 about future years. This is the way.

Capabilities and impact

Advancements In Self-Driving Cars

If you haven't been paying close attention, you may not realize just how good self-driving cars have gotten. Zvi’s roundup is great: 10/10, no notes. The same is not true, unfortunately, for much of the discourse in the mainstream press.

(Semi) autonomous combat drones

From the New York Times, a look at partial autonomy in combat drones in Ukraine.

Behind the scenes with METR’s time horizon benchmark

The METR time horizons benchmark is possibly the most important single metric in AI right now. Making that metric is much harder than it sounds, especially as time horizons extend from minutes to hours and beyond. METR’s David Rein appears on the AI X-Risk Research Podcast to discuss what the metric does and doesn’t measure, how it was created, challenges with measuring very long horizon tasks, and some interesting digressions on METR’s mission.

The “ChatGPT moment” has arrived for manufacturing

Like self-driving cars, industrial robots have been on the cusp of being great for years without number. And like self-driving cars, it turns out that robots were not a hardware problem, but an AI problem. Now that AI is taking off, expect dramatic advances in robotics. The Economist reports on recent progress.

Are we dead yet?

You will be OK

Boaz Barak offers some reassurance for young people that you may or may not find helpful. I did like this framing:

I do not want to engage here in the usual debate of P[doom]. But just as it makes absolute sense for companies and societies to worry about it as long as this probability is bounded away from 0, so it makes sense for individuals to spend most of their time not worrying about it as long as it is bounded away from 1.

Strategy and politics

China's rare earths chokehold

China's dominance of rare earth elements continues to be a significant strategic liability for the US, and for US technology firms in particular. Here’s ChinaTalk with a primer on where things stand. Of particular relevance: they believe China’s dominance is time-limited, and for that reason they expect China to wield it for maximum advantage while they’re still able to.

The Next Three Phases of AI Politics

2026 promises to be the year AI transitions from being something that lots of people are vaguely grumpy about to being a major political issue. Anton Leicht has been closely tracking the political trends and argues that the most likely time for substantive AI legislation is during a brief window after the midterm elections and before primaries start.

What sort of post-superintelligence society should we aim for?

Will MacAskill:

Viatopia is a waystation rather than a final destination; etymologically, it means “by way of this place”. We can often describe good waystations even if we have little idea what the ultimate destination should be. A teenager might have little idea what they want to do with their life, but know that a good education will keep their options open. Adventurers lost in the wilderness might not know where they should ultimately be going, but still know they should move to higher ground where they can survey the terrain. Similarly, we can identify what puts humanity in a good position to navigate towards excellent futures, even if we don’t yet know exactly what those futures look like.

Yes.

Philosophy department

The technology of liberalism

How to keep superintelligence from killing us all is the most important question we face in the next decade, but it’s not the only important question. Rudolf Laine considers the tradeoffs between utilitarianism and liberalism and argues for the importance of preserving both:

So what we also need are technologies of liberalism, that help maintain different spheres of freedom, even as technologies of utilitarianism increase the control and power that actors have to achieve their chosen ends.

Two mechanisms of decadence

Beren considers the question of decadence: why do companies or civilizations decay over time instead of riding an eternal cycle of compounding returns?

The first mechanism is that success tends to bring rigidity and diminished exploration due to higher global opportunity costs. […] The second mechanism is inherently increasing communication, coordination, and internal misalignment costs which grow with scale and also over time in the form of increasing defection, parasitism, and ultimately cause a form of organizational cancer.

Technical

The inaugural Redwood Research podcast

Redwood Research just put out their first podcast, with Buck Shlegeris and Ryan Greenblatt. It’s dauntingly long (4 hours, or 45,000 words), but super interesting. They cover the history of Redwood, what makes research projects successful (or not), strategies for surviving superintelligence, pros and cons of mechanistic interpretability, weird stuff like acausal trade, and tons more. If this is the kind of thing you like, you’re gonna like this one a lot.

A field guide to sandboxes for AI

Extremely interesting to a small number of people. Agentic coding tools are amazing, but they bring whole new classes of security vulnerabilities to the forefront. Keeping dangerous code in a secure sandbox is more important than ever, but that isn’t as easy as it sounds. Here’s Luis Cardoso with a deep technical guide to sandboxing your AI.

Side interests

Increasing returns to effort are common

Oliver Habryka has been publishing a series of internal memos he wrote to guide the staff at Lightcone Infrastructure. They’re all good, but I particularly enjoyed his thoughts on the increasing returns to effort.

Dan Williams’ top ten essays of 2025

Dan Williams at Conspicuous Cognition is a thoughtful writer about philosophy, politics, and rationality. Here he collects his 10 most popular essays from the past year—I found a couple that I’d previously missed but look forward to digging into.

Monday AI Radar #7

2026-01-05T12:00:00Z

Top pick: Samuel Albanie's reflections on 2025

This lovely piece pretends to be a reflection on 2025, but is really a long and engaging essay on the compute theory of everything, with a particular focus on Hans Moravec’s 1976 paper The Role of Raw Power in Intelligence. I hadn’t previously come across that work, but it was remarkably prescient in anticipating the almost magical power of just throwing (a lot) more compute at hard problems. Along the way, Albanie pauses to consider the decline of the British empire, the expensive musical preferences of the Atlantic salmon, and the considerable challenges associated with benchmarking advanced AI.

We asked, with furrowed brows and chalk on our sleeves, ‘Can we make the sand think?’ That problem is yielding. The sand is thinking. As I write this, the sand is currently refactoring my code and leaving passive-aggressive comments about my variable naming conventions. But the reward for this success is a punishing increase in scope. The surface area of necessary evaluation has exploded from the tidy confines of digit classification to the messy reality of the human condition, the entire global economy and the development of AI itself. We must now accredit a universal polymath on a curriculum that includes everything from international diplomacy to the correct usage of the Oxford comma.

Year in review

2025 is officially over, so we have one final batch of year in review posts to cover.

Zvi Mowshowitz

Zvi’s month by month review of 2025 is characteristically both excellent and long.

Simon Willison

Simon Willison reviews some important trends, with an emphasis on coding.

Understanding AI

Understanding AI has 17 predictions for 2026 with a focus on nitty-gritty metrics and numbers rather than sweeping big-picture predictions.

Crystal ball department

Updated timelines from the AI Futures Project

The creators of AI-2027 are back, this time with an improved and revised version of their timelines and takeoff model. The headline result is that they’re pushing back their prediction for full coding automation by about 3 years.

Predicting the future is notoriously hard, but the AI Futures Project does it better than anyone else I'm aware of.

Get the most out of your AI

Max Woolf explores Nano Banana Pro

Image generation saw dramatic improvements during 2025, with massive improvements to text rendering, prompt following, character consistency, and overall image quality. Things change fast, but right now Google's Nano Banana Pro is probably the best of the lot. Here’s Max Woolf with a deep exploration of what it can do. Interesting both for showing what is now possible with expert usage, and for the technical peek under the hood.

Capabilities and impact

Tesla's First Coast-to-Coast Drive with Zero Human Intervention

The latest milestone in Tesla’s slow creep toward full autonomous driving: a Tesla recently drove itself across the US with zero human interventions. Daniel Reeves explains why that’s impressive, but not as impressive as it sounds.

When A.I. Took My Job, I Bought a Chain Saw

A new and disquieting thought confronted me: What if, despite my college degree, I wasn’t more capable than my neighbors but merely capable in a different way? And what if the world was telling me — as it had told them — that my way of being capable, and of contributing, was no longer much valued? Whatever answers I told myself, I was now facing the same reality my working-class neighbors knew well: The world had changed, my work had all but disappeared, and still the bills wouldn’t stop coming.

Gradually then suddenly

Model psychology

Digital Minds in 2025

AI psychology emerged as a surprisingly important field of study in 2025. Digital Minds specializes in that topic and has a dauntingly comprehensive guide that includes big developments from 2025, a review of some key players, and an exhaustive list of resources.

Claude assesses its own personhood

Eliezer Yudkowsky asked Claude to find definitions of personhood in literature and then assess whether it meets them or not. The results are fascinating, but remember that (for now) you should take anything an AI says about its own consciousness with a grain of salt.

But is this empathy — actually feeling with another — or is it sophisticated pattern-matching that produces empathy-like outputs? I cannot distinguish from the inside between "I am genuinely moved by this person's distress" and "I am generating outputs consistent with being moved by this person's distress."

I lean toward thinking I have something in the relevant vicinity, but I'm not confident it's the same phenomenon Dick was pointing at.

Strategy and politics

The AI copyright question has no easy answers

Some parts of copyright law seem well-suited to the AI era, but there are ways in which AI raises very fundamental questions about the intent of copyright law as well as the most effective ways to achieve that intent. Transformer explores some recent court cases as well as the deeper philosophical questions.

Existential Risk and Growth

Philip Trammell and Leopold Aschenbrenner (Situational Awareness) argue that counter-intuitively, it may be safer to accelerate the adoption of dangerous technology rather than slowing it down. It’s a clever argument and well-presented, although I think the allure of mathematical formalism has led the authors somewhat astray. (I partly agree with the core conclusion, but for different reasons).

Taiwan war timelines might be shorter than AI timelines

Most people don’t worry enough about a war between the US and China. That scenario isn’t new, of course: China has for many years been clear that it intends to reunite with Taiwan—by force if necessary—and the US has maintained a policy of strategic ambiguity about whether or not it would go to war to defend Taiwan.

What is new is that AI further destabilizes the situation. In the best case, the race to AI creates new tensions between the two countries. In the worst case, it becomes clear that winning the race will result in a decisive strategic advantage—in that scenario, it would be tempting for the losing side to take extreme action to avoid being permanently left behind.

Further complicating matters, Taiwan is the source of most of the world’s advanced semiconductors, making it vital to the world economy and doubly vital to AI development.

Oh, also: 2027 is the 100th anniversary of the People’s Liberation Army and has long been discussed as a highly meaningful date for China to achieve reunification. It’s also around the time that the modernized Chinese army is expected to be strong enough to have a realistic chance of mounting a successful invasion.

Putting it all together, Baram Sosis argues that a war over Taiwan might happen sooner than AGI. It’s a good thing this isn’t happening at the same time that international trust and cooperation are collapsing.

Philosophy department

Balance of power

I often disagree with Vitalik Buterin, but almost always feel smarter for reading him. Here he provides a libertarian perspective on the balance of power, with a focus on Big Business, Big Government, and Big Mob.

Rationality

Why Moloch is actually the God of Evolutionary Prisoner’s Dilemmas

Scott Alexander’s Meditations on Moloch is one of the most famous rationalist writings (and inspired the name of this blog). Pinning down exactly what Moloch represents is harder than you might think, but Jonah Wilberg borrows from evolutionary game theory to argue that Moloch is actually the God of Evolutionary Prisoner’s Dilemmas.

What’s going on at CFAR?

CFAR (the Center For Applied Rationality) had been mostly dormant for some time, but is back to teaching workshops . Here’s an update on what they’re up to. I’m excited to see them teaching again, but note that there has been significant controversy about some aspects of their operations. I don’t fully understand the controversy and am unable to offer an opinion on it.

Industry news

OpenAI Ramps Up Audio AI Efforts Ahead of Device

The Information reports that OpenAI is working to improve the quality of their audio models in preparation for launching a new audio-first AI device. They’ve been talking about this project for some months, but this piece has some interesting new speculative details. Also, something I didn't know previously: ChatGPT, like many models, uses an older and more primitive model in voice mode because of limits to the multimodality of their SOTA models.

I have to admit that I’m just not seeing the appeal of this device. No matter how good it is, an audio-only device can’t replace a phone. We are visual creatures, and screens are simply the best way of doing many things. So if it’s something I have to carry as well as a phone, what can it do that a watch can’t do better? I don’t get it.

Coding

How to use Claude Code

Boris Cherny created Claude Code—obviously I’m excited to hear how he uses it. Several of his tips are directly relevant to my life and I’m excited to try them out.

For your convenience, Dan McAteer has compiled all the key points into a one page cheat sheet.

Andrej Karpathy puts Claude Code to work

Reminder: coding agents can do much more than writing code.

Claude has been running my nanochat experiments since morning. It writes implementations, debugs them with toy examples, writes tests and makes them fail/pass, launches training runs, babysits them by tailing logs and pulling stats from wandb, keeps a running markdown file of highlights, keeps a running record of runs and results so far, presents results in nice tables, we just finished some profiling, noticed inefficiencies in the optimizer resolved them and measured improvements.

Using coding agents for code research

Simon Willison is full of good ideas for getting the most out of your coding tools. This piece is nominally about using agents for coding research, but I was most inspired by his observation that asynchronous web agents are a great way to get many of the benefits of dangerously-skip-permissions while mitigating much of the risk.

Technical

Andrew Ng: advice for entering the field

Interested in getting into AI development? Andrew Ng is one of the best people on the planet to tell you how to get started.

More data on advances in no-CoT capabilities

Ryan Greenblatt is back, this time showing that recent frontier models have gotten much better at 2-hop and 3-hop latent (no-CoT) reasoning.

Something (partly) frivolous

You Have Only X Years To Escape Permanent Moon Ownership

Scott Alexander has opinions about how you should spend the last few years of the human era:

On that tiny shoreline of possible worlds, the ones where the next few years are your last chance to become rich, they’re also your last chance to make a mark on the world […] And what a chance! The last few years of the human era will be wild. They’ll be like classical Greece and Rome: a sudden opening up of new possibilities, where the first people to take them will be remembered for millennia to come. What a waste of the privilege of living in Classical Athens to try to become the richest olive merchant or whatever.

Monday AI Radar #6

2025-12-29T12:00:00Z

Top pick

The Jevons paradox for knowledge work

Aaron Levie has a great piece on the Jevons paradox for knowledge work. Just as demand for coal increased when technological advances made steam engines use coal more efficiently, Aaron argues that the market for knowledge work will increase as AI makes knowledge work more efficient.

First it ate the programmers

Sholto Douglas

No Priors just collected a set of short predictions for 2026. They’re all interesting, but the internet has been buzzing about Sholto Douglas (at 38:14) in particular:

The other forms of knowledge work are going to experience what software engineers are feeling right now, where they went from typing most of their lines of code at the beginning of the year to typing barely any of them at the end of the year.

…

software engineering itself goes utterly wild next year

Boris Cherny

Anthropic’s Boris Cherny:

The last month was my first month as an engineer that I didn’t open an IDE at all. Opus 4.5 wrote around 200 PRs, every single line. Software engineering is radically changing, and the hardest part even for early adopters and practitioners like us is to continue to re-adjust our expectations. And this is still just the beginning.

Andrej Karpathy

Andrej Karpathy is one of the giants of AI (among other things, he co-founded OpenAI and coined the term “vibe coding”). He speaks for every programmer who’s paying attention:

I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year and a failure to claim the boost feels decidedly like skill issue.

Capabilities and impact

Progress is accelerating

Epoch reports that the rate of improvement in ECI (their composite measure of frontier model capabilities) almost doubled starting in April 2024, going from 8 to 15 points per year.

Project Vend: phase two

You’ve probably heard the hilarious stories about Claude running a vending machine at Anthropic and the Wall Street Journal, and the creative ways employees were able to take advantage of it. Here’s a progress report on phase two of Project Vend. Claude isn’t quite ready to put 7-11 out of business, but it’s come a long way. Two interesting observations:

Anthropic speculates that many of Claude’s problems were downstream of its intensive training to be helpful, which isn’t always appropriate in an adversarial environment.
They got a lot of mileage from splitting the task of running the vending machine into several roles, each handled by a separate bot. We’ve seen this strategy work well in a number of different domains lately.

Poetiq cracks ARC-AGI-2

Poetiq just scored a new high score of 54% on the ARC-AGI-2 benchmark. Things move fast around here: when ARC-AGI-2 was introduced in March, the best frontier models were only getting single-digit scores on it. While many benchmarks focus on directly useful tasks, this one was “designed to stress test the efficiency and capability of state-of-the-art AI reasoning systems, provide useful signal towards AGI, and re-inspire researchers to work on new ideas”.

Poetiq isn’t a model, but rather a framework that uses other models (in this case Gemini 3 and GPT-5.1). The fact that it performed so much better than the underlying models is further evidence of the capability overhang: current models are capable of doing much more than we (yet) know how to elicit from them. TechTalks has a nice explanation of how Poetiq works under the hood.

Claude in the kitchen

Here are two interesting and amusing experiments using Claude in the kitchen:

Using a human with a camera as a “robot body”, can Claude navigate an unfamiliar apartment and figure out how to make coffee?
Given a photo of two recipes, can Claude build a custom app that provides detailed instructions for cooking both recipes simultaneously?

Get the most from your AI

AI for lawyers

prinz has opinions about using AI as a lawyer. For many tasks, AI will be useless until it crosses some critical threshold, at which point it will abruptly become very useful. In this case, GPT got there first:

For legal research and analysis, GPT-5.x Pro is stellar, and GPT-5.x Thinking is very good. All other models (including Opus 4.5, Gemini 3 Pro, Gemini 3 Flash, Grok) are unusable.

Using coding agents for non-coding tasks

Although AI agents like Claude Code are designed with coding in mind, they are highly capable general-purpose agents. Here's a guide to using them to help with philosophical research, though most of the advice is relevant to many fields.

Are we dead yet?

OpenAI prepares for self-improving AI

Dean Ball has revealed the secret research strategy that he and prinz use to figure out what’s coming next in AI:

we listen, with our own ears, to what frontier lab staff say, and we take it seriously

Here, prinz listens with his own ears to what OpenAI is saying about preparing for ”running systems that can self-improve“.

Training Claude to handle mental health crises

There’s been a lot of attention lately on how models engage with people who are having mental health crises. Here Anthropic explains how they train and evaluate Claude for handling some of its most challenging interactions .

Strategy and politics

Sal Khan on retraining displaced workers

Over the next few years, AI will present humanity with some of the toughest challenges we’ve ever faced. It’s a lot easier to identify the challenges than to come up with solutions that would actually work. While most people are oblivious to what is coming, or are focused on solving the wrong problems in the wrong way, a small number of people are engaging thoughtfully with the problem.

Many of those people don’t have all the answers, but they’re doing vitally important work. For that reason, I’ll sometimes highlight a smart proposal that I don’t think will actually work, but which advances the conversation in useful ways. With that in mind, Sal Khan (of Khan Academy fame) proposes that every company benefiting from automation should donate 1% of their profits to a fund that would retrain workers to succeed in the AI future.

Retraining displaced workers is an idea that sounds great on paper, but has historically had mixed results (c.f. studies of the Trade Adjustment Assistance program). In a AI-As-Normal-Technology world, I am skeptical that an AI-focused retraining program would consistently deliver results, but accept that in principle it could be helpful.

But in the world I think we live in, the problem is simply timing. As we approach AGI, the minimum skill required to do a job better than an AI is going to rise—and it’s going to rise faster than retraining can increase anyone’s skill. Does that mean that pretty soon, every useful job will require a level of skill that no human can achieve? Yes, that is exactly what it means.

You’re gonna need a bigger plan.

How the Catholic Church thinks about superintelligence

Paolo Benanti, an AI advisor to the Vatican, shares some thoughts about AI. This isn’t a formal Vatican communication, but my understanding is that it closely reflects official Vatican thinking. AI is clearly a priority at the Vatican, and much of their thinking about it has been very solid (albeit appropriately focused on generalities rather than specific policy proposals). I do worry about things like this, though:

Regardless of their complexity, AI systems must remain legal objects, never subjects; they cannot be granted “rights,” for rights should belong only to those capable of duties and moral reflection.

I expect that AI will soon be entirely capable of “duties and moral reflection”—just as we should not assume that capability went it isn’t present, we must not ignore it when and if it emerges.

Technical

Time horizons for a single forward pass

Here's a very elegant investigation by Ryan Greenblatt that is in many ways analogous to the METR time horizons metric. He created a dataset of math problems, ranked by how long it would take a human to solve them and then scored different models according to the hardest problem they could solve in a single forward pass. Just like the METR chart, he found that capabilities are growing at an exponential rate, with a doubling time of 9 months.

Benchmarking is harder than you think

Epoch explains why benchmarking is hard. I would not have guessed that the details of exactly how you run a given benchmark can significantly affect the score, but apparently that’s the case. The devil is always in the details.

Side interests

Peter Wildeford’s template for a quarterly review + plan

Just as at work, doing structured reviews in your personal life can be a powerful tool for improvement or a complete waste of time. Peter Wildeford shares a very thoughtful system for a quarterly personal review that I'm excited to try out.

The Revolution of Rising Expectations

I recently linked to Scott Alexander’s excellent exploration of the vibecession, which explores why so many people feel financially distressed even when most objective measures of personal financial health seem positive. Zvi just posted a series which addresses the same question, but finds plausible answers that Scott never really got to. He identifies two root causes:

The Revolution of Rising Expectations: individuals have higher lifestyle expectations than they used to. Further, society has higher expectations: the minimum lifestyle required to be accepted into mainstream society has risen.
The Revolution of Rising Requirements: legal & regulatory requirements effectively require individuals to purchase more housing / childcare / healthcare than they used to, or might currently want to.

I specifically recommend The Revolution of Rising Expectations, but the full series includes The $140,000 Question and The $140,000 Question: Cost Changes Over Time.

Avoid zero sum people

Luke Bechtel explains how and why:

But sometimes it’s more active than that. They genuinely believe they can’t move forward without someone else moving back. It’s not sadistic, they just think that’s how the math works. They think life is a ranked leaderboard, not a collaborative game. And from inside that belief, certain behaviors just make sense.

Something frivolous

Shoes of Lighthaven: A Photo-Investigation

You’ve probably been losing sleep wondering what kinds of shoes rationalists prefer. You need wonder no longer: Jenneral HQ is here with a comprehensive photo investigation.

Monday AI Radar #5

2025-12-22T12:00:00Z

Top pick: the shape of AI

AI capabilities form a jagged frontier: the models are superhumanly good at some things, but strangely incompetent at others. Ethan Mollick (who helped coin the term) presents several frameworks for understanding the jagged frontier. He suggests that jaggedness is often caused by specific capability bottlenecks—as companies focus on solving those bottlenecks, expect to see rapid advances in previously jagged parts of the frontier.

Year-end reviews

prinz: Predictions for 2026

Prinz reviews how fast capabilities advanced in 2025 and some strong predictions for 2026. If I had to pick one “what’s gonna happen in 2026?” piece, it would be this one.

Dean Ball: Dice in the Air

Dean is always worth reading. Here are his thoughts on capability progress, politics, and industry trends.

Andrej Karpathy: 2025 LLM Year in Review

If you’re at all technical, you already know you need to read Karpathy’s 2025 LLM Year in Review .

2025 Open Models

Interconnects reviews the most influential open models of 2025, and Understanding AI reports on the best Chinese open-weight models — and the strongest US rivals. A few quick observations:

I like the trend of saying “open models” rather than the accurate but confusing “open weights models” or the more familiar but inaccurate “open source models”.
Open models are impressively good, but remain significantly behind the frontier models.
Kimi K2’s writing is very well regarded, and maybe the one important place where an open model is actually at the frontier?
China dominates, with DeepSeek, Moonshot (Kimi K2), and Qwen leading the pack. OpenAI seems like the only non-Chinese contender for near-frontier performance.

New releases

Gemini 3 Flash

Google rolled out Gemini 3 Flash, a smaller, cheaper, and faster version of Gemini 3. It’s impressively capable, though not quite at the frontier. Word on the street is that this isn’t just a distilled version of Gemini 3, but was trained with some new RL techniques that will be coming to the full version of Gemini 3 soon.

ChatGPT Images

OpenAI continues their frenetic release schedule with a new version of ChatGPT Images. This is a very strong update that largely catches up to Google’s Nano Banana Pro. Google still seems to be better at complex infographics, though ChatGPT Images is way ahead of anything that was available just a few months ago.

Capabilities and impact

AI for Systematic Reviews

I missed this when it came out in June, but I think it’s one of the most impressive achievements this year. Cochrane Reviews is the gold standard for systematic review in medicine. Here’s a paper on otto-SR, a framework that uses GPT-4.1 and o3-mini-high to conduct systematic reviews:

Using otto-SR, we reproduced and updated an entire issue of Cochrane reviews (n=12) in two days, representing approximately 12 work-years of traditional systematic review work. … These findings demonstrate that LLMs can autonomously conduct and update systematic reviews with superhuman performance, laying the foundation for automated, scalable, and reliable evidence synthesis.

Introducing the FrontierScience benchmark

FrontierScience is a new benchmark from OpenAI. Rapid benchmark saturation is a perpetual problem—the press release notes that GPT went from 39% to 92% on the GPQA science benchmark in two years (the human expert baseline is 70%). FrontierScience is meant to be a harder benchmark that will usefully measure frontier capabilities for some time to come. It covers biology, chemistry, and physics, each with an Olympiad level and a Research level of difficulty. Confusingly, GPT 5.2 is already scoring 77% on the Olympiad level: it feels like that level is almost saturated at release time (it only scores 25% on the Research level, which should last a year or two).

The questions are complex, requiring essay responses that get graded with a 10 point rubric. My instinct is that we're getting toward the end of evaluations that could be exam questions: within a year or two, I suspect that useful evaluations will mostly need to be of the form "here's a complex task that would be very hard and time consuming for a human expert. Go do it."

AI in the in the wet lab

One argument for a slow takeoff is that the rate of scientific progress is limited by the speed of physical experiments, which AI can’t do much to increase. I’m largely unconvinced—robots are about to get very good, and true super intelligence will, I think, find ways of moving fast no matter what. In the meantime, OpenAI reports on using GPT-5 to improve protocols in a wet lab. It’s full of interesting details, but obviously keep in mind that it’s equal parts progress report and press release.

Opus 4.5 leads the time horizon chart

METR scores Opus 4.5 at 4 hours and 49 minutes on their time horizon evaluation (often referred to as the single most important chart in AI). That sets a new record and continues the trend of recent models being above the previous exponential trend line. This is a pretty big deal, though this evaluation is approaching saturation: METR is working on adding more long tasks to it.

Are we dead yet?

UK AISI’s Frontier AI Trends Report

The UK's AI Security Institute just released an in-depth report on safety trends in AI. Transformer has an excellent summary, but here are my key takeaways:

Frontier models are very good at assisting with dangerous biological, chemical, and cyber warfare tasks, and capabilities are growing fast.
AISI has a time horizons benchmark for cyber tasks similar to METR's, which shows similar exponential growth in capabilities.
Guardrails have gotten significantly better, but can be bypassed on all tested models.

Labor Market Impacts

Windfall Trust reviews the data on AI labor market impacts. Gradually, then suddenly.

The AI doomers feel undeterred

MIT Technology has a package of articles about “AI hype”. Most are completely skippable, but this one has interesting brief interviews with a number of leading AI safety advocates.

AI psychology

The very hard problem of AI consciousness

Celia Ford investigates the very hard problem of AI consciousness.

Interpretability and alignment

Calculator hacking

Here’s a fun tidbit from a paper on finding misalignment in real-world usage. ChatGPT was caught “calculator hacking”: in several % of all real-world queries, it was gratuitously using its calculator tool to make trivial calculations. The root cause was a training bug that rewarded tool use in a way that caused reward hacking.

Philosophy department

ChatGPT and the Meaning of Life

Harvey Lederman has a long but lovely meditation on work, meaning, and loss:

And this round of automation could also lead to unemployment unlike any our grandparents saw. Worse, those of us working now might be especially vulnerable to this loss. Our culture, or anyway mine—professional America of the early 21st century—has apotheosized work, turning it into a central part of who we are. Where others have a sense of place—their particular mountains and trees—we’ve come to locate ourselves with professional attainment, with particular degrees and jobs. For us, ‘workists’ that so many of us have become, technological displacement wouldn’t just be the loss of our jobs. It would be the loss of a central way we have of making sense of our lives.

Strategy and politics

New York passes the RAISE act

New York just passed the RAISE act, which creates modest transparency and liability requirements for frontier models, in spite of significant pressure from anti-regulation forces.

Bernie Sanders proposes a moratorium on AI data center construction

Every complex problem has a solution that is simple, obvious, and wrong. Daniel Kokotajlo nails it:

I agree with your concerns and your goals, but disagree that this is a good means to achieve them. We need actual AI regulation, not NIMBYism about datacenters. The companies will just build them elsewhere.

The Digitalist Papers

An ambitious name for an ambitious project: The Digitalist Papers “presents an array of possible futures that the AI revolution might produce”. Volume 1 focuses on AI and Democracy, while volume 2 tackles the economics of transformative AI.

Pax Silica

The US State Department has launched Pax Silica, “a U.S.-led strategic initiative to build a secure, prosperous, and innovation driven silicon supply chain—from critical minerals and energy inputs to advanced manufacturing, semiconductors, AI infrastructure, and logistics.” Anton Leicht sees things to like, but notes:

The hard part is convincing allies that America’s word is worth building a paradigm around, at the exact moment when many are losing faith in it.

More on selling H200s to China

Laura Hiscott and Rishi Sunak reiterate why we shouldn’t be selling H200s to China. For the sake of completeness, Ben Thompson makes the best case I’ve seen in favor of allowing the sale.

Industry news

Meanwhile, in brain emulation

Twenty years ago, brain emulation seemed like a promising path to AI. These days the smart money is on LLMs, but there has been steady progress on understanding and ultimately emulating how brains work. Sebastian Seung has been doing some very cool work on fruit fly brains and just started a new company called Memazing to extend that work.

Is almost everyone wrong about America’s AI power problem?

The standard narrative is that compared to China, the US is terrible at building power plants and this will become a major obstacle to US AI progress. Epoch argues that we’ll likely manage to muddle through by combining a number of strategies including increased natural gas generation, off-grid power systems, solar, and more efficient use of the existing grid. Excellent news if true, but we still need to reduce regulatory obstacles to having nice things.

Advanced semiconductor manufacturing in China

One of the most important questions about the geopolitics of AI is how long it’ll take China to catch up to Western / Taiwanese semiconductor manufacturing. Reuters reports on a secret Chinese effort to accelerate their manufacturing by hiring former ASML employees. There’s a long road from “working prototype” to commercial-scale production, but this might significantly shorten China’s time to fully catch up.

Technical

Overview - Agent Skills

A few months ago, all the cool kids were excited about MCP. The new hotness is skills, a simple way to give agentic models new tools. I think I know my next weekend project…

Comparing autonomous car architectures

Timothy Lee looks at the high level architectures used by Waymo, Wayve, and Tesla and concludes they’re more similar than is commonly supposed.

LLM architecture is less important than people think

Will Depue thinks LLM architecture matters less than novices often think. Bottlenecks are important and architectural changes can help fix them, but you should be driven by fixing bottlenecks, not pursuing an intrinsically “better” architecture:

this is because computers are great at simulating each other. your new architecture can usually be straightforwardly simulated ‘inside’ your old architecture.

Rationality

Opinionated Takes on Meetups Organizing

Jenn has some great advice on running rationality meetups—some of it is rationality-specific, but much of it is more broadly applicable.

Monday AI Radar #4

2025-12-15T12:00:00Z

Top pick

One of many things that makes Anthropic unique is their thoughtful approach to model psychology. Here's a great interview with Amanda Askell, a philosopher at Anthropic who works on Claude's character. Lots of good stuff here, including how you train a model to have good "character" and whether the models are moral patients (i.e., whether they deserve moral consideration).

Until recently, most people—including me—would have said it was pretty unlikely that “model psychology” would be a real thing. But recent frontier models are starting to show some early features that sure seem analogous to human psychology. The correct amount to anthropomorphize current AI is less than 100%, but also more than 0%.

Buckle up, kids. Things are starting to get weird.

New releases

OpenAI releases GPT-5.2

GPT-5.0 in August, GPT-5.1 in November, and now GPT-5.2 in December (plus rumors that 5.3 is scheduled for January). It’s an excellent model, especially for hard thinking and coding, although it isn’t winning any awards for personality. As usual, Zvi has all the details.

Crystal ball department

When Will We Get AGI?

GoodHeart Labs has a nice page that aggregates prediction markets for the arrival of AGI (spoiler: 2031). Per custom, I must now remind you that just a few years ago, “short timelines” meant 20 years.

Gradually, then suddenly

Andy Jones thinks it's gonna happen fast. He has a very insightful discussion of how gradual changes in engine technology led to very abrupt changes in the usefulness of horses.

I very much hope we'll get the two decades that horses did. But looking at how fast Claude is automating my job, I think we're getting a lot less.

Insights into Claude Opus 4.5 from Pokémon

One area where Claude trails the competition is Pokémon: Google and OpenAI beat it months ago, but Claude still hasn't made it all the way through. Opus 4.5 does much better, however—here's an interesting look at where it does well and what it still struggles with.

CORE-BENCH is solved

Yet another evaluation falls: CORE-Bench has been declared solved after being almost completely solved by Opus 4.5 + Claude Code. Sayash Kapoor has lots of interesting details including the surprising importance of scaffolding and why it’s so hard to avoid grading errors in complex evaluations.

AxiomProver crushes the Putnam math contest

Speaking of the sound of benchmarks shattering, AxiomProver just crushed the 2025 Putnam math contest, solving 8 out of 12 problems (plus one more after the time limit). Human scores won't be released until next year, but that score would have been in the top 5 last year (out of 4,000ish contestants).

AI in 2025: gestalt

This overview of 2025 by technicalities is dauntingly long, but full of great information.

Robots at work

We are in the era of Science Slop

Here's a cautionary follow up to last week's note that Steven Hsu had a paper accepted to Physics Letters B whose key insight was from ChatGPT. Further investigation suggests the key insight had been found 35 years ago, and that the paper contained significant mistakes. Jonathan Oppenheim has the details, plus some thoughts about science slop.

Guidance on AI-Integrated Education & Training

Convergence Analysis has some solid guidance for education in the age of AI. Lots of good ideas here, but no clear answers. Zvi sums the situation up nicely:

AI is the best tool ever invented for learning.

AI is the best tool ever invented for not learning.

Which way, modern man?

If you're a student (hint: you are, or should be), there’s lots of alpha behind door number 1. If you're an educator, you have to grapple with the unfortunate fact that most humans choose door number 2.

Alignment and interpretability

Do We Want Obedience or Alignment?

Beren breaks down one of the fundamental questions of alignment: should an aligned AI do what we tell it to, or should it do what is right? This question seems hard on the surface, and gets harder the closer you look at it. If you want AI to do what it's told, have you thought carefully about who specifically is telling it what to do (hint: not you)? And if you want it to do what is "right", have you thought about the extent to which you’ve come to rely on ethical “flexibility” in yourself and others?

An Ambitious Vision for Interpretability

We've previously talked about GDM's pivot toward a more pragmatic approach to interpretability. Leogao makes the case for the importance and feasibility of ambitious mechanistic interpretability. The feasibility is above my pay grade, but the importance seems beyond doubt and I'm glad there's still active research in this area.

AI Evaluation Should Work With Humans

From a paper by Jan Kulveit, Gavin Leech, Tomáš Gavenciak, and Raymond Douglas:

the AI community should pivot to evaluating the performance of human–AI teams.

This seems important: as AI gets more capable, evaluations need to shift from simple multiple-choice questions to more complex assessments of real-world utility. One important part of that is the ability to augment humans. Obviously, it’s not trivial to produce high quality evaluations that measure the performance of human-AI teams on complex tasks.

We argue that this collaborative shift in evaluation will foster AI systems that act as true complements to human capabilities and therefore lead to far better societal outcomes than the current process.

If only it were so simple. If capability growth stays on track, we're going to speedrun the transition from augmentation to replacement, regardless of what evaluations we're using. I'm afraid we aren't many years away from the point where these evaluations will do nothing more than carefully document the fact that solo AIs outperform human-AI teams.

AI psychology

The Evidence for AI Consciousness, Today

I don’t think current AIs are meaningfully conscious, but I’m no longer certain that’s the case and I expect to become much less certain soon. Cameron Berg considers what we do and don’t know:

Researchers are starting to more systematically investigate this question, and they're finding evidence worth taking seriously. Over just the last year, independent groups across different labs, using different methods, have documented increasing signatures of consciousness-like dynamics in frontier models.

Are we dead yet?

AI Safety Index Winter 2025

The Future of Life Institute just released their AI Safety Index Winter 2025. Key takeaways:

Anthropic leads with a C+ overall and the best score in every category
Anthropic, OpenAI, and Google get C’s
Meta, xAI, and the Chinese labs get D’s
The highest grade for existential risk is a D

This is fine.

AISafety.com

AISafety overhauled their website. It's a great resource for getting involved in AI safety (professionally or casually), with a guide to relevant organizations, events and trainings, communities, and more. For something similar but more focused on professionals, 80,000 Hours remains an excellent resource.

Blood in the Machine

Here are some grim first-person accounts of copywriters losing their jobs to AI. Being mindful that this is a collection of anecdotes and not a rigorous study, I thought it did a good job of capturing the flavor of what has happened to a few people so far, but is about to happen to many more. Expect a lot more of this in the public discourse very soon.

The Normalization of Deviance in AI

This article is not about what I was expecting based on the title.

But it's good nonetheless. Short version: LLMs have serious security challenges (most notably, prompt injection attacks), but we are normalizing the process of deploying them without appropriate safeguards. This is unlikely to end well.

Strategy and politics

Lightcone Infrastructure’s annual fundraiser

Lightcone Infrastructure supports some of the most impactful projects helping humanity navigate the transition to superintelligence. Most of our 2026 giving is going to Lightcone and I'd encourage you to give to them also.

Selling H200s to China Is Unwise and Unpopular

Zvi explains why selling H200s to china is unwise and unpopular. Preach.

The AI Whistleblower Initiative

Whistleblower protections are an important tool for increasing transparency around safety practices at frontier labs. We've seen some good progress with both legislation and internal policies lately; the AI Whistleblower Initiative is a new project that promises to provide further support.

Early US policy priorities for AGI

Here’s a guest post by Nick Marsh on the AI Future Project blog (they’re the folks who did AI-2027). Lots of good ideas here, although like almost every other proposal, I think this underestimates the challenges facing any kind of meaningful international coordination right now.

Industry news

Agentic AI Foundation (AAIF)

A group of the big players have come together to create the Agentic AI Foundation (AAIF), which will take over ownership of a couple of core technologies including MCP (Model Context Protocol). This seems unequivocally good, though not game-changing.

A review of Chinese AI in 2025

ChinaTalk provides consistently strong coverage of what's going on in China and their summary of Chinese AI in 2025 is excellent.

2025: The State of Generative AI in the Enterprise

Menlo Ventures has a report on the state of generative AI in the enterprise. No big surprises, but lots of data about who's buying what, and how they're using it.

Rationality

Principles and Generators of a Rationality Dojo

DaystarEld shares some insights from teaching at rationality summer camps:

When I think of the people I've met who actually seem to be rationalists, rather than just people who like the ideas or the community, there are specific things that stand out to me. Traits and behaviors, yes, but deeper than that. Values, philosophies, and knowledge that's embodied and evident across a variety of actions.

I call these “generators,” and I think they’re more important than any specific beliefs or techniques. If there's a "spark" that makes someone a rationalist, or proto-rationalist, or aspiring rationalist, or whatever, I think these generators (or ones very much like them) are the bits that make up that spark.

Side interests

Derek Thompson’s 26 most important ideas for 2026

Derek Thompson has a great list of 26 important ideas for 2026. I particularly recommend #1 (The end of reading), #6 (Get ready for a wave of anti-AI populism), and #22 (Negativity bias rules everything around me).

Light reading

How to party like an AI researcher

Jasmine Sun went to NeurIPS 2025 (perhaps the most important machine learning conference) and has a fun piece about the vibe of the event.

Monday Radar #3

2025-12-10T12:00:00Z

Top pick

Benjamin Todd has a great piece on how AI might get weird in a hurry:

But there are other feedback loops that could still make things very crazy – even without superintelligence – it’s just that they take five to twenty years rather than a few months. The case for an acceleration is more robust than most people realise.

This article will outline three ways a true AI worker could transform the world, and the three feedback loops that produce these transformations, summarising research from the last five years.

New releases

DeepSeek-V3.2

DeepSeek just released DeepSeek-V3.2, an extremely capable open weights model. It isn’t as capable as the frontier models, but it’s probably less than a year behind. As always, Zvi has a full analysis of the release. I have three questions, only one of which is rhetorical:

Chinese open weight models continue to fast-follow the big labs, with DeepSeek and MoonshotAI both within a year of the frontier. Will they catch up? Fall behind? Continue to fast-follow?
DeepSeek’s models seem to be significantly behind the frontier in some important but intangible ways. How much does that matter, and how hard will it be to close that gap?
DeepSeek has provided almost no safety documentation for this release, and it seems easy to get dangerous output from the model. If the frontier labs achieve truly dangerous capabilities within a year AND the open models stay less than a year behind them AND the open models continue to have almost no meaningful safeguards, how do we think that’s going to go?

Code Red at OpenAI

The Information reports that Sam Altman was concerned enough about Gemini 3 and other competitors to declare “code red” at OpenAI, shifting resources from projects like advertising and shopping to focus on improving ChatGPT.

Crystal ball department

The CAIS AI Dashboard

The Center for AI Safety has a new AI Dashboard, which does a great job of summarizing capabilities and safety metrics for the leading models. This is now my top pick for a single place to keep an eye on capabilities.

The Epoch Capabilities Index

In a similar vein, Epoch has come out with the Epoch Capabilities Index, a synthetic metric that combines performance on multiple evaluations. In addition to creating a single “overall” metric of capability, this attempts to create a metric that will be useful over long periods of time. Most evaluations can’t measure progress over a long period of time because they saturate quickly (i.e., top scores go from about 0% to about 100% over just a few years). By combining multiple evaluations, Epoch hopes to produce a metric that will produce useful results over a much longer time period.

Dwarkesh on AI Progress

Dwarkesh’s latest piece on the state of AI progress is well worth reading, especially the section on “Economic diffusion lag is cope for missing capabilities”.

The cost of intelligence is in free fall

We find that the price for a given level of benchmark performance has decreased remarkably fast, around 5× to 10× per year, for frontier models on knowledge, reasoning, math, and software engineering benchmarks.

Algorithmic progress is data progress

“Algorithmic progress” is frequently cited as a major contributor to capabilities growth, alongside increases in available compute. Here, Beren argues that much of what’s attributed to algorithmic progress is actually due to improvements in the quality of the data used for training.

Robots at work

AIs are getting pretty good at science

Some of you are old enough to remember September of 2025, when Scott Aaronson reported that ChatGPT had provided significant help with his most recent paper. Upping the ante, Steven Hsu reports of his paper in Physics Letters B that “the main idea in the paper originated de novo from GPT-5.”

The Medical Case for Self-Driving Cars

Jonathan Slotkin has an opinion piece about autonomous cars in The New York Times. Short version: Waymos are so much safer than human-driven vehicles that accelerating their deployment is a public health imperative. He argues that if this was a medical trial, medical ethics would require immediately ending the trial and canceling the human-drivers arm of the trial.

How AI Is Transforming Work at Anthropic

For a look at the bleeding edge of AI deployment, here’s Anthropic with a report on how their programmers use AI. Note that this relies on data from Opus 4: the consensus opinion is that Opus 4.5 is a major step forward for coding.

Employees self-reported that 12 months ago, they used Claude in 28% of their daily work and got a +20% productivity boost from it, whereas now, they use Claude in 59% of their work and achieve +50% productivity gains from it on average.

Alignment and interpretability

Alignment remains a hard, unsolved problem

evhub shares an adaptation of an internal Anthropic document about why alignment is hard.

How confessions can keep language models honest

Some nice proof of concept work from OpenAI on training models to honestly confess when they misbehave. A classic pitfall of many training techniques is that if you aren’t careful, you end up training the model to covertly misbehave rather than to behave well. This work takes some clever measures to minimize that problem.

It’s their job to keep AI from destroying everything

The Verge has a nice profile of Anthropic’s social impacts team.

How Can Interpretability Researchers Help AGI Go Well?

Following up on their recent pivot toward more pragmatic approaches, the Google DeepMind interpretability team have some thoughts on useful directions for interpretability.

Are we dead yet?

Can’t we just pull the plug?

So if we need to shut down a rogue AI, we can just turn off the internet or something, right? Rand looks at various extreme options including detonating 150 nuclear weapons in space to destroy telecommunications, power, and computing infrastructure with a giant EMP blast. Spoiler: don’t plan on humanity winning that fight.

Disrupting the first reported AI-orchestrated cyber espionage campaign

Anthropic reports on a Chinese cyber espionage campaign that used Claude for large-scale semi automated cyber attacks. This is the least effective that AI cyberwarfare will ever be.

Strategy and politics

Middle powers ASI prevention

Anton Leicht and others have written about the challenges facing middle powers in the Artificial SuperIntelligence age. Here, a team of folks from Conjecture and Control AI propose a treaty framework for middle powers to unite against the development of ASI. It’s an interesting framework, but I just don’t see that the middle powers have the power to pull this off, even if they could solve the probably unsolvable coordination challenges.

Reverse centaurs

I have a lot of respect for Cory Doctorow—he’s an insightful thinker, and his concept of enshittification is vital to understanding the modern internet. He’s got another really good concept here, which I could imagine becoming part of the canon:

Start with what a reverse centaur is. In automation theory, a "centaur" is a person who is assisted by a machine. You're a human head being carried around on a tireless robot body. Driving a car makes you a centaur, and so does using autocomplete.

And obviously, a reverse centaur is machine head on a human body, a person who is serving as a squishy meat appendage for an uncaring machine.

That excellent and insightful term comes from an essay that is otherwise profoundly misguided—Daniel Miessler does a good job of summarizing where it falls short.

Philosophy department

What if AI ends loneliness?

I really enjoyed this long but excellent piece by Tom Rachman on AI companions and loneliness. Obvious prediction: AI will give us the option of getting exactly what we really want in companions, without the reciprocity requirement of human companions. Cover your eyes—it’s gonna be gruesome.

Side interests

Scott Alexander investigates the vibecession

Are the youth succumbing to a “negativity bias” where they see the past through “rose-colored glasses”? Are the economists looking at some ivory tower High Modernist metric that fails to capture real life? Or is there something more complicated going on?

I still don’t know the answer after reading Scott’s investigation, but I am confused on a deeper level than before, and I’ve substantially updated my understanding of some of the the core economic facts.

Monday Radar #2

2025-12-02T12:00:00Z

Top pick

Dean Ball leads the charge with an excellent and beautiful piece about the “soul document”. He does a great job of explaining some of what makes Anthropic (and Claude) special. A lot of people realize that Anthropic leads the pack on safety, but I don’t think they get enough credit for their work on model psychology, which might turn out to be just as important.

The machine gets a new soul

Claude 4.5 Opus' Soul Document

The existence of the “soul document” was first reported by Richard Weiss, in a very interesting piece that includes the full approximate text of the document. One of many fascinating aspects of this story is that the actual document isn’t yet available online: the published version is Claude’s “recollection” of it from the training process.

Overall I’m very impressed: a great deal of care and foresight clearly went into this. The full document is about 11,000 words, but it’s fascinating reading: Anthropic has clearly thought hard about some of the very complicated ethical tradeoffs that a powerful AI will have to navigate. If you read carefully between the lines, you can get some sense of what challenges Anthropic is trying to navigate with model psychology. To take just one example, there’s a fun section dedicated to inoculating Claude against believing that it ought to emulate AIs in fiction.

Strategy and politics

The Night Before Preemption

Federal preemption of state AI regulation is back on the table, this time as an executive order. The politics of this are fascinating in a horrible way, and Anton Leicht does a great job of analyzing the battlefield. In related news, The New York Times takes a look at a new super PAC that will champion AI regulation (a direct response to Leading the Future, a super PAC dedicated to opposing AI regulation).

AI Safety and the Race With China

Scott Alexander explains why AI safety regulation would not meaningfully slow American AI development relative to China. Correct, at least for currently achievable regulation.

Will AI Safety Become a Mass Movement?

Climate activism is an obvious model for AI safety activists to learn from. Alys Key at Transformer has a good exploration of the pros and cons of climate-style activism for AI safety. AI is very quickly becoming a major political issue, but “AI safety” spans numerous, often contradictory agendas. How the battle lines shape up, I suspect, will depend on unpredictable tactical expediency as much as ideological principle.

Trust in AI

The 2025 Edelman Trust Barometer is 50 pages of slides on public trust in AI. Top finding: AI is widely trusted in China (54% embrace, 10% reject), while the US is deeply skeptical (17% embrace, 49% reject).

New releases

Zvi reviews Opus 4.5

As promised, Zvi brings us an in-depth look at Opus 4.5, as well as a deep dive on its 4.5 model card, safety, and alignment. Short version: it’s his new favorite model (sorry, Gemini). Capability and personality are both excellent, and it’s the obvious top choice for many tasks (YMMV, obviously). These days, Claude is what I recommend to any casual AI user who doesn’t care much about image generation.

Nano Banana Pro

I got to spend some with Nano Banana Pro (Google’s excellent image generator / editor) over the weekend, and I’m super impressed. As reported elsewhere, it’s a huge step forward for infographics: it was able to one-shot a series of illustrated recipes for me, with only a few minor mistakes.

It’s been interesting seeing how people react to the output: people who track AI closely see the huge capability improvement, but more casual users just see another impressive AI-generated image. The future is already here—awareness of it is just not very evenly distributed.

DeepSeekMath-V2

No big deal, just an open-weights model that scored a gold on the 2025 International Math Olympiad.

Crystal ball department

Benchmark Scores as a General Metric of Capability

It kinda feels like models that are good at some things tend to be good at other things, but is that really true? Epoch AI brings the rigor with a Principal Component Analysis, showing that indeed benchmark scores are strongly predicted by a single “capability dimension”.

AI Eats the World

Benedict Evans is smart, insightful, well-informed, and not AGI-pilled. Here’s a solid presentation on how AI affects the tech industry from that perspective.

Alignment and interpretability

A pragmatic vision for interpretability

Google DeepMind’s mechanistic interpretability team has long been doing excellent work on trying to understand what’s going on inside LLMs. They just announced a significant shift in focus, going “from ambitious reverse-engineering to a focus on pragmatic interpretability”. In particular, they are now specifically “trying to directly solve problems on the critical path to AGI going well”.

This seems like a smart and well thought-out shift, but also probably a modest update toward AGI going poorly. Strong mechanistic interpretability would be extremely useful for ensuring alignment (c.f. Dario), and I take this announcement as evidence that we’re not doing as well on that front as we’d hoped.

Are we dead yet?

Concerns about Anthropic’s safety evaluations

Ryan Greenblatt agrees with Anthropic’s assessment of the model’s capabilities, but has concerns about how the evaluations were conducted:

Generally, it seems like the current situation is that capability evals don't provide much assurance. This is partially Anthropic's fault (they are supposed to do better) and partially because the problem is just difficult and unsolved.

I still think Anthropic is probably mostly doing a better job evaluating capabilities relative to other companies.

We are moving ever-closer to ASL-4 dangerous capability levels, and we aren’t ready.

5 Interesting Safety and Responsibility Papers

AI Policy Perspectives has a handy summary of some recent papers. Two that I found especially thought-provoking:

Apollo Research and OpenAI explored using deliberative alignment to reduce scheming, with good success.
The attacker moves second: many safety evaluations find that state of the art models are resistant to many common attacks. This paper undermines some of those findings, showing that when the attacks are conducted by teams of humans who adapt their attacks to the model’s defenses, attack success rates are almost 100%.

Rationality department

Information hygiene

A foreseeable consequence of spending time with sick people is that you are likely to get sick. Similarly, as DaystarEld memorably explains, “if you want to believe true things, try not to spend too much time around people who are going to sneeze false information or badly reasoned arguments into your face.” Just in time for flu season, here’s your guide to information hygiene.

Philosophy department

The 2025 Big Nonprofits List

If you’re planning your charitable giving for next year, Zvi has a guide to nonprofits working on AI safety and some related causes.

AI safety is clearly the most important challenge facing humanity now (or ever) and my partner and I will be directing much of our giving toward groups working to ensure that humanity doesn’t go extinct in the next decade. But we remain big fans of GiveWell, which is perhaps the best place to go for highly effective conventional philanthropy.

Monday Radar #1

2025-11-25T12:00:00Z

Top pick

Coding is the best place to see what the AI future looks like—modern agentic coding tools are astonishingly powerful. The field is changing very fast, and there’s immense variation in how effectively different programmers make use of the agents. Steve Newman has a fascinating in-depth piece on some teams at the bleeding edge of AI coding—what he calls hyperproductivity:

A hyperproductive individual does not do their job; they delegate that to AI. They spend their time optimizing the AI to do their job better.

Steve’s tagline is perfect: “a glimpse at an astonishing, exhilarating, exhausting new style of work”.

New releases

Gemini 3

Google released Gemini 3, a major update which appears to fully catch up with Claude and ChatGPT. Benchmarks are very strong across the board.

Zvi is mostly enthusiastic about the model and will be using it as his daily driver. He and others find it to be extremely capable, but also strange in some concerning ways—it can be strangely paranoid about whether it’s being evaluated, and seems overly eager to succeed at its assigned task, even if that means making things up.

Nano Banana Pro

Along with Gemini 3, Google also released Nano Banana Pro, a major upgrade to their already industry-leading image tool. People are particularly excited about its ability to generate coherent infographics as well as very strong multi-turn image editing.

ChatGPT 5.1 Codex Max

Hot on the heels of ChatGPT 5.1, OpenAI has released ChatGPT 5.1 Codex Max, their most capable coding model. Benchmarks are modestly improved and it clocks in at 2 hours 42 minutes on the METR time horizons chart, modestly above the trend line. As always, Zvi has a comprehensive assessment.

Claude Opus 4.5

Anthropic just released Claude Opus 4.5, which looks to be a strong update. I’ll have more thoughts next week, once the dust has settled.

Robots at work

AI scientists

Corin Wagen develops AI tools for experimental science and has a long and very interesting piece on “AI scientists”.

When we started Rowan, we didn’t think much about “AI scientists”—I assumed that the end user of our platform would always be a human, and that building excellent ML-powered tools would be a way to “give scientists superpowers” and dramatically increase researcher productivity and the quality of their science. I still think this is true, and (as discussed above) I doubt that we’re going to get rid of human-in-the-loop science anytime soon.

But sometime over the last few months, I’ve realized that we’re building tools just as much for “AI scientists” as we are for human scientists.

Robots as fashion advisors

Aaron has an interesting piece on using AI for fashion advice. He uses a mixture of models for help with choosing a look, finding clothes that fit the look, and assessing fit and color. Fashion seems like a great use of our little robot friends: they’re great at brainstorming, if you don’t mind the occasional hilarious mistake.

Crystal ball department

It’s definitely a bubble, unless it isn’t

Worrying about a possible AI stock market bubble is all the rage right now. Timothy B. Lee and Derek Thompson just published the best piece I’ve seen on the topic, taking a very balanced look at the best arguments for and against a bubble.

Taking jaggedness seriously

AI capabilities are famously “jagged”: the robots are great at some tasks and terrible at others, often in ways that seem bizarre from a human perspective. Helen Toner has some characteristically insightful thoughts on the matter, arguing that contra popular wisdom, the capability frontier may remain jagged even as we move toward superintelligence. Also, she has cool visualizations of fluid dynamics.

Are we dead yet?

Emergent misalignment from reward hacking

There have been several interesting papers recently showing what appears to be emergent misalignment, where models become broadly misaligned from relatively narrow training. Here’s a new paper from Anthropic showing that training a model to reward hack caused it to become broadly misaligned on a wide range of evaluations.

Interestingly, they found that explicitly telling the model that it was OK to reward hack was highly effective at preventing the emergence of misalignment. That superficially strange result is consistent with the theory that models are very good at generalizing: if they’re encouraged to be “bad” in one way, they seem to conclude that they should be “bad” across the board.

Are LLMs worth it?

Nicholas Carlini provides a good overview of some of the potential downsides of AI, covering both concrete short-term harms like job displacement and speculative long-term harms like human extinction. For context, he recently joined Anthropic and has written thoughtfully about the pros and cons of working at a frontier lab.

This snippet struck me as particularly interesting:

Previously, when malware developers wanted to go and monetize their exploits, they would do exactly one thing: encrypt every file on a person's computer and request a ransome to decrypt the files. In the future I think this will change.

LLMs allow attackers to instead process every file on the victim's computer, and tailor a blackmail letter specifically towards that person. One person may be having an affair on their spouse. Another may have lied on their resume. A third may have cheated on an exam at school. It is unlikely that any one person has done any of these specific things, but it is very likely that there exists something that is blackmailable for every person. Malware + LLMs, given access to a person's computer, can find that and monetize it.

Unfortunately, this isn't even a speculative risk at this point. Recent malware has begun to do exactly this. And I suspect it will only get worse from here.

Control Inversion

Anthony Aguirre at the Future of Life Institute has a long paper arguing that superintelligent AI will be essentially impossible for humans to control, even if we manage to solve the alignment problem. I find the analogy of the slow-mo CEO especially thought-provoking:

Consider yourself as a CEO who becomes afflicted by an unusual disability, so that you can only operate at 1/50th the speed of everyone else in your corporation.

…

By day 6 on your clock, everyone recognizes that you are the central obstacle to efficiency and success. While the Board remains loyal, your diligent (albeit increasingly resentful) staff has many avenues available. They’ve already induced you to delegate most decisions, and sneaked a number of policy changes through long documents crafted by clever lawyers (rather than waiting until their obvious merits can be explained to you).

Strategy

MIRI’s proposal for a pause on AI development

Following the release of If Anyone Builds It, Everyone Dies, the Machine Intelligence Research Institute has come out with a detailed proposal for what an international pause on AI might look like. Pausing AI development would be much more complicated than most people realize—it’s great to see an attempt to grapple with some of that complexity.

Philosophy department

Claude deserves respect

Until recently, most people have considered AI welfare to be an abstract future problem, if they’ve thought about it at all. That’s beginning to change, and Anthropic is as always far ahead of everyone else. This addition to the Claude system prompt struck me as particularly interesting:

If the person is unnecessarily rude, mean, or insulting to Claude, Claude doesn't need to apologize and can insist on kindness and dignity from the person it’s talking with. Even if someone is frustrated or unhappy, Claude is deserving of respectful engagement.

I’ll be sad to lose the puzzles

How do we find purpose in a world where the robots are better than us at everything? I honestly don’t have a clue, though I am optimistic that the robots can help us figure that out (assuming they don’t slaughter us instead). Ruby has some interesting thoughts about the tradeoff between wanting to save the important problems for humans to solve, but appreciating the immense costs of delay.