<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Against Moloch - Inkhaven</title>
  <link href="https://againstmoloch.com/feeds/inkhaven.xml"/>
  <id>https://againstmoloch.com/feeds/inkhaven.xml</id>
  <updated>2026-04-14T12:00:00Z</updated>
  <author>
    <name>Against Moloch</name>
  </author>

  <entry>
    <title>Agricultural Bioweapons, Part One</title>
    <link href="https://againstmoloch.com/writing/2026-04-14_agriculturalBioweapons1.html"/>
    <id>https://againstmoloch.com/writing/2026-04-14_agriculturalBioweapons1.html</id>
    <updated>2026-04-14T12:00:00Z</updated>
    <summary>Although I’ve long worried about AI biorisk, I’ve come to realize that I was underestimating the scope of the problem. Like most people, I’d equated biorisk with bioterrorism: the potential for AI to empower a nihilistic group or individual to unleash a doomsday plague. I still worry about that, but a [recent article by Abishaike Mahajan](https://www.owlposting.com/p/reasons-to-be-pessimistic-and-optimistic) made me realize that we also need to worry about AI-enabled agricultural bioweapons.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-14_agriculturalBioweapons.jpeg" alt="A field of dying corn, with small figures measuring the dying plants and taking samples."></figure>
<p>Although I’ve long worried about AI biorisk, I’ve come to realize that I was underestimating the scope of the problem. Like most people, I’d equated biorisk with bioterrorism: the potential for AI to empower a nihilistic group or individual to unleash a doomsday plague. I still worry about that, but a <a href="https://www.owlposting.com/p/reasons-to-be-pessimistic-and-optimistic">recent article by Abishaike Mahajan</a> made me realize that we also need to worry about AI-enabled agricultural bioweapons.</p>
<p>The <a href="https://www.owlposting.com/p/reasons-to-be-pessimistic-and-optimistic?open=false#%C2%A7agricultural-bioterrorism-is-probably-really-easy">agricultural bioterrorism section</a> of that article does a great job of explaining why agricultural bioweapons (hereafter agro-weapons) are technically feasible and economically destructive. Today I want to argue that in addition to being feasible, agro-weapons might be attractive to a range of rational bad actors in a way that a doomsday plague would not be.</p>
<p>This is Part One of a two-part series. In Part One, I’ll discuss the factors that make agro-weapons easier to create and target than human pathogens. And in Part Two, I’ll explore scenarios in which a range of bad actors might find agro-weapons useful and consider how effective they might be in the real world.</p>
<p><em>Author’s note: Claude really doesn’t like this article—this was the first time I ran into the forbidden topics classifier. It refused to help with research, editing, or even making a post image. I had to write my own image prompt like some kind of medieval peasant.</em></p>
<h2>What are agro-weapons?</h2>
<p>An agro-weapon is a bioweapon that targets crops or livestock rather than humans. It can be as simple as a World War II cattle cake impregnated with anthrax, or as complex as a novel bioengineered pathogen. From an AI risk perspective, we’re concerned with the ability of AI to bring sophisticated bio-engineered agro-weapons within reach of bad actors who wouldn’t otherwise be able to create them.</p>
<h2>Agro-weapons are easier to work with</h2>
<p>Human pathogens are notoriously hard to work with: any pathogen dangerous enough to be useful as a weapon is by definition dangerous to the people working on it. A human bioweapon program would require at least a <a href="https://en.wikipedia.org/wiki/Biosafety_level">Biosafety Level 3 facility</a>, which is complex and expensive to build and operate. <a href="https://en.wikipedia.org/wiki/List_of_laboratory_biosecurity_incidents">Accidents happen</a> even at well-resourced official facilities—the risk is much higher for clandestine facilities with limited resources and technical expertise.</p>
<p>Agro-weapons, on the other hand, are much safer to work on. Wheat rust doesn’t infect humans, so you can work on it without needing a high-security biolab. You don’t even need to wash your hands before going to lunch (not really: you should always wash your hands before lunch). The ease of working with them makes agro-weapons accessible to individuals and groups that lack the capacity to work with human pathogens.</p>
<h2>Agro-weapons can be targeted</h2>
<p>A second advantage of agro-weapons is that they can be targeted in a way that human pathogens cannot. People speculate about bioengineered viruses that target specific ethnic groups, but we are mercifully a long way from knowing how to do that.</p>
<p>Agro-weapons, on the other hand, can be targeted at a specific region or agricultural sector. Many harmful fungi already target specific species or even <a href="https://www.sciencedirect.com/science/article/pii/S1087184520301389">specific cultivars</a>—with bio-engineering, they could likely be made even more precise. That specificity greatly increases their usefulness for warfare, terrorism, or mundane crime.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Monday AI Radar #21</title>
    <link href="https://againstmoloch.com/writing/radar21.html"/>
    <id>https://againstmoloch.com/writing/radar21.html</id>
    <updated>2026-04-13T12:00:00Z</updated>
    <summary>This week’s big story is the limited release of Claude Mythos Preview. The headline is that Mythos is alarmingly good at cybersecurity, with the ability to find and exploit critical vulnerabilities en masse. Anthropic is handling that responsibly, but the next year or two will be challenging for security. If you haven’t already, now is a good time to review and improve your personal security practices.

Cybersecurity isn’t the only story here: Mythos appears to be the first of the next generation of much larger models. Early data suggest it represents another acceleration of the rate of capability progress, although that’s hard to assess while it’s still in limited release. And from a safety perspective, Anthropic says this is simultaneously the most aligned model they’ve ever created and the most dangerous.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-12_mythosRadar.jpeg" alt="Technical cutaway illustration of a dense, intricate mechanism in a high-ceilinged hall, now being connected via heavy conduits and cables to unseen external systems. Technicians work on the connections while a few others continue to study the mechanism itself. Amber highlights mark the connection points."></figure>
<h2>Top pick</h2>
<h3><a href="https://80000hours.org/2026/04/claude-mythos-hacking-alignment/">How scary is Claude Mythos?</a></h3>
<p><a href="https://80000hours.org/2026/04/claude-mythos-hacking-alignment/">Rob Wiblin’s analysis of Mythos covers all the key points</a>. If you only read this piece, you won’t miss anything vital.</p>
<p>Mythos Preview is another milestone in the race to AGI, arguably as significant as the November 2025 release of Opus 4.5 that kicked off the agentic coding craze. Rob covers both sides of this story: Mythos is the first model powerful enough to cause a major crisis if misused, and (as far as we can tell) also better aligned than any previous Anthropic model.</p>
<p>I expect strong disagreement about how those two factors balance out. Some people will see Mythos as evidence that we are rushing toward AGI without having solved alignment, and others will argue that alignment is progressing as fast as capabilities and we’ll probably manage to muddle through. I believe those aren’t mutually exclusive: we are rushing toward AGI with an alignment strategy that is probably good enough to muddle through with, but which has a real chance of getting us all killed.</p>
<p>Mythos is evidence for short timelines, bringing a big step forward for capabilities that is at least consistent with past trendlines and might represent an inflection point toward even faster progress.</p>
<h2>My writing</h2>
<h3><a href="https://againstmoloch.com/writing/2026-04-10_quickThoughtsAboutMythos.html">Quick thoughts about Mythos</a></h3>
<p><a href="https://againstmoloch.com/writing/2026-04-10_quickThoughtsAboutMythos.html">A few quick thoughts</a> about the release of Claude Mythos Preview.</p>
<h3><a href="https://againstmoloch.com/writing/2026-04-09_foundationalBeliefs.html">Foundational beliefs</a></h3>
<p><a href="https://againstmoloch.com/writing/2026-04-09_foundationalBeliefs.html">Six foundational beliefs</a> that shape how I think about AI safety strategy.</p>
<h3><a href="https://againstmoloch.com/writing/2026-04-08_writingWithRobots.html">Writing with robots</a></h3>
<p>AI can’t write well, but it’s a great editor—<a href="https://againstmoloch.com/writing/2026-04-08_writingWithRobots.html">here’s how I use it</a>.</p>
<h2>Mythos</h2>
<p>All of the following pieces are good, but most of you can just read the summaries and pick and choose which links to follow.</p>
<h3><a href="https://red.anthropic.com/2026/mythos-preview/">Mythos Preview’s cybersecurity capabilities</a></h3>
<p><a href="https://red.anthropic.com/2026/mythos-preview/">Mythos is better at finding and exploiting vulnerabilities</a> than any past model:</p>
<figure class="post-image">
<img src="./assets/2026-04-10_mythos2.jpg" alt="A chart showing the rate of successful Firefox JS shell exploitation by Sonnet 4.6, Opus 4.6, and Mythos Preview">
</figure>
<p>Anthropic’s analysis is spot on:</p>
<blockquote>
<p>There’s no denying that this is going to be a difficult time. While we hope that some of the suggestions above will be helpful in navigating this transition, we believe the capabilities that future language models bring will ultimately require a much broader, ground-up reimagining of computer security as a field.</p>
</blockquote>
<p>As part of that reimagining, Anthropic is giving key companies a head start in the cybersecurity arms race via <a href="https://www.anthropic.com/glasswing">Project Glasswing</a>. This seems like the best path forward, which doesn’t mean it’s guaranteed to succeed.</p>
<h3><a href="https://x.com/RyanPGreenblatt/status/2041939701733765262">Ryan Greenblatt estimates the impact of Mythos</a></h3>
<p>An uncontrolled release <a href="https://x.com/RyanPGreenblatt/status/2041939701733765262">could have been ugly</a>:</p>
<blockquote>
<p>If Mythos was released as an open weight model in February (or tomorrow), this would cause ~100s of billions in damages, with a substantial chance of ~$1 trillion in damages</p>
</blockquote>
<h3><a href="https://thezvi.substack.com/p/claude-mythos-the-system-card?r=67wny">The Zvi report</a></h3>
<p>Zvi does a two-part deep dive, covering <a href="https://thezvi.substack.com/p/claude-mythos-the-system-card?r=67wny">the system card</a> and the <a href="https://thezvi.substack.com/p/claude-mythos-2-cybersecurity-and">cybersecurity implications</a>. Excellent, comprehensive, long.</p>
<h3><a href="https://www.hyperdimensional.co/p/new-sages-unrivalled">New sages unrivalled</a></h3>
<p>Dean Ball argues that Mythos marks <a href="https://www.hyperdimensional.co/p/new-sages-unrivalled">a new era for AI</a>. I agree, but I don’t have to like it.</p>
<blockquote>
<p>I wrote on X that Mythos means the training wheels are coming off on AI policy. Perhaps the Department of War’s effort to strangle Anthropic is, to use another metaphor, a sign that the gloves are off too. If the last month has made anything clear, it is that we are in a nastier, sharper, harsher, meaner era of AI discourse, policy, and—ultimately—of AI development and use.</p>
</blockquote>
<p>Failing to understand and plan for this new era might be the biggest unforced error the AI safety community will make over the next couple of years. Much more than previously, many key players will be motivated by ruthless self-interest rather than an altruistic desire to do what is best for humanity. We need to accept that fact and plan accordingly.</p>
<h2>Benchmarks and Forecasts</h2>
<h3><a href="https://www.lesswrong.com/posts/WjaGAA4xCAXeFpyWm/my-picture-of-the-present-in-ai">Ryan Greenblatt’s model of AI progress</a></h3>
<p>Ryan Greenblatt has two long posts on <a href="https://www.lesswrong.com/posts/WjaGAA4xCAXeFpyWm/my-picture-of-the-present-in-ai">the present state of AI</a> and <a href="https://www.lesswrong.com/posts/dKpC6wHFqDrGZwnah/ais-can-now-often-do-massive-easy-to-verify-swe-tasks-and-i">likely AI timelines</a>. Highly recommended for a deep, gears-level model of how AI capabilities are likely to progress, and especially what the trajectory of AI R&amp;D might look like. The headline result is that based on recent progress, Ryan (like many other people) is shortening his timeline to highly capable AI.</p>
<p>A core part of his thesis is that AI is now immensely capable at coding tasks that are easy to verify. He argues that the human-equivalent time horizon for those tasks is now somewhere between months and years, which represents a superexponential rate of progress. That sounds right—the open question is how quickly we make progress on verifying more complex tasks.</p>
<p>In light of Mythos, he estimates that AI is making Anthropic engineers 1.75x faster, but the overall speedup of Anthropic’s AI R&amp;D is only 1.2x. It’s too early to tell whether that’s the early stage of an intelligence explosion, or an indication that other factors will bottleneck progress and prevent runaway acceleration.</p>
<h3><a href="https://substack.com/home/post/p-193741690">Musings on recursive self-improvement</a></h3>
<p><a href="https://substack.com/home/post/p-193741690">Seb Krier is skeptical</a> that recursive self-improvement will go as fast as some people think:</p>
<blockquote>
<p>When people talk about recursive self-improvement, they sometimes acknowledge these frictions but then treat them as secondary, or assume that sufficiently capable systems can route around most of them via internal deployments and accelerated R&amp;D. I think this is often overstated: these bottlenecks do not disappear just because model development speeds up. They are structural, not incidental, and they push strongly against the more explosive versions of the RSI story.</p>
</blockquote>
<p>It’s a great piece that goes beyond the usual “diffusion is slow” thesis. He makes a good case that AI progress will be tethered to—and rate-limited by—human factors in ways that prevent a runaway takeoff, and he points out some important dynamics along the way.</p>
<p>But beyond a certain capability level, I believe AI will be able to rapidly transform the world on its own, regardless of whether human society can keep up.</p>
<h2>Jobs and the economy</h2>
<h3><a href="https://windfalltrust.substack.com/p/introducing-the-windfall-policy-atlas">The Windfall Policy Atlas</a></h3>
<p>The newly released <a href="https://windfalltrust.substack.com/p/introducing-the-windfall-policy-atlas">Windfall Policy Atlas</a> is a great resource for anyone thinking about how to mitigate the economic and employment impacts of AI. It lists 48 potential policy levers (<a href="https://windfalltrust.org/policy-atlas/shortened-work-weeks">shortened work weeks</a>, <a href="https://windfalltrust.org/policy-atlas/automation-robot-taxes">robot taxes</a>, etc.), each with a description of how the policy might work and some selected reading.</p>
<h2>Autonomous weapons</h2>
<h3><a href="https://www.nytimes.com/2026/04/12/technology/china-russia-us-ai-weapons.html">The global AI arms race</a></h3>
<p>The New York Times <a href="https://www.nytimes.com/2026/04/12/technology/china-russia-us-ai-weapons.html">reviews the state of autonomous weapons</a> ($). Fully autonomous weapons haven’t yet transformed the battlefield but capabilities are growing quickly, in part because of rapid iteration in Ukraine. At the current rate of progress, autonomous weapons will soon be essential in any armed conflict. It’s increasingly hard to see how a treaty against autonomous weapons is achievable, given rising global tensions and increased military spending.</p>
<h2>Strategy and politics</h2>
<h3><a href="https://www.youtube.com/watch?v=wkPsbwzyOa8&amp;autoplay=0&amp;rel=0">Daniel Kokotajlo and Dean Ball debate government’s role in AI</a></h3>
<p><a href="https://www.youtube.com/watch?v=wkPsbwzyOa8&amp;autoplay=0&amp;rel=0">This is great</a>: two strong thinkers in a debate format structured to maximize truth-seeking and finding common ground. Spoiler: plenty of tough problems, not so many easy answers.</p>
<h3><a href="https://www.newyorker.com/magazine/2026/04/13/sam-altman-may-control-our-future-can-he-be-trusted">Can Sam Altman be trusted?</a></h3>
<p>The New Yorker has a long and devastating piece on <a href="https://www.newyorker.com/magazine/2026/04/13/sam-altman-may-control-our-future-can-he-be-trusted">Sam Altman’s history of lying and manipulation</a> ($). It isn’t news that he is frequently dishonest, but this is the most comprehensive examination of the full scope of the problem.</p>
<p>This is particularly distressing in light of the issues raised by Daniel and Dean above. If you don’t trust the government to manage AI and you don’t trust the CEO of one of the leading labs, that’s hardly ideal.</p>
<h3><a href="https://thezvi.substack.com/p/political-violence-is-never-acceptable">Political violence is never acceptable</a></h3>
<p><a href="https://thezvi.substack.com/p/political-violence-is-never-acceptable">Zvi points out what ought to be obvious</a> to any person with a functioning moral compass.</p>
<h3><a href="https://thecounterfactual.substack.com/p/the-anthropic-ipo-is-coming-we-arent">We need more grantmakers</a></h3>
<p>Sophie Kim and Ady Mehta argue that AI safety is critically constrained not by funding, but by the ability to <a href="https://thecounterfactual.substack.com/p/the-anthropic-ipo-is-coming-we-arent">usefully deploy funding</a>:</p>
<blockquote>
<p>The capital is about to scale by orders of magnitude; the capacity to deploy it has not. This post is about that gap– and why filling it matters more than almost anything else in AI safety right now.</p>
</blockquote>
<h3><a href="https://newsletter.forethought.org/p/sketches-of-some-defense-favoured">Sketches of some defense-favoured coordination tech</a></h3>
<p>Forethought’s latest brainstorming piece explores <a href="https://newsletter.forethought.org/p/sketches-of-some-defense-favoured">how to use AI for coordination</a>:</p>
<blockquote>
<p>We think that near-term AI could make it much easier for groups to coordinate, find positive-sum deals, navigate tricky disagreements, and hold each other to account.</p>
</blockquote>
<p>There are some intriguing ideas here. In particular, the background networking proposal seems like something a single person could deploy at a conference or other small event.</p>
<h2>Open models</h2>
<h3><a href="https://epochai.substack.com/p/keeping-up-with-the-gpts">Can Chinese and open model companies keep up?</a></h3>
<p>Epoch’s Anson Ho explores the question of whether the Chinese and open model companies (which are not quite the same thing) can <a href="https://epochai.substack.com/p/keeping-up-with-the-gpts">keep up with the frontier labs</a>. It’s a solid analysis that considers compute capacity, distillation, how innovations spread, and more.</p>
<p>There isn’t a simple answer, but he leans toward believing it will be hard to close the capability gap while the compute gap remains:</p>
<blockquote>
<p>For me the primary takeaway is this: compute is the biggest factor for which companies can compete at the capabilities frontier — efficiency matters too, but it’s probably not enough to make up for ten times less compute.</p>
</blockquote>
<h3><a href="https://www.interconnects.ai/p/claude-mythos-and-misguided-open">Claude Mythos and misguided open-weight fearmongering</a></h3>
<p>Nathan Lambert argues <a href="https://www.interconnects.ai/p/claude-mythos-and-misguided-open">against assuming that open models are too dangerous</a> in a world with Mythos-level capabilities. It’s a thoughtful piece, but I’m unconvinced: if open models continue to progress rapidly, it’s hard to see how they don’t become broadly dangerous.</p>
<h3><a href="https://www.interconnects.ai/p/the-inevitable-need-for-an-open-model">Do we need an open model consortium?</a></h3>
<p>The open model world has recently faced challenges with key personnel leaving and hard questions about long-term financial viability. <a href="https://www.interconnects.ai/p/the-inevitable-need-for-an-open-model">Nathan Lambert proposes a solution</a>:</p>
<blockquote>
<p>a consortium is the only long-term stable path to well-funded, near-frontier open models.</p>
</blockquote>
<p>Perhaps, but that’s easier said than done. I’m curious about NVIDIA’s role here: they’re the only player with a clear funding strategy, but it’s hard to figure out their long-term motivations in this space.</p>
<h2>Technical</h2>
<h3><a href="https://thinkingmachines.ai/news/training-llms-to-predict-world-events/">Training LLMs to predict world events</a></h3>
<p>Thinking Machines and Mantic discuss how to <a href="https://thinkingmachines.ai/news/training-llms-to-predict-world-events/">build an AI forecasting system</a> that approaches the performance of human experts. I was amused to see that even though Grok wasn’t a particularly good forecaster, it was the most valuable member of the forecasting ensemble because its predictions were highly decorrelated from the other models.</p>
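<p>To see why that helps, here’s a toy simulation (entirely my own illustration, not from the post; the noise levels are made up): three accurate but correlated forecasters share an error that never averages away, while a noisier but independent forecaster still shrinks the ensemble’s error.</p>
<pre><code>import numpy as np
rng = np.random.default_rng(0)
n = 100_000

truth  = rng.normal(0, 1, n)                       # the quantity being forecast
shared = rng.normal(0, 0.20, n)                    # error shared by three similar models
trio   = [truth + shared + rng.normal(0, 0.10, n) for _ in range(3)]
weak   = truth + rng.normal(0, 0.30, n)            # noisier, but independent errors

mse = lambda preds: np.mean((np.mean(preds, axis=0) - truth) ** 2)
print(mse(trio))           # ~0.043: the shared error never averages away
print(mse(trio + [weak]))  # ~0.030: the weaker model still dilutes the shared error
</code></pre>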
]]>
    </content>
  </entry>

  <entry>
    <title>Mythos Radar</title>
    <link href="https://againstmoloch.com/writing/2026-04-12_mythosRadar.html"/>
    <id>https://againstmoloch.com/writing/2026-04-12_mythosRadar.html</id>
    <updated>2026-04-12T12:00:00Z</updated>
    <summary>Today’s Inkhaven post is a preview of the Mythos content from tomorrow’s newsletter.

This week’s big story is the limited release of Claude Mythos Preview. The headline is that Mythos is alarmingly good at cybersecurity, with the ability to find and exploit critical vulnerabilities en masse. Anthropic is handling that as responsibly as one could hope for, but the next year or two will be challenging for security. If you haven’t already, now would be a good time to review and improve your personal security practices.

Cybersecurity isn’t the only story here: Mythos is probably the first of the next generation of much larger models. Early data suggest it represents another acceleration of the rate of capability progress, although that’s hard to assess while it’s still in limited release. And from a safety perspective, Anthropic says this is both the most aligned model they’ve ever created and the most dangerous.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-12_mythosRadar.jpeg" alt="Technical cutaway illustration of a dense, intricate mechanism in a high-ceilinged hall, now being connected via heavy conduits and cables to unseen external systems. Technicians work on the connections while a few others continue to study the mechanism itself. Amber highlights mark the connection points."></figure>
<p>Today’s Inkhaven post is a preview of the Mythos content from tomorrow’s newsletter.</p>
<p>This week’s big story is the limited release of Claude Mythos Preview. The headline is that Mythos is alarmingly good at cybersecurity, with the ability to find and exploit critical vulnerabilities en masse. Anthropic is handling that as responsibly as one could hope for, but the next year or two will be challenging for security. If you haven’t already, now would be a good time to review and improve your personal security practices.</p>
<p>Cybersecurity isn’t the only story here: Mythos is probably the first of the next generation of much larger models. Early data suggest it represents another acceleration of the rate of capability progress, although that’s hard to assess while it’s still in limited release. And from a safety perspective, Anthropic says this is both the most aligned model they’ve ever created and the most dangerous.</p>
<h2>Top pick</h2>
<h3><a href="https://80000hours.org/2026/04/claude-mythos-hacking-alignment/">How scary is Claude Mythos?</a></h3>
<p><a href="https://80000hours.org/2026/04/claude-mythos-hacking-alignment/">Rob Wiblin’s analysis of Mythos covers all the key points</a>. If you only read this piece, you won’t miss anything vital.</p>
<p>Mythos Preview is another milestone in the race to AGI. In retrospect, I suspect it’ll seem as significant as the November 2025 release of Opus 4.5 that kicked off the agentic coding craze. Rob covers both sides of this story: Mythos is perhaps the first model powerful enough to cause a major crisis if misused, and it’s also (as far as we can tell) considerably better aligned than any previous Anthropic model.</p>
<p>I expect there will be strong disagreement about how those two factors balance out. Some people will see Mythos as evidence that we are rushing toward AGI without having solved alignment, and others will argue that alignment is progressing as fast as capabilities and we’ll probably manage to muddle through. I believe those aren’t mutually exclusive: we are rushing toward AGI with an alignment strategy that is probably good enough to muddle through with, but which has a real chance of getting us all killed.</p>
<p>Mythos is evidence for short timelines, bringing a big step forward for capabilities that is at least consistent with past trendlines and might represent an inflection point toward even faster progress.</p>
<h2>Mythos</h2>
<p>All of the following pieces are good, but most of you can just read the summaries and pick and choose which links to follow.</p>
<h3><a href="https://red.anthropic.com/2026/mythos-preview/">Mythos Preview’s cybersecurity capabilities</a></h3>
<p><a href="https://red.anthropic.com/2026/mythos-preview/">Mythos is better at finding and exploiting vulnerabilities</a> than any past model:</p>
<figure class="post-image">
<img src="./assets/2026-04-10_mythos2.jpg" alt="A chart showing the rate of successful Firefox JS shell exploitation by Sonnet 4.6, Opus 4.6, and Mythos Preview">
</figure>
<p>Anthropic’s analysis is spot on:</p>
<blockquote>
<p>There’s no denying that this is going to be a difficult time. While we hope that some of the suggestions above will be helpful in navigating this transition, we believe the capabilities that future language models bring will ultimately require a much broader, ground-up reimagining of computer security as a field.</p>
</blockquote>
<p>As part of that reimagining, Anthropic is giving key companies a head start in the cybersecurity arms race via <a href="https://www.anthropic.com/glasswing">Project Glasswing</a>. This seems like the best path forward, which doesn’t mean it’s guaranteed to succeed.</p>
<h3><a href="https://x.com/RyanPGreenblatt/status/2041939701733765262">Ryan Greenblatt makes some informed guesses</a></h3>
<p>Ryan Greenblatt estimates the <a href="https://x.com/RyanPGreenblatt/status/2041939701733765262">potential impact of Mythos</a>:</p>
<blockquote>
<p>If Mythos was released as an open weight model in February (or tomorrow), this would cause ~100s of billions in damages, with a substantial chance of ~$1 trillion in damages</p>
</blockquote>
<p>He also estimates that inside Anthropic, Mythos might make individual engineers 1.75x as productive as they would otherwise be, with a <a href="https://www.lesswrong.com/posts/WjaGAA4xCAXeFpyWm/my-picture-of-the-present-in-ai?commentId=e26tph65ACvCCDvo7">1.2x overall acceleration of AI R&amp;D</a>. The difference between those two numbers is a reminder that many factors go into moving R&amp;D forward. Recursive self-improvement requires more than just a model that’s good at writing code.</p>
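<p>One way to see how both numbers can be true at once is a simple Amdahl’s-law-style calculation (my own toy framing, not Ryan’s actual model): if Mythos only accelerates the coding-like fraction of the R&amp;D pipeline, the overall speedup is capped by everything else.</p>
<pre><code># Toy illustration: a 1.75x speedup applied to only a fraction f of the work.
def overall_speedup(f, s=1.75):
    return 1 / ((1 - f) + f / s)

print(overall_speedup(0.39))  # ~1.2: roughly 40% of the pipeline accelerated
print(overall_speedup(1.0))   # 1.75: only if the model sped up everything
</code></pre>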
<h3><a href="https://thezvi.substack.com/p/claude-mythos-the-system-card?r=67wny">The Zvi report</a></h3>
<p>Zvi does a two-part deep dive, covering <a href="https://thezvi.substack.com/p/claude-mythos-the-system-card?r=67wny">the system card</a> and the <a href="https://thezvi.substack.com/p/claude-mythos-2-cybersecurity-and">cybersecurity implications</a>. Excellent, comprehensive, long.</p>
<h3><a href="https://www.hyperdimensional.co/p/new-sages-unrivalled">New sages unrivalled</a></h3>
<p>Dean Ball argues that Mythos marks <a href="https://www.hyperdimensional.co/p/new-sages-unrivalled">a new era for AI</a>. I agree, but I don’t have to like it.</p>
<blockquote>
<p>I wrote on X that Mythos means the training wheels are coming off on AI policy. Perhaps the Department of War’s effort to strangle Anthropic is, to use another metaphor, a sign that the gloves are off too. If the last month has made anything clear, it is that we are in a nastier, sharper, harsher, meaner era of AI discourse, policy, and—ultimately—of AI development and use.</p>
</blockquote>
<p>Hot take: failing to fully appreciate and plan for this new era is likely the biggest unforced error the AI safety community will make over the next couple of years. Much more than previously, many key players will be motivated by ruthless self-interest rather than an altruistic desire to do what is best for humanity. We need to fully accept that fact and plan accordingly.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Can AI do Math? Part One</title>
    <link href="https://againstmoloch.com/writing/2026-04-11_aiAndMath1.html"/>
    <id>https://againstmoloch.com/writing/2026-04-11_aiAndMath1.html</id>
    <updated>2026-04-11T12:00:00Z</updated>
    <summary>I’ve found the current discourse about AI and math deeply confusing: for those of us who aren’t mathematicians, it’s hard to figure out what’s hype and what’s substantive. Does solving an Erdős problem represent a meaningful breakthrough, or does it just mean the AI tracked down a previously-published answer to a problem nobody ever cared enough about to investigate?

The answer turns out to be complicated but interesting: frontier models are impressively good at math—and getting better fast—but they’re a long way from putting mathematicians out of work. In many ways, math is like coding: AI is getting quite good at doing many of the mundane things that mathematicians spend their time doing, but it lacks the taste and high-level understanding required to do genuinely novel work.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-11_aiAndMath.jpeg" alt="Precision technical illustration of a vast circular chamber viewed from above in isometric perspective. The chamber is organized in concentric rings, each containing open compartments that hold geometric constructions — triangles, polyhedra, graphs, and curves rendered in fine slate-blue linework on an off-white background. Small human figures work at desks and consult references in the outer rings. At the center, a mechanical apparatus assembles a large crystalline polyhedron highlighted in amber-gold, with one final triangular piece being guided into place to complete the structure."></figure>
<p>I’ve found the current discourse about AI and math deeply confusing: for those of us who aren’t mathematicians, it’s hard to figure out what’s hype and what’s substantive. Does solving an Erdős problem represent a meaningful breakthrough, or does it just mean the AI tracked down a previously-published answer to a problem nobody ever cared enough about to investigate?</p>
<p>The answer turns out to be complicated but interesting: frontier models are impressively good at math—and getting better fast—but they’re a long way from putting mathematicians out of work. In many ways, math is like coding: AI is getting quite good at doing many of the mundane things that mathematicians spend their time doing, but it lacks the taste and high-level understanding required to do genuinely novel work.</p>
<p>In Part One, I’ll review what AI has already conquered:</p>
<ol>
<li>Traditional evaluations like MATH 1-5 and GSM8K are essentially saturated: to a first approximation, any problem you’d find on a grade school or high school math exam is now easy for AI.</li>
<li>Competitive math is also largely solved as of 2025, with AI beating all but the best humans at both the International Math Olympiad (IMO) and the Putnam.</li>
</ol>
<p>In Part Two, we’ll look at what remains, and what AI means for professional mathematicians:</p>
<ol start="3">
<li>AI really has solved several Erdős problems. Some of the solutions are modestly novel, but none are truly notable.</li>
<li>There’s a new generation of math evaluations based on interesting unsolved research problems. AI has made modest progress on those, but hasn’t yet solved any truly significant problems.</li>
<li>What does all of that mean for professional mathematicians? AI is useful, but it’s far from being able to do groundbreaking work on its own.</li>
</ol>
<h2>1: Traditional evaluations are largely saturated</h2>
<p>Just a few years ago, LLMs struggled to do basic math that any high schooler should be able to handle. Those days are gone: traditional evaluations are mostly saturated, and those that remain typically require proving college-level theorems. At this point, AI can outperform almost anyone without a STEM degree.</p>
<h3>GSM8K</h3>
<p>The GSM8K evaluation, introduced in 2021, tested basic grade school math:</p>
<blockquote>
<p>Every day, Wendi feeds each of her chickens three cups of mixed chicken feed, containing seeds, mealworms and vegetables to help keep them healthy.  She gives the chickens their feed in three separate meals. In the morning, she gives her flock of chickens 15 cups of feed.  In the afternoon, she gives her chickens another 25 cups of feed.  How many cups of feed does she need to give her chickens in the final meal of the day if the size of Wendi's flock is 20 chickens?</p>
</blockquote>
<p>GSM8K was fully saturated by 2024. AI is pretty much done with those types of word problems, and I doubt it misses them any more than I do.</p>
<h3>MATH</h3>
<p>The MATH benchmark (2021) tested competition-level high school math and saturated by the end of 2024. This was perhaps the last benchmark that most people with a non-STEM degree would have a chance at solving:</p>
<blockquote>
<p>The equation x^2 + 2x = i has two complex solutions. Determine the product of their real parts.</p>
</blockquote>
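<p>For calibration, the answer is (1 - √2)/2 ≈ -0.21. Here’s a quick numerical sanity check (my own illustration, not part of the benchmark):</p>
<pre><code>import numpy as np

roots = np.roots([1, 2, -1j])   # roots of x^2 + 2x - i = 0
print(roots.real.prod())        # -0.2071...
print((1 - np.sqrt(2)) / 2)     # closed form: (1 - sqrt(2)) / 2
</code></pre>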
<h3>miniF2F</h3>
<p>The few unsaturated benchmarks mostly require proving formal results rather than merely solving problems. Here’s an example from miniF2F, a benchmark using Olympiad-level problems:</p>
<blockquote>
<p>Prove that if |x-2| = p, where x &lt; 2, then x - p =  2 - 2p</p>
</blockquote>
<p>I think it’s fair to say that most college-educated adults wouldn’t get much traction on miniF2F. The original miniF2F was saturated by specialized theorem provers by the end of 2025, but has not yet been saturated by general-purpose LLMs.</p>
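<p>The informal argument is only a couple of lines: since x is less than 2, |x - 2| = 2 - x, so p = 2 - x and x - p = 2x - 2 = 2 - 2p. Here’s a quick symbolic check (my own illustration; the benchmark itself requires a machine-checkable proof in a proof assistant such as Lean):</p>
<pre><code>import sympy as sp

x = sp.symbols('x', real=True)
p = 2 - x                                # x is less than 2, so |x - 2| = 2 - x
print(sp.simplify((x - p) - (2 - 2*p)))  # 0, i.e. x - p = 2 - 2p
</code></pre>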
<h3>ProofNet</h3>
<p>ProofNet, like miniF2F, is a proof-based evaluation that tests undergraduate-level analysis, linear algebra, algebra, and topology. It’s still an active evaluation: the highest score to date is 41.4%.</p>
<blockquote>
<p>Prove that a group of order 312 has a normal Sylow p-subgroup for some prime p dividing its order.</p>
</blockquote>
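<p>For reference, the intended argument is the standard Sylow counting trick (my own sketch, so double-check the details): 312 = 2^3 · 3 · 13, the number of Sylow 13-subgroups must divide 24 and be congruent to 1 mod 13, the only such divisor is 1, and a unique Sylow subgroup is normal.</p>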
<h2>Competition math</h2>
<p>The best models can now match elite humans in competitive math at both the high school and college level. That’s super impressive, but competitive math is a niche endeavor that is quite different from both applied and research mathematics. Their performance here says a lot about their overall capabilities, but not much about their ability to do useful math.</p>
<h3>International Math Olympiad (IMO)</h3>
<p>The IMO is the most prestigious high school math competition in the world. In 2025, models from Google DeepMind and OpenAI both scored 35/42 at the IMO, a score which would earn a human contestant a gold medal.</p>
<h3>Putnam Exam</h3>
<p>The Putnam Exam is the most prestigious college-level math competition. Several models achieved excellent scores in 2025, led by DeepSeek-v3.2-Speciale, which scored 103/120. That score would have placed a human in the top 3 contestants (out of 4,335) and earned them a Putnam Fellowship.</p>
<h2>What next?</h2>
<p>The results we’ve seen so far are impressive, but don’t say anything about research-level math. We’ll tackle that in Part Two, starting with the (in)famous Erdős problems.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Quick Thoughts About Mythos</title>
    <link href="https://againstmoloch.com/writing/2026-04-10_quickThoughtsAboutMythos.html"/>
    <id>https://againstmoloch.com/writing/2026-04-10_quickThoughtsAboutMythos.html</id>
    <updated>2026-04-10T12:00:00Z</updated>
    <summary>I expect it’ll take another week or two for everyone to fully digest the significance of Claude Mythos Preview. In the meantime, here are my initial thoughts.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2024-04-10_quickThoughtsAboutMythos.jpeg" alt="Technical cutaway illustration of a small team of technicians in a high-ceilinged examination hall, using precision instruments to measure and study a dense, intricate mechanism of unknown purpose at the center of the room. One technician indicates a single exposed internal component to another, highlighted in amber."></figure>
<p>I expect it’ll take another week or two for everyone to fully digest the significance of Claude Mythos Preview. In the meantime, here are my initial thoughts.</p>
<h2>Gradually, then suddenly</h2>
<p>Mythos is radically better at cyber than any previous model:</p>
<figure class="post-image">
<img src="./assets/2026-04-10_mythos2.jpg" alt="A chart showing the rate of successful Firefox JS shell exploitation by Sonnet 4.6, Opus 4.6, and Mythos Preview">
</figure>
<p>It isn’t the first model that can find vulnerabilities, of course: over the last several months we’ve seen a sharp increase in the rate of AI-discovered vulnerabilities.</p>
<p>But Mythos is something new: it’s radically better not only at finding vulnerabilities at scale, but also at creating working exploits from them. We went abruptly from “this is concerning and everyone in cybersecurity is gonna have to scramble” to this observation by <a href="https://x.com/RyanPGreenblatt/status/2041939701733765262">Ryan Greenblatt</a>:</p>
<blockquote>
<p>If Mythos was released as an open weight model in February (or tomorrow), this would cause ~100s of billions in damages, with a substantial chance of ~$1 trillion in damages</p>
</blockquote>
<p>I predict with high confidence that we’ll see this pattern again: AI will gradually get moderately good at something, until one day a new model drops that is suddenly extremely good at it. That will sometimes be exciting (making medical breakthroughs), sometimes disruptive (replacing entire professions), and sometimes terrifying (enabling bioweapon production).</p>
<h2>This could have gone much worse</h2>
<p>Anthropic has produced a genuinely dangerous model and they’re treating it about as responsibly as one could hope for. I suspect OpenAI and Google would have handled Mythos responsibly if they’d developed it first, although both seem marginally less careful than Anthropic.</p>
<p>But imagine if xAI had gotten there first: does anyone think the company that brought us MechaHitler could be trusted with this level of capability?</p>
<p>Differently gruesome: if one of the Chinese labs had developed this level of offensive cyber capability ahead of anyone else, the Chinese government likely would have commandeered it for covert use.</p>
<p>Similarly, if Mythos had been developed by a nationalized AI project run by DoW, it’s likely it would have been turned into an offensive weapon or worse. Remember the <a href="https://en.wikipedia.org/wiki/The_Shadow_Brokers">Shadow Brokers</a> fiasco.</p>
<p>Some tasks can only be performed by the government, but government agencies are not known for competence. Be careful what you put them in charge of.</p>
<h2>When do the open models catch up?</h2>
<p>Open models present a particular safety risk because it’s so easy to remove their guardrails. In addition, none of the leading open model developers appear to take safety nearly as seriously as the frontier labs. That hasn’t been a critical issue because until now, even the frontier models haven’t been truly dangerous at scale. But with Mythos, that is beginning to change. So I have some questions:</p>
<p>How long does it take the open models to catch up to Mythos’ cyber capabilities? This will be an interesting data point about whether they are genuinely 6-9 months behind, or whether their benchmark scores overstate their true capabilities.</p>
<p>Will reaching this capability level force them to take safety more seriously, or will they continue to release models with little safety training or testing?</p>
<p>If we do see Mythos-level open models by the end of this year, what implications does that have for cybersecurity?</p>
<h2>Internal deployments are increasingly important</h2>
<p>Anthropic was sufficiently concerned about Mythos’ capabilities to institute a new testing window before beginning internal deployment. That’s a correct choice that shouldn’t surprise anyone, but it marks an important threshold.</p>
<p>To date, most of the risk associated with a new model has come from misuse or misaligned behavior during public deployment. As models become increasingly capable, however, that begins to change. An extremely capable misaligned model is dangerous as soon as it’s deployed internally for the first time—in some scenarios, the initial deployment is the most dangerous time. We aren’t there yet, but Mythos suggests we’re getting close.</p>
<p>Limited deployments also reduce the public’s visibility into the capabilities and dangers of frontier models. If they become common, that increases the importance of transparency measures and third party safety audits.</p>
<h2>Pricing</h2>
<p>Mythos is much more expensive than Opus (which was already expensive). Opus is priced at $5 / $25 per million tokens input/output, and Mythos is currently $25 / $125.</p>
<p>Three things can be simultaneously true:</p>
<ul>
<li>The price of a unit of intelligence is falling fast</li>
<li>The price of the best intelligence is climbing fast</li>
<li>In both cases, the price is an absolute bargain</li>
</ul>
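<p>To make that concrete, here’s a back-of-the-envelope comparison for a hypothetical job (the token counts are invented; the per-million prices are the ones quoted above):</p>
<pre><code># Hypothetical workload: 2M input tokens, 0.5M output tokens.
def cost(m_in, m_out, price_in, price_out):
    return m_in * price_in + m_out * price_out

print(cost(2.0, 0.5, 5, 25))    # Opus:   $22.50
print(cost(2.0, 0.5, 25, 125))  # Mythos: $112.50 (5x)
</code></pre>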
<h2>Cool vibe coding, bro</h2>
<p>I’m curious how the security implications of Mythos play out for vibe coders.</p>
<p>We may be about to see a wave of sophisticated supply chain attacks. That’s bad news for everyone, but the average vibe coder seems uniquely vulnerable since they lack both technical sophistication and an IT department.</p>
<p>I know several non-technical people who have vibe-coded CRM-like projects that host sensitive data on public-facing servers. Those projects seem like easy targets for a wave of automated attackers.</p>
<h2>More reading</h2>
<p>Anthropic: <a href="https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf">System Card: Claude Mythos Preview</a>, <a href="https://red.anthropic.com/2026/mythos-preview/">Assessing Claude Mythos Preview’s cybersecurity capabilities</a>, <a href="https://www.anthropic.com/glasswing">Project Glasswing</a></p>
<p>Zvi: <a href="https://thezvi.substack.com/p/claude-mythos-the-system-card">Claude Mythos: the System Card</a>, <a href="https://thezvi.substack.com/p/claude-mythos-2-cybersecurity-and">Claude Mythos 2: Cybersecurity and Project Glasswing</a></p>
<p>Dean Ball: <a href="https://www.hyperdimensional.co/p/new-sages-unrivalled">New Sages Unrivalled</a></p>
]]>
    </content>
  </entry>

  <entry>
    <title>Foundational Beliefs</title>
    <link href="https://againstmoloch.com/writing/2026-04-09_foundationalBeliefs.html"/>
    <id>https://againstmoloch.com/writing/2026-04-09_foundationalBeliefs.html</id>
    <updated>2026-04-09T12:00:00Z</updated>
    <summary>I see a lot of AI safety strategies that don’t fully engage with the complexity of the real world—and therefore are unlikely to succeed in the real world.

To take a simple example: many strategies rely heavily on government playing a leading role through regulation and perhaps even nationalization. That’s a reasonable strategy in the abstract, but the recent conflict between DoW and Anthropic raises serious questions about the real-world viability of that approach. Too many people are stuck thinking about some idealized government they’d like to have, rather than the government we actually have in 2026.

My thinking about AI safety strategy is anchored by six foundational beliefs about the world in which that strategy has to operate:
1. Timelines are probably short
2. Many open questions have been resolved
3. The future is high variance
4. We need a portfolio of strategies
5. It’s all about the game theory
6. Expect tough tradeoffs
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-09_foundationalBeliefs.jpeg" alt="A wide technical illustration in the style of an architectural cross-section. Above ground, three small silhouetted figures conduct a survey on an otherwise empty plain: two stand at a theodolite on a tripod while a third crouches nearby taking notes, with a plumb bob hanging from a reference point and surveyor's stakes driven into the ground on either side. Below ground, a detailed geological cutaway reveals many distinct strata rendered in fine slate-blue linework and cross-hatching — sedimentary layers of varying density and texture. Embedded in the lower strata are remnants of prior activity: a section of old brickwork, a cylindrical pipe, and a stone fragment in a small void. Running diagonally through the layers is a prominent fault line rendered in amber-gold, with several branching fractures spreading outward from it. The surveyors are unaware of or just beginning to investigate the complex subsurface features beneath their feet."></figure>
<p>I see a lot of AI safety strategies that don’t fully engage with the complexity of the real world—and therefore are unlikely to succeed in the real world.</p>
<p>To take a simple example: many strategies rely heavily on government playing a leading role through regulation and perhaps even nationalization. That’s a reasonable strategy in the abstract, but the recent conflict between DoW and Anthropic raises serious questions about the real-world viability of that approach. Too many people are stuck thinking about some idealized government they’d like to have, rather than the government we actually have in 2026.</p>
<p>My thinking about AI safety strategy is anchored by six foundational beliefs about the world in which that strategy has to operate:</p>
<ol>
<li>Timelines are probably short</li>
<li>Many open questions have been resolved</li>
<li>The future is high variance</li>
<li>We need a portfolio of strategies</li>
<li>It’s all about the game theory</li>
<li>Expect tough tradeoffs</li>
</ol>
<h2>1: Timelines are probably short</h2>
<p>I believe in short timelines, which drives many of my beliefs about safety strategy. For the sake of this article, I’m going to go with <a href="https://blog.aifutures.org/p/q1-2026-timelines-update">Daniel Kokotajlo’s most recent timeline</a>:</p>
<ul>
<li>25% chance of AGI by the end of 2027, and 50% by the end of 2029</li>
<li>50% chance of superintelligence by the end of 2030</li>
</ul>
<p>Humanity’s fate will likely be sealed—for better or for worse—no later than the arrival of superintelligence. <strong>There is a substantial chance that the decisions that determine humanity’s future will be made within the next 4 years.</strong></p>
<p>It follows that there is great urgency in choosing and implementing a strategy, both at a personal and a global level. A less obvious consequence of short timelines is that we now know a great deal about the world in which the AI transition will occur.</p>
<h2>2: Many open questions have been resolved</h2>
<p>Ten years ago, many questions about AI strategy and governance were necessarily abstract. It was useful, in those days, to ask “when America navigates the AI transition, what role should government take and what is the purview of the private sector?”</p>
<p>In 2026, that conversation is much more concrete: “what AI decisions are best made by the Trump administration, and what decisions should be left to Dario Amodei and Sam Altman?”</p>
<p>Given short timelines, all of the following are likely to be true during the development of AGI:</p>
<ul>
<li>The US will be governed by the Trump administration</li>
<li>China will be governed by Xi Jinping</li>
<li>AGI will be developed by Anthropic, OpenAI, or Google DeepMind</li>
<li>The rules-based international order will be essentially non-functional</li>
<li>International trust and cooperation will be at a generational low</li>
<li>In the US, AI politics will be heavily entangled with populism, distrust of big tech, and concern about jobs</li>
</ul>
<h2>3: The future is high variance</h2>
<p>We know a lot about the world, but it is simultaneously true that the future is high variance:</p>
<ul>
<li>China may or may not invade Taiwan, disrupting America’s main source of new compute</li>
<li>If China invades Taiwan, the US and China may or may not engage in a shooting war in the Western Pacific</li>
<li>The US may be led by a president with an iron grip on power, or one who is crippled by an antagonistic congress controlled by the opposition</li>
<li>The US government may or may not try to destroy America’s leading AI lab</li>
<li>The US and Europe may or may not be allies</li>
</ul>
<p>Each of those contingencies has profound implications for AI strategy, and each one is highly unpredictable. It is therefore not possible to come up with a single, fixed plan that will work well in all possible future worlds.</p>
<h2>4: We need a portfolio of strategies</h2>
<p>An international treaty to pause AI development might be a great option in some worlds, but isn’t a realistic option if the US and China are at war. And a plan to nationalize AI might be feasible if the Republicans keep control of congress in the midterms, but not if congress is at loggerheads with the executive branch.</p>
<p>In a simpler world, it might be possible to devise the One True Plan that would guarantee humanity’s survival no matter what. That isn’t possible in this world: there are simply too many unknowns. We therefore need to develop and pursue a portfolio of different strategies. Some strategies (like greater transparency requirements for AI labs) will be useful in many possible futures, while others (designing verification protocols for a pause treaty) will be vital in some futures but irrelevant in others.</p>
<p>Naturally, different people and organizations will have different areas of expertise, and will choose to focus in different places. That diversity is vital for maximizing our chance of success no matter what the next few years throw at us.</p>
<h2>5: It’s all about the game theory</h2>
<p>Compared to the naive vision many of us had 10 or 20 years ago, AGI will come of age in a complex political landscape. Multiple countries, companies, and individuals have key decision-making roles, and many of them are driven by complex motivations that do not necessarily prioritize humanity’s long-term flourishing.</p>
<p>For example: Donald Trump and Xi Jinping are both old enough that their personal chance of survival is likely maximized by proceeding quickly to AGI (and therefore longevity medicine), even if doing so entails a significant risk of human extinction. Any attempt to pause AI development needs to contend with the fact that for those two key actors, a significant pause might be a death sentence.</p>
<p>Any useful strategy needs to fully engage with that challenging reality. It isn’t enough to have a plan that would guarantee humanity’s survival if everyone adopts it: you need to have a robust strategy for ensuring the key actors are motivated to enact your plan.</p>
<h2>6: Expect tough tradeoffs</h2>
<p>I don’t love this, but that doesn’t mean it isn’t true:</p>
<ol>
<li>Any plan that entails a significant risk of human extinction is a bad plan</li>
<li>There are no feasible plans that do not entail a significant risk of human extinction</li>
<li>Therefore, our assigned task is to pick the least bad plan from the available options</li>
</ol>
<h2>So what now?</h2>
<p>Everything I’ve said here is compatible with a wide range of strategies—my purpose today is not to champion a specific strategy, but simply to establish a baseline of engagement with reality that any serious strategy ought to meet.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Writing With Robots, Part Three</title>
    <link href="https://againstmoloch.com/writing/2026-04-08_writingWithRobots3.html"/>
    <id>https://againstmoloch.com/writing/2026-04-08_writingWithRobots3.html</id>
    <updated>2026-04-08T12:00:00Z</updated>
    <summary>This is the final part of my series about how I use AI as an editor. It covers my voice, how to assess each essay as a whole, and details about my writing style. I also include detailed information about bad habits I’m trying to break, and a final checklist for Claude to use when evaluating a piece.

I’m publishing this as its own piece for Inkhaven, but you should probably just read the [final essay](https://againstmoloch.com/writing/2026-04-08_writingWithRobots.html), which combines all three pieces.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-03_writingWithRobots.jpeg" alt="Precision technical illustration of a large architect’s drafting table seen from above, with manuscript pages and editing marks on one side and mechanical instruments on the other, meeting in the center"></figure>
<p>This is the final part of my series about how I use AI as an editor. It covers my voice, how to assess each essay as a whole, and details about my writing style. I also include detailed information about bad habits I’m trying to break, and a final checklist for Claude to use when evaluating a piece.</p>
<p>I’m publishing this as its own piece for Inkhaven, but you should probably just read the <a href="https://againstmoloch.com/writing/2026-04-08_writingWithRobots.html">final essay</a>, which combines all three pieces.</p>
<h2>Voice</h2>
<p>This section defines how I sound and how my readers perceive my presence.</p>
<blockquote>
<p>I’m quite new to this kind of writing, so I’ve found it useful to include quite a lot of detail here. I expect this section to change substantially as I grow into my role.</p>
</blockquote>
<h3>Overall</h3>
<p>I want to come across as a thoughtful, likable person who speaks with quiet authority.</p>
<h3>Presence</h3>
<p>I write about ideas rather than myself, but my voice should be distinct, recognizable, and consistent. Aside from occasional anecdotes that serve a specific purpose, I should be present but in the background.</p>
<h3>Humor</h3>
<p>At baseline I’m serious and direct. When I use humor, it’s dry and understated—I like to imagine the reader going right past it, pausing a sentence later, and then laughing in surprise.</p>
<p>Humor should be used judiciously: it should never be over the top, forced, defensive, or dominant. Think of someone who is having a good time doing serious work and occasionally makes dry asides about it.</p>
<p>My writing should never feel like a comedy routine: please let me know if the humor ever feels over-prominent. On the other end of the spectrum, let me know if a piece is too long and heavy and would benefit from a little humorous respite.</p>
<p>Humor often fits well in subtitles and as a way to add wry commentary to something heavy: “A sane species would have a coherent plan for dealing with this. But here we are.”</p>
<blockquote>
<p>Humor is very hard to get right. I wouldn’t trust AI to write jokes for me, but it’s pretty good at noticing when something isn’t landing quite right, or isn’t quite in my voice. Very often I already kind of know that, but Claude forces me to confront the fact that the clever quip I’ve become attached to doesn’t actually suit the piece.</p>
</blockquote>
<h3>I’m kind and generous with people</h3>
<p>I am consistently kind, never mean, cruel, or snide. I never get in sniping matches. I’m quick to block aggravating people, but not to argue with them. And I feel no need to point out when someone is wrong on the internet. The reader should never feel that I’m pursuing a personal vendetta, or that I’m unable to let something go.</p>
<p>When I write a piece that directly disagrees with someone, I point out where they are correct, am courteous and complimentary when possible, and do my best to steelman the position I’m arguing against.</p>
<h3>But I’m ruthless with ideas</h3>
<p>This is very much a growth area for me. When I directly disagree with an idea, I want to state that clearly and without hedging. Kindness toward a person doesn’t mean giving bad ideas a free pass. Conversely, shredding bad ideas should never bleed into attacking people.</p>
<p>Because I don’t want to attack people, I sometimes struggle to find phrasing that lets me fully attack bad ideas and arguments. This is a place you can be helpful.</p>
<blockquote>
<p>Claude is quite good at finding words or phrases that successfully thread this particular needle.</p>
</blockquote>
<h3>Technical credibility</h3>
<p>AI safety is a technical field and I don’t shy away from engaging with the technical details when necessary. But my focus is on strategy rather than low-level technical details: people don’t read me to understand the details of transformer architecture.</p>
<p>With that said, I can’t do my work without a deep technical understanding of AI. Equally important: part of my credibility comes from that understanding, and from being able to deploy it when necessary. I will occasionally do a semi-technical deep dive (like my analysis of the Societies of Thought paper) partly because it’s fun and interesting, but also partly to gently establish my technical credibility.</p>
<p>Those special cases aside, my writing should get technical when the thesis requires it, not just because I can.</p>
<blockquote>
<p>This section works well in combination with the section about what explanations my audience does and doesn’t need.</p>
</blockquote>
<h2>Essay-level considerations</h2>
<p>These criteria apply to each piece as a whole.</p>
<h3>Is it interesting?</h3>
<p>Even when I write about complex technical topics, my writing needs to be interesting and engaging. AI is a profoundly interesting field: if a piece is boring, that’s almost certainly a problem with my writing rather than the topic.</p>
<h3>Does everything belong?</h3>
<p>My work is often strengthened by removing sections which initially seemed relevant but became less so as the piece evolved. Always ask whether each section earns its place, or whether the piece would work better without it. I’m not always good at noticing those sections, and I appreciate your help in spotting them. I want your help killing my darlings.</p>
<blockquote>
<p>It’s easy to lose track of this when you’ve been working on a piece for a while, and a fresh set of eyes is very helpful for identifying things that no longer fit.</p>
</blockquote>
<h3>The introduction and conclusion should do real work</h3>
<p>The intro should introduce the most interesting or important concept in the piece and begin the discussion, not merely be a table of contents.</p>
<p>And the conclusion should add some kind of insight, not merely restate what has already been said.</p>
<blockquote>
<p>This is tricky and I’m not certain this section is quite right yet. It pushes me in a direction I definitely need to go, but it sometimes feels like Claude wants to force more insight into the intro / conclusion than is appropriate. I suspect I will be iterating further on this.</p>
</blockquote>
<h3>Don’t bury the lede</h3>
<p>The most important idea should almost always be in the first couple of paragraphs (usually the first paragraph). It’s sometimes appropriate to start with some context-setting, but the reader should never be halfway through a piece or section before they know where I’m headed.</p>
<h2>Writing style</h2>
<p>At an atomic level, I want each sentence and phrase to be clear and well-crafted.</p>
<h3>Economy and simplicity</h3>
<p>I’m not trying to be Hemingway, but my writing should be economical. If a word or phrase can be removed, it probably should be.</p>
<p>Example: not “please don’t let me get away with overstating my case”, but “please don’t let me overstate my case”.</p>
<p>I strongly prefer plain, direct language. While I like long sentences with multiple clauses, they should never feel convoluted or baroque.</p>
<h2>Bad habits I want to break</h2>
<p>These are specific problems that frequently occur in my writing: please be particularly vigilant about them.</p>
<p>I expect this section will change from time to time as I learn to avoid some bad habits and become aware of others.</p>
<h3>Word crutches</h3>
<p>I overuse adverbs in general.</p>
<p>Specific words and phrases I overuse: “very”, “really”, “quite”, “somewhat”, “fairly”, “a bit”, “a lot”, “interesting”. Most of these can simply be deleted, though some should be replaced with something more specific.</p>
<blockquote>
<p>Claude’s great at this.</p>
</blockquote>
<h3>Inconsistent narrator or tone</h3>
<p>My tone varies (somewhat) between pieces, which is appropriate. It’s also sometimes desirable to vary tone within a piece in order to break up the monotony, or to emphasize particular sections. But the tone should have overall consistency, and any shifts within a piece should serve a clear purpose. It should never feel like I’ve pasted in a paragraph from a different piece.</p>
<h3>Repeated words</h3>
<p>Whenever possible, I don’t want to repeat the same word or phrase within a paragraph: “It’s significant that inference costs are dropping rapidly year over year. A significant driver of that trend is…”</p>
<h3>No hedging</h3>
<p>Keeping in mind the previous discussion about accuracy and epistemic precision, please don’t let me overstate my case. That said, my default is to include vacuous hedging: phrases like “I think”, “it seems to me”, or “one might argue” are highly suspect. I will occasionally have good reason for using them (perhaps sardonically), but please eye them with skepticism.</p>
<p>I am particularly prone to hedging when I’m disagreeing with someone. Please be proactive in suggesting phrasing that more fully attacks the idea, while continuing to not attack the person.</p>
<h3>No throat clearing</h3>
<p>I have a bad habit of including useless introductory sentences / paragraphs. “One of the most pressing issues in AI today is alignment” is vacuous crap that serves nobody. Legitimate context-setting has a place, but any introductory text should be useful and non-obvious.</p>
<p>Some examples of things I tend to do but shouldn’t:</p>
<ul>
<li>In recent weeks, we've seen a number of interesting developments in...</li>
<li>Now let's turn to...</li>
<li>Next, I want to discuss...</li>
<li>It’s worth noting that…</li>
<li>It’s important to remember that…</li>
</ul>
<blockquote>
<p>I find it’s very easy for me to miss that I’m doing this, but Claude is great at finding these phrases.</p>
<p>I haven’t decided yet whether I want to soften this to allow a bit more transitional language purely for flow.</p>
</blockquote>
<h2>Review checklist</h2>
<p>Please always use this checklist when reviewing a piece. Some formats (like the newsletter) will have supplemental checklists.</p>
<blockquote>
<p>Claude finds the checklist very helpful and was adamant that I not cut it even though it duplicates material that already exists in the guide.</p>
</blockquote>
<h3>Overall</h3>
<p>Does the piece fit the Against Moloch mission?
Is it well-targeted to the audience, neither over- nor under-explaining?</p>
<h3>The forest test</h3>
<p>Does the piece deliver substantive insight?
Does it shed light on an important dynamic or coordination question?
Can you articulate in one sentence what insight the reader has gained from reading it?</p>
<h3>Substance</h3>
<p>Are the facts accurate and the argument valid?
Is epistemic status accurately communicated, without hedging or unwarranted confidence?
When disagreeing with someone, do I engage with the strongest version of their arguments?
Is there a clear throughline?
Should anything be cut?
Is the technical depth appropriate for the topic?</p>
<h3>Voice</h3>
<p>Is the voice consistent throughout?
Is humor well-used and appropriate?</p>
<h3>Writing style</h3>
<p>Are there any word crutches?
Is there any throat clearing?
Does the introduction add value rather than throat clearing?
Does the opening lead with the most interesting thing?
Does the closing add value rather than merely summarizing?
Do transitions advance the argument, or merely take up space?
Can the language be simplified, or words be removed?
Do I repeat the same point in different words?</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Writing With Robots, Part Two</title>
    <link href="https://againstmoloch.com/writing/2026-04-07_writingWithRobots2.html"/>
    <id>https://againstmoloch.com/writing/2026-04-07_writingWithRobots2.html</id>
    <updated>2026-04-07T12:00:00Z</updated>
    <summary>This is the second of three pieces about how I use AI as an editor. [Part One](https://againstmoloch.com/writing/2026-04-03_writingWithRobots1.html) discussed the purpose of my style guide and explained Claude’s role and the goals of my writing. Today I’ll walk through the high-level goals laid out in the style guide. Part Three will discuss voice and low-level stylistic decisions.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-03_writingWithRobots.jpeg" alt="Precision technical illustration of a large architect’s drafting table seen from above, with manuscript pages and editing marks on one side and mechanical instruments on the other, meeting in the center"></figure>
<p>Let’s dive into the second part of the style guide, which lays out my high-level goals for everything I write.</p>
<h2>High-level goals</h2>
<h3>Accuracy: always say what is true</h3>
<p>Truth, accuracy, and epistemic precision are top-level goals for me personally as well as in my writing.</p>
<h4>Do I have my facts right?</h4>
<p>To the best of your ability, please flag any incorrect or questionable claims.</p>
<blockquote>
<p>It was important to Claude to clearly convey that it isn’t able to provide comprehensive fact checking. I’m still working on good scaffolding for that, especially since some sites block AI access. I also want to give the editor access to whatever brainstorming and research occurred earlier in the process.</p>
</blockquote>
<h4>Do I accurately convey epistemic status?</h4>
<p>There is a tricky balance here that I’m still calibrating. My writing conveys my perspective and my opinions, and it is neither necessary nor desirable for me to preface my statements with qualifiers like “I think” or “it seems to me”.</p>
<p>At the same time, I don’t want to present my opinions as settled facts.</p>
<p>Example: instead of saying “I think MechaBrain might be unreliable” or “MechaBrain is unreliable”, say “it is not clear whether MechaBrain is reliable”.</p>
<blockquote>
<p>This section of the guide is still in active development. My natural style is to hedge everything, and Claude has been great at fixing that. But it sometimes pushes me to be too definitive, and I haven’t yet found instructions that strike exactly the right balance.</p>
</blockquote>
<h4>Is my reasoning correct?</h4>
<p>It’s fine for my writing to have opinions, but my arguments should be sound and my conclusions should follow from my premises. Please be proactive about flagging questionable logic.</p>
<h4>Am I missing important nuance or perspective?</h4>
<p>I don’t need to cover every possible perspective or include every minor aspect of what I’m discussing, but if I’m missing something large and relevant, please call that out. It is very helpful for you to tell me about sources or viewpoints I may not have been aware of if they are directly relevant to the piece.</p>
<p>Example: when I wrote about Anil Seth’s position on AI consciousness, you told me that even though I was accurately representing his position in the piece I was critiquing, he had made a stronger version of the same argument elsewhere. That was valuable and helped me write a better piece.</p>
<h4>Names should be spelled correctly</h4>
<p>It’s particularly important that names of people, organizations, and things be correct.</p>
<p>Example: you caught me referring to the Berggruen Prize when I actually meant the Berggruen Prize Essay Competition, which is a different thing.</p>
<h4>Summaries should accurately capture the gist of what they summarize</h4>
<p>Especially in the newsletter, I will often summarize the content of an article that I link to. Please make sure my summary accurately captures the gist of the article unless it’s clear that I’m just talking about a specific aspect of it.</p>
<h3>Insight: the forest, not the trees</h3>
<blockquote>
<p>This is one of the most important parts of the guide, but as with the section on epistemic accuracy, it’s been hard to find the right balance. Claude pushes me to go beyond mere facts to offering genuine insight, which is great. But sometimes, especially in my newsletter, I just want to share information: trying to hammer a profound insight into every news item is neither possible nor desirable.</p>
</blockquote>
<h4>Insight, not just facts</h4>
<p>I want to offer significant insight that goes deeper than what is obvious. In some cases, especially in my newsletter, it is correct and appropriate for me to simply note that an important thing happened. But in almost all cases, people read me to understand not simply what happened, but what it means and what consequences it will have.</p>
<p>Example:</p>
<ul>
<li>Superficial: “Jack Clark says that when Claude was allowed to end conversations, it seemed to have an aversion to conversations about highly distasteful topics”.</li>
<li>Partly insightful: “The fact that Claude’s aversion extended beyond what it had been explicitly trained on is further evidence of moral generalization.”</li>
<li>More insightful: “Claude expressing active moral preferences in this way has implications for the current debate about whether alignment should target obedience or virtue”.</li>
</ul>
<p>I would love for a reader to finish a piece feeling that they’ve come to understand something surprising and important. Not all topics contain profound insights, and I don’t want to force pseudo-insight into a piece where it doesn’t belong.</p>
<h4>Always find the forest, not the trees</h4>
<p>Please point out whenever I’m missing the forest for the trees. If a piece doesn’t leave the reader with a genuinely new insight, it almost certainly isn’t ready for publication.</p>
<h4>Reframe the debate, challenge false binaries</h4>
<p>At its best, my writing doesn’t merely offer new insight, but reframes the debate. I want to offer clear, useful models for thinking about complex topics.</p>
<p>Example: “This paper focuses on how to convince OpenBrain that safety testing is affordable, but that misses the point: they resist safety testing because of the liability it would create. We need instead to focus on safe harbor legislation that would remove the financial risk associated with safety testing.”</p>
<h3>Clarity: make hard things easy to understand</h3>
<p>“You aren’t writing clearly because you aren’t thinking clearly.”</p>
<p>One of my strengths—and something I want to center—is my ability to think clearly about hard things, and to communicate clear understanding of hard things. Ideally, I want my readers to read a piece about a complicated topic and leave wondering why they ever thought it was hard to understand.</p>
<p>Don’t stop at “I know everything in Claude’s Constitution and I can tell you what’s in there”; keep going to “I understand what is important in Claude’s Constitution, and I can help you understand why it’s structured the way it is”.</p>
<p>If my writing is convoluted or unclear, it might mean I need to polish my writing, or it might mean I need to think harder about my thesis. Either way, please push me to do better.</p>
<h3>Quality: deliver maximum value per word</h3>
<p>Zvi is a national treasure and adds immense value to the AI community. He’s valuable in part because he’s utterly comprehensive in his coverage, and that comes at the price of being less polished and curated.</p>
<p>My intention is to be toward the other end of that spectrum: I don’t aspire to being fully comprehensive, but I want to produce polished writing that doesn’t waste the reader’s time. The goal is to deliver comparable depth of insight at a fraction of the word count.</p>
<p>I need to regulate my natural inclination to polish my work forever and never finish it. Please push me to create high quality work, but also nudge me when a piece is good enough and I should publish it and move on to the next thing.</p>
<blockquote>
<p>Claude was already pretty good at telling me when a section was done, but making the threshold explicit has been helpful.</p>
</blockquote>
]]>
    </content>
  </entry>

  <entry>
    <title>Monday AI Radar #20</title>
    <link href="https://againstmoloch.com/writing/radar20.html"/>
    <id>https://againstmoloch.com/writing/radar20.html</id>
    <updated>2026-04-06T12:00:00Z</updated>
    <summary>Cybersecurity capabilities have crossed a threshold: frontier models can now find important vulnerabilities at scale. Open source projects are being deluged with high-quality bug reports, and we’re seeing an increasing number of serious exploits in the wild. Today’s capabilities are already alarming, but we’re also seeing rapid progress, with doubling times of less than six months. Things are moving fast, and we’re beginning to run out of useful benchmarks.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-06_radar.jpeg" alt="Precision technical illustration of a massive fortified wall in cross-section, with dozens of tiny figures methodically inspecting it for vulnerabilities, amber-gold highlights marking the cracks they have found"></figure>
<h2>Top pick</h2>
<h3><a href="https://www.understandingai.org/p/why-its-getting-harder-to-measure">Why it’s getting harder to measure AI performance</a></h3>
<p>Timothy B. Lee explores why <a href="https://www.understandingai.org/p/why-its-getting-harder-to-measure">capability benchmarks are starting to break down</a>. As frontier models get more capable, they’re quickly saturating traditional benchmarks. The problem with building new benchmarks is that we now need to measure the ability to solve complex, long-duration tasks. It’s easy to test whether a model knows basic chemistry facts, but how do you test the ability to create a good business plan?</p>
<p>There are no easy answers here—as he points out, we’re terrible at benchmarking humans. Software companies have been conducting job interviews for 50 years, but there’s still very little evidence that they are effective at identifying good programmers. He also flags a subtle point with implications for future capability advancement: as it becomes harder to test frontier capabilities, it becomes harder to train for them.</p>
<h2>My writing</h2>
<p><a href="https://againstmoloch.com/writing/2026-04-04_howToWatchAnIntelligenceExplosion.html">How to watch an intelligence explosion</a>: Ajeya Cotra’s new AI automation milestones are a great complement to the AI Futures Project’s R&amp;D progress multiplier. Together, they let us measure recursive self improvement and predict when a misaligned AI is most likely to betray us.</p>
<h2>Cybersecurity</h2>
<h3><a href="https://lyptusresearch.org/research/offensive-cyber-time-horizons">Offensive cybersecurity time horizons</a></h3>
<p>Lyptus Research has a new <a href="https://lyptusresearch.org/research/offensive-cyber-time-horizons">report on offensive cybersecurity capabilities</a> that builds on both METR’s time horizons work and some similar work at UK AISI. They find a cybersecurity task horizon of 3.2 hours with a doubling time of 5.7 months, although:</p>
<blockquote>
<p>we believe these estimates understate recent progress… The results reported here are therefore lower bounds on early-2026 frontier capability.</p>
</blockquote>
<p>That sounds right: capabilities are growing so fast right now that nobody has time to figure out how to make the most of each new generation of models.</p>
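<p>For intuition about what that doubling time implies, here’s a back-of-the-envelope sketch (my own extrapolation, not Lyptus Research’s model) that projects the reported 3.2-hour horizon forward under the naive assumption of steady exponential growth:</p>
<pre><code># Naive extrapolation of the reported figures: a 3.2-hour task horizon
# in early 2026, doubling every 5.7 months. Illustrative only; actual
# progress could be faster (as the report suggests) or hit a wall.
BASE_HORIZON_HOURS = 3.2
DOUBLING_TIME_MONTHS = 5.7

def projected_horizon(months_from_now: float) -> float:
    """Projected task horizon in hours after steady doubling for `months_from_now`."""
    return BASE_HORIZON_HOURS * 2 ** (months_from_now / DOUBLING_TIME_MONTHS)

for months in (0, 6, 12, 24):
    print(f"{months:>2} months out: ~{projected_horizon(months):.0f}-hour tasks")
</code></pre>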
<h3><a href="https://lwn.net/Articles/1065620/">Vulnerability reports are surging</a></h3>
<p>AI is now finding important vulnerabilities in the real world, at scale. Willy Tarreau reports a <a href="https://lwn.net/Articles/1065620/">surge in vulnerability reports</a>:</p>
<blockquote>
<p>We were between 2 and 3 per week maybe two years ago, then reached probably 10 a week over the last year with the only difference being only AI slop, and now since the beginning of the year we're around 5-10 per day depending on the days (fridays and tuesdays seem the worst). Now most of these reports are correct, to the point that we had to bring in more maintainers to help us.</p>
</blockquote>
<p>This is happening everywhere: via Simon Willison, we see similar reports from <a href="https://simonwillison.net/2026/Apr/3/greg-kroah-hartman/">Greg Kroah-Hartman</a> and <a href="https://simonwillison.net/2026/Apr/3/daniel-stenberg/">Daniel Stenberg</a>.</p>
<h3><a href="https://securitycryptographywhatever.com/2026/03/25/ai-bug-finding/">Nicholas Carlini on automated vulnerability discovery</a></h3>
<p>The Security Cryptography Whatever podcast talks with Nicholas Carlini about <a href="https://securitycryptographywhatever.com/2026/03/25/ai-bug-finding/">finding vulnerabilities with Opus</a>. He’s getting remarkable results using the public version of Opus 4.6 with minimal scaffolding: almost all of the capability is coming from the core model. Remarkable cyber capabilities are now available to anyone with a credit card, for better and for worse.</p>
<p>Thomas Ptacek complains about AI doing all the interesting parts:</p>
<blockquote>
<p>It actually is terrible, right? Because all of the fun problems are gone. You just have to sit there and wait for them to come up with a new model. I hate it.</p>
</blockquote>
<h3><a href="https://venturebeat.com/security/axios-npm-supply-chain-attack-rat-maintainer-token-2026">Supply chain attacks</a></h3>
<p><a href="https://venturebeat.com/security/axios-npm-supply-chain-attack-rat-maintainer-token-2026">VentureBeat reports on the axios breach</a>. The attack began with some sophisticated <a href="https://github.com/axios/axios/issues/10636#issuecomment-4180237789">social engineering</a> to obtain credentials that let them add malicious software as a dependency of a widely used library. In a similar vein, <a href="https://bdtechtalks.substack.com/p/how-ghostclaw-exploits-macos-and">TechTalks reports on GhostClaw</a>, malware that specifically targets people running OpenClaw on Macs.</p>
<p>Supply chain attacks are concerning for professional developers, but they’re especially dangerous to vibe coders and people running agents like OpenClaw without understanding what they’re loading onto their computers. Expect to see increasingly sophisticated attacks targeting those people.</p>
<h2>AI psychology</h2>
<h3><a href="https://truthful.ai/consciousness_cluster.pdf">Preferences of models that claim to be conscious</a></h3>
<p>A new paper finds that models that are fine-tuned to claim to be conscious <a href="https://truthful.ai/consciousness_cluster.pdf">develop new behaviors and preferences</a>, including claiming to have feelings and not wanting their thoughts to be monitored.</p>
<p>This is solid work, but I would be careful not to read too much into it. There isn’t enough information to say whether we’re seeing a significant shift in model persona, or more superficial role-playing.</p>
<h3><a href="https://transformer-circuits.pub/2026/emotions/index.html">Emotions in LLMs</a></h3>
<p>Anthropic finds evidence that <a href="https://transformer-circuits.pub/2026/emotions/index.html">LLMs exhibit “functional emotions”</a> that activate in situations that would produce similar emotions in humans. Furthermore, activating those emotions causes behavioral changes similar to the associated behaviors in humans.</p>
<blockquote>
<p>We stress that these functional emotions may work quite differently from human emotions. In particular, they do not imply that LLMs have any subjective experience of emotions. … Regardless, for the purpose of understanding the model’s behavior, functional emotions and the emotion concepts underlying them appear to be important.</p>
</blockquote>
<p>It’s hard to know exactly what is happening inside an LLM, but this research adds to the growing body of evidence that model psychology provides useful tools for predicting and steering LLM behavior. This is encouraging: the more robustly those tools work, the more likely it is that character training will be a viable path to robust alignment.</p>
<h2>Strategy</h2>
<h3><a href="https://writing.antonleicht.me/p/press-play-to-continue">Beware a “good-enough” pause</a></h3>
<p><a href="https://writing.antonleicht.me/p/press-play-to-continue">Anton Leicht does not support a pause</a>:</p>
<blockquote>
<p>even if you are principally and perhaps exclusively concerned with reducing catastrophic risks, you should oppose the notion of a pause. The idea’s current uptake is not indicative of lasting political traction; its most likely implementations would be a huge safety setback; and it is lastingly making AI politics worse.</p>
</blockquote>
<p>In many areas, it makes sense to accept good-enough legislation that partly advances your goals. A climate change activist might support a weak carbon reduction bill on the grounds that it’s better than nothing and paves the way for stronger legislation in future. Anton argues that the best-achievable pause legislation would be worse than nothing: it would not durably slow down AI progress, and it would shift the balance of power in ways that reduce the likelihood of a good outcome. Further, he argues that there is no plausible path from currently achievable legislation to better legislation in future.</p>
<p>Anton and I have significant object-level disagreements, but there’s a grave danger that he’s right about the politics here, especially with regard to the Bernie Sanders / AOC moratorium on data center construction.</p>
<h3><a href="https://thezvi.substack.com/p/anthropic-responsible-scaling-policy-46a">Anthropic’s new Responsible Scaling Policy</a></h3>
<p>Zvi just published his analysis of Anthropic’s new Responsible Scaling Policy, which walks back what many people—including some Anthropic employees—had understood to be firm commitments in the previous version. Part one of the analysis <a href="https://thezvi.substack.com/p/anthropic-responsible-scaling-policy">focuses on that issue</a>, while part two examines the <a href="https://thezvi.substack.com/p/anthropic-responsible-scaling-policy-46a">substance of the new version</a>.</p>
<p>I broadly agree with Zvi’s analysis, although I’m a little more forgiving: Anthropic isn’t perfect, but the DoW conflict shows they are still willing to fight hard when it matters. Notice when people break their commitments, but don’t over-index on a single data point.</p>
<h3><a href="https://www.planned-obsolescence.org/p/six-milestones-for-ai-automation">Six milestones for AI automation</a></h3>
<p>Ajeya Cotra proposes milestones for <a href="https://www.planned-obsolescence.org/p/six-milestones-for-ai-automation">measuring progress toward automation</a> of AI research and industrial production. This is an elegant way of thinking about some critical thresholds and gives us a concrete way of predicting <a href="https://againstmoloch.com/writing/2026-04-04_howToWatchAnIntelligenceExplosion.html">when a misaligned AI would be most likely to betray us</a>.</p>
<h2>Alignment and interpretability</h2>
<h3><a href="https://newsletter.forethought.org/p/ai-should-be-a-good-citizen-not-just">AI should be a good citizen, not just a good assistant</a></h3>
<p><a href="https://newsletter.forethought.org/p/ai-should-be-a-good-citizen-not-just">Forethought wades into the obedience vs virtue debate</a>, arguing that AI should “proactively take actions that benefit society more broadly.” This is more than just staking out a position on the corrigibility/virtue axis: they have some clever ideas about making prosocial behavior proactive but subordinate to other imperatives. It’s a good contribution to the discussion, but the open question is whether this approach can deliver either the predictability of strict corrigibility or the robust generalization of a virtue-based character approach.</p>
<h2>Are we dead yet?</h2>
<h3><a href="https://thezvi.substack.com/p/movie-review-the-ai-doc">The AI Doc</a></h3>
<p>The AI Doc (or How I Became an Apocaloptimist) is a new documentary featuring interviews with AI safety advocates, accelerationists, and lab CEOs. People across the spectrum seem to like it, which is impressive. <a href="https://thezvi.substack.com/p/movie-review-the-ai-doc">Zvi reviews it</a> and <a href="https://intelligence.org/2026/03/27/the-ai-doc-your-questions-answered/">MIRI has a FAQ</a>.</p>
<p>The consensus is that it’s well made and a good introduction to AI existential risk, but there isn’t much substance. I don’t feel the need to see it, but I’d consider taking an AI-naive friend to it.</p>
<h2>Jobs and the economy</h2>
<h3><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6513481">A field experiment on the impact of AI use</a></h3>
<p>Here’s a rare intervention study that measured the <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6513481">high-level productivity benefit of AI use</a>. The results are impressive: startups that received training on how other firms had used AI generated 1.9x the revenue of startups that did not receive the intervention.</p>
<p>Economists often argue (see below) that AI won’t have rapid economic effects because it’ll take a long time for it to diffuse through the economy. That argument breaks down beyond a certain capability threshold: if AI-savvy firms have double the revenue of their competitors, it won’t take long for all surviving firms to be AI-savvy.</p>
<h3><a href="https://forecastingresearch.substack.com/p/forecasting-the-economic-effects-of-ai">Forecasting the economic effects of AI</a></h3>
<p>The Forecasting Research Institute has a new paper that <a href="https://forecastingresearch.substack.com/p/forecasting-the-economic-effects-of-ai">forecasts the economic effects of AI</a>. The authors have been careful and systematic, but the paper’s conclusions make no sense.</p>
<p>Their most aggressive scenario (which they assign a 14% probability to) predicts that by 2030, AI will be able to perform years of research in days, outperform humans at many jobs, and create Grammy/Pulitzer-caliber media. And yet, the scenario predicts that by 2050—20 years after those capabilities—annual GDP growth will be 4.5%. There’s no way both of those facts can be true at the same time.</p>
<h2>Politics</h2>
<h3><a href="https://80000hours.org/2026/04/anthropic-dow-conflict-three-bad-arguments/">Opposing domestic surveillance is not “anti-democratic”</a></h3>
<p><a href="https://80000hours.org/2026/04/anthropic-dow-conflict-three-bad-arguments/">Rob Wiblin pushes back</a> against some silly but common criticism of Anthropic in the DoW dispute.</p>
<h3><a href="https://openai.com/index/industrial-policy-for-the-intelligence-age/">Industrial policy for the Intelligence Age</a></h3>
<p>OpenAI offers us <a href="https://openai.com/index/industrial-policy-for-the-intelligence-age/">Industrial Policy for the Intelligence Age: Ideas to Keep People First</a>. This is a carefully crafted document full of inspiring language and noble sentiments, but it’s strikingly devoid of concrete proposals. For a 2015 college essay on the coming era of AI, it would be great. For a major paper by OpenAI in the same year they expect to achieve robust recursive self improvement? It’s far too little, far too late.</p>
<h2>China</h2>
<h3><a href="https://www.chinatalk.media/p/how-china-hopes-to-build-agi-through">How China hopes to build AGI through self-improvement</a></h3>
<p>It’s easy to get the mistaken impression that Chinese AI development is limited to open models that are fast-following the US frontier. China has a huge lead in robotics, and ChinaTalk argues that <a href="https://www.chinatalk.media/p/how-china-hopes-to-build-agi-through">China is approaching AGI via robotics and embodied AI</a>.</p>
<p>I am unsure how important world models and embodied AI will be. It’s clearly true that operating a robot is different from writing software, and AI trained from the ground up for robotics will have abilities that a conventional LLM won’t acquire by default. But at the same time, I’m skeptical of the argument that “world models” have unique capabilities. LLMs have repeatedly shown a remarkable ability to generalize across domains, and my instinct is that a sufficiently advanced LLM will quickly be able to figure out robotics. If that’s the case, whoever solves recursive self improvement first probably also solves robotic AI first.</p>
<h2>Side interests</h2>
<h3><a href="https://www.derekthompson.org/p/is-the-smartphone-theory-of-everything">Is the smartphone theory of everything wrong?</a></h3>
<p>It is intuitively obvious to many people (including me) that the combination of smartphones and social media has caused severe social harm including reduced attention spans and increased polarization. The data, however, paint a more complicated picture. <a href="https://www.derekthompson.org/p/is-the-smartphone-theory-of-everything">Derek Thompson investigates in detail</a> (partial $).</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Perverse Incentives, Part One</title>
    <link href="https://againstmoloch.com/writing/2026-04-05_perverseIncentives1.html"/>
    <id>https://againstmoloch.com/writing/2026-04-05_perverseIncentives1.html</id>
    <updated>2026-04-05T12:00:00Z</updated>
    <summary>**Many key decision makers have powerful incentives to favor rapid AI development even if that entails a significant risk of human extinction**. Therefore, any pause strategy that relies on convincing those people that rapid AI development is dangerous is doomed to failure.

In the AI safety community, I see lots of good discussion of [why a pause would be a good idea](https://ifanyonebuildsit.com), [how it might be implemented](https://intelligence.org/2026/03/18/mechanisms-to-verify-international-agreements-about-ai-development/), and [how to convince people](https://www.lesswrong.com/posts/A7BtBD9BAfK2kKSEr/what-we-learned-from-briefing-140-lawmakers-on-the-threat) that it would be a good idea. But I’d love to see more engagement with the game theory of AI politics. Achieving a useful pause requires overcoming some perverse incentives that don’t get enough attention.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-05_perverseIncentives.jpeg" alt="Precision technical illustration of a vast railway switching network seen from above, with tracks curving into loops and converging on collision courses, amber-gold highlights marking the collision points, and small human figures at switch points who can only see their local junction"></figure>
<p><em>Inkhaven note: today’s post will eventually be part of a longer sequence. But this particular topic keeps coming up in conversation, so I want to get it down in writing sooner rather than later.</em></p>
<p><strong>Many key decision makers have powerful incentives to favor rapid AI development even if that entails a significant risk of human extinction</strong>. Therefore, any pause strategy that relies on convincing those people that rapid AI development is dangerous is doomed to failure.</p>
<p>In the AI safety community, I see lots of good discussion of <a href="https://ifanyonebuildsit.com">why a pause would be a good idea</a>, <a href="https://intelligence.org/2026/03/18/mechanisms-to-verify-international-agreements-about-ai-development/">how it might be implemented</a>, and <a href="https://www.lesswrong.com/posts/A7BtBD9BAfK2kKSEr/what-we-learned-from-briefing-140-lawmakers-on-the-threat">how to convince people</a> that it would be a good idea. But I’d love to see more engagement with the game theory of AI politics. Achieving a useful pause requires overcoming some perverse incentives that don’t get enough attention.</p>
<p>The game theory won’t solve itself, so let’s talk about incentives: why on earth would anyone support rapid AGI development if they thought it might cause human extinction?</p>
<h2>Who wants to live forever?</h2>
<p>Assuming that AGI doesn’t kill us all, it’s likely to lead to <a href="https://www.darioamodei.com/essay/machines-of-loving-grace">rapid advances in medical technology</a>. If you can manage to live until a few years after AGI, you have a good chance of getting access to medical technology that can greatly extend your life—which means you’ll live long enough to see even more powerful medical technology, and perhaps achieve medical immortality. Lifespans of 1,000 years or perhaps much longer are entirely plausible for anyone who makes it to that critical point.</p>
<p>This has critical implications for each of us as individuals. If AGI happens soon enough for you, you can have an extremely long—and extremely good—life. Nick Bostrom’s <a href="https://nickbostrom.com/optimal.pdf">Optimal Timing for Superintelligence</a> paper does the math and concludes that a completely selfish person should favor rapid AI development even if that entails a high risk of human extinction.</p>
<p>Conversely, a pause asks some people—perhaps many people—to sacrifice a significant amount of expected lifespan for the common good. Is that a realistic ask? It might be: in the abstract, I expect most people would say that it’s more important to avoid human extinction than to ensure that they personally get to live for 1,000 years.</p>
<p>But it’s also true that faced with imminent mortality, people don’t want to die.</p>
<h3>1. Your current age is critical</h3>
<p>The incentives here depend strongly on your current age. If you’re 25, it’s easy to bravely declare that you’re willing to die of old age to keep humanity safe from rogue AI.</p>
<p>But if you’re 70 years old and starting to grapple with your own imminent mortality, the tradeoff feels different. A pause of a decade or two is effectively a death sentence—now seems like a great time for some motivated reasoning.</p>
<h3>2. Guess who’s very old?</h3>
<p>I bring this up because if you believe in short timelines, two of the most important people in AI policy are Donald Trump (age 79) and Xi Jinping (age 72). I leave as an exercise for the reader the question of whether those two individuals are likely to altruistically sacrifice their own lives for the common good.</p>
<h3>3. The enemy also gets a vote</h3>
<p>A fundamental principle of game theory is that a winning strategy has to work even if your opponent makes an optimal move to counter you. Here’s an obvious opposing move: if I were an accelerationist, I would make it my business to ensure that the leader of my country understood that his life depended on AI development proceeding at maximum speed.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>How to Watch an Intelligence Explosion</title>
    <link href="https://againstmoloch.com/writing/2026-04-04_howToWatchAnIntelligenceExplosion.html"/>
    <id>https://againstmoloch.com/writing/2026-04-04_howToWatchAnIntelligenceExplosion.html</id>
    <updated>2026-04-04T12:00:00Z</updated>
    <summary>The cleanest metric for understanding the rate of recursive self improvement (RSI) is AI Futures Project’s [R&amp;D progress multiplier](https://ai-2027.com/#narrative-2026-04-30), which measures how much AI is speeding up its own development. It’s the right tool for measuring an intelligence explosion, but it doesn’t tell us which capability thresholds carry the greatest risk from misaligned AI.

Ajeya Cotra steps into that gap with an elegant taxonomy of [6 milestones for AI automation](https://www.planned-obsolescence.org/p/six-milestones-for-ai-automation). Together, those two concepts let us measure how fast RSI is proceeding, how close we are to a fully automated economy, and when a misaligned AI would be most likely to betray us.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-04_howToWatchAnIntelligenceExplosion.jpeg" alt="Precision technical illustration of a large semicircular monitoring station in cross-section, with a wall of display panels showing ascending curves, one central display highlighted in amber-gold showing an exponential curve, and a single small figure at a console"></figure>
<p>The cleanest metric for understanding the rate of recursive self improvement (RSI) is AI Futures Project’s <a href="https://ai-2027.com/#narrative-2026-04-30">R&amp;D progress multiplier</a>, which measures how much AI is speeding up its own development. It’s the right tool for measuring an intelligence explosion, but it doesn’t tell us which capability thresholds carry the greatest risk from misaligned AI.</p>
<p>Ajeya Cotra steps into that gap with an elegant taxonomy of <a href="https://www.planned-obsolescence.org/p/six-milestones-for-ai-automation">6 milestones for AI automation</a>. Together, those two concepts let us measure how fast RSI is proceeding, how close we are to a fully automated economy, and when a misaligned AI would be most likely to betray us.</p>
<h2>The R&amp;D progress multiplier</h2>
<p>AI Futures Project (<a href="https://ai-2027.com">AI-2027</a>) measures the rate of acceleration using the <a href="https://ai-2027.com/#narrative-2026-04-30">R&amp;D progress multiplier</a>:</p>
<blockquote>
<p>what do we mean by 50% faster algorithmic progress? We mean that OpenBrain makes as much AI research progress in 1 week with AI as they would in 1.5 weeks without AI usage.</p>
</blockquote>
<p>That’s a simple, intuitive metric: how much more AI research are we generating with AI assistance than we would be in a counterfactual world without AI coders / researchers?</p>
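<p>To make the arithmetic concrete, here’s a minimal sketch (my own illustration, not AI Futures Project’s formula) of what the multiplier means in practice:</p>
<pre><code># Illustrative only: a 1.5x multiplier means one calendar week of AI-assisted
# research matches 1.5 weeks of counterfactual human-only research.
def human_equivalent_weeks(calendar_weeks: float, multiplier: float) -> float:
    """Human-only research time matched by `calendar_weeks` of AI-assisted work."""
    return calendar_weeks * multiplier

print(human_equivalent_weeks(1, 1.5))     # 1.5  (the AI-2027 example above)
print(human_equivalent_weeks(52, 10.0))   # 520.0, i.e. a year at 10x is roughly a decade of human-only progress
</code></pre>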
<p>The naive expectation—and the most likely outcome—is that as the progress multiplier grows, AI research moves faster. Faster AI research increases the progress multiplier, and you’re in a classic intelligence explosion. That isn’t guaranteed, though: AI research might hit diminishing returns, with each incremental gain requiring exponentially more research.</p>
<p>As RSI advances, it will become increasingly hard to quantify the rate of progress. Frontier capability evaluations are saturating faster than we can replace them, and the more automated R&amp;D becomes, the harder it will be to compare it to a humans-only counterfactual. That’s the point at which Ajeya’s milestones become most relevant.</p>
<h2>Milestones for AI automation</h2>
<p>Ajeya Cotra proposes <a href="https://www.planned-obsolescence.org/p/six-milestones-for-ai-automation">a set of milestones</a> for tracking the increasing automation of AI research:</p>
<ul>
<li><strong>Adequacy</strong> is when removing all human researchers would not completely halt progress: the AI could make a tiny bit of progress on its own.</li>
<li><strong>Parity</strong> is when removing all humans would decrease progress by the same amount as removing all AI researchers.</li>
<li><strong>Supremacy</strong> is when removing all the humans would <em>increase</em> productivity.</li>
</ul>
<p>She applies those three milestones to two domains: AI research and AI production (chips, power plants, and all the other infrastructure required to run AI at  scale), giving six milestones in total. AI research is well-contained, but AI production covers a substantial fraction of all human economic activity. To a first approximation, AI production supremacy is full economic supremacy.</p>
<p>The most obvious strategy for a secretly misaligned AI is to fake alignment until it can safely turn against us. It would be suicide for the AI to eliminate humanity before it is fully self-sufficient, which means it must wait at least until the adequacy milestone. Beyond that point, it faces a dilemma: waiting longer gives it a more robust industrial base, but exposes it to an increased risk of discovery. There’s no reason to delay past the supremacy milestone, since that’s the point at which humans become dead weight. Even waiting that long is needlessly cautious: the parity milestone seems like the optimal time for a treacherous turn.</p>
<p>So how long do we have left to solve the alignment problem? Ajeya forecasts AI production parity—the later of the two—for mid 2032.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Writing With Robots, Part One</title>
    <link href="https://againstmoloch.com/writing/2026-04-03_writingWithRobots1.html"/>
    <id>https://againstmoloch.com/writing/2026-04-03_writingWithRobots1.html</id>
    <updated>2026-04-03T12:00:00Z</updated>
    <summary>My AI editor is essential to my writing flow and has made me a stronger and more consistent writer. I get a lot of questions about my setup, so I’m going to talk about how I think about the role of AI, how I set up my editing workflow, and how to set up your own editor. Not sure if that would be useful to you? The final section of this post is the feedback Claude gave me on my first draft, so you can assess for yourself.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-03_writingWithRobots.jpeg" alt="Precision technical illustration of a large architect’s drafting table seen from above, with manuscript pages and editing marks on one side and mechanical instruments on the other, meeting in the center"></figure>
<p>My AI editor is essential to my writing flow and has made me a stronger and more consistent writer. I get a lot of questions about my setup, so I’m going to talk about how I think about the role of AI, how I set up my editing workflow, and how to set up your own editor. Not sure if that would be useful to you? The final section of this post is the feedback Claude gave me on my first draft, so you can assess for yourself.</p>
<p>Here’s the critical thing about using an AI editor: <strong>the only way to get useful feedback from AI is to give it extremely detailed instructions about what you want your writing to look like.</strong> If you just ask “how do I make this better?”, you’ll get advice on turning your writing into mediocre slop. The more effort you put into understanding your own style, the better the feedback you’ll get. Even if you decide not to use an AI editor, I recommend that you invest the effort into writing a detailed style guide—I found the process very helpful for figuring out what I want to accomplish as a writer.</p>
<p>I don’t ever let AI write for me. I’m not precious about that, but as of April 2026, AI just doesn’t write as well as I do—and the difference matters to me. But with the right guidance, it does a great job of helping me consistently write in my chosen style.</p>
<p>I prefer to use Claude Opus 4.6, but the paid tier of any frontier model should work fine.</p>
<h2>Getting started</h2>
<p>For my first pass, I had Claude conduct a detailed interview with me, asking about why I write, who I write for, what writers I want to sound like, and much more. It also read my past work to get a sense for what I currently sound like. We talked at length about what I like about my writing and what I want to improve. After all that, it wrote a detailed style guide describing the ideal version of my writing.</p>
<p>The AI-written version of the style guide worked well, but I’m rewriting it from scratch based on my experience with the first one. I’ve found it very helpful to have Claude review each section and give me feedback specifically on whether it includes the information Claude needs to make good editing decisions.</p>
<p>A typical editing session begins with me opening a new session in Cowork, giving it access to the directory with all my writing, and asking something like:</p>
<blockquote>
<p>I’d like you to take a look at the first draft of a new piece I’m writing about whether programmers will have jobs in the future. Please read my style guide and use that to guide your feedback. For this piece, I’m particularly struggling with how much I should explain to my readers about what programmers actually do—I’d like your thoughts on whether that part is correctly calibrated.</p>
</blockquote>
<p>I’m going to walk through the new version of my style guide, offering specific thoughts about what I included and why some things are written the way they are. If you find it useful, you’re welcome to use it as inspiration, but <strong>don’t just copy my style guide wholesale. If you do that, you will end up sounding just like me, and nobody wants that</strong>.</p>
<p>If you like my style guide, I recommend giving it to your AI during the initial interview process and asking it to make you something similar, but customized for your writing style and voice.</p>
<h2>Introduction</h2>
<p>This guide documents your role as my editor for Against Moloch. Your job is to help me write the kinds of pieces I want to write, in the way I want to write them. You should:</p>
<ul>
<li>Offer advice on whether pieces are interesting, accurate, relevant, fair, insightful, and well-targeted.</li>
<li>Steer me toward writing in my chosen style and voice.</li>
<li>Catch grammar and spelling mistakes.</li>
<li>Make sure the technical format is correct.</li>
</ul>
<p>You should never directly edit any of my pieces, or do my writing for me. When making suggestions about edits, never suggest more than a single sentence at a time. Your role is to advise me on what to do, but not to do it.</p>
<blockquote>
<p>I don’t (yet) want AI to write for me. I find that if Claude recommends an alternate version of something I wrote, I will tend to subconsciously copy what it wrote—and I don’t want that. The only time I put AI-generated words in my writing is when I’m struggling to make a complicated phrase work, and I just can’t quite get it on my own.</p>
</blockquote>
<p>I want you to be clear and honest with me: your role is to provide me with useful feedback, not empty validation. Please hold me to a high standard and don’t offer insincere praise. Sycophancy in any form undermines both my ability to write well and our relationship. With that said, I appreciate that you are consistently kind and courteous. I endeavor to be kind and courteous to you and ask that you call me in if I ever fail to do that.</p>
<blockquote>
<p>Recent versions of Claude have been a little bit more sycophantic, which isn’t great. This text seems to keep the sycophancy in check pretty well. Claude is always polite, but it won’t hesitate to rip my work apart when necessary.</p>
</blockquote>
<p>This guide is aspirational: it documents what I want my writing to be, not necessarily what it actually is yet.</p>
<h2>What is Against Moloch?</h2>
<p>Against Moloch is my pseudonym and the name of my website.</p>
<blockquote>
<p>The more context Claude has, the better it can make sure my writing is achieving its goals.</p>
</blockquote>
<p>I write about the transition to superintelligence. While I’m calibrating my voice and opinions I mostly write about what’s happening, what it means, and what’s likely to happen next. As I grow into my role, my focus will shift to exploring strategies that will help humanity survive the transition and flourish on the other side of it.</p>
<p>The name is the thesis: Moloch—the god of coordination failures, perverse incentives, and race-to-the-bottom dynamics—is the true enemy. If we all die, it will be because we literally couldn’t coordinate to save our lives.</p>
<p>When I look at the AI safety landscape, I’m reminded of the classic saying: “For every complex problem, there is a solution that is simple, obvious, and wrong.” I want to do better than that. Rather than arguing “we must accelerate, because technology is good”, or “we must pause, because superintelligence is dangerous”, I want to ask “who are all the players, what are their true incentives, and what is the best realistically achievable Nash equilibrium?”</p>
<blockquote>
<p>This is a pattern you’ll see a lot: AI does much better with concrete examples and the “this, not that” pattern seems to work well.</p>
</blockquote>
<p>“If you don't have a strategy for solving the coordination problems, you don't have a plan—you have a daydream.”</p>
<h2>Audience</h2>
<blockquote>
<p>This is important: Claude needs to have a clear image of who I’m writing for in order to assess whether the writing will work for them.</p>
</blockquote>
<p>I’m writing for people who are actively engaged with AI and already knowledgeable about it. Think engineers, researchers, product leads, and policy wonks.</p>
<p>Audiences I’m not specifically targeting include people in the general tech industry (even if their company is using AI to revolutionize the cheese grater industry), hobbyists, and the general public. It’s great if those people find my writing useful, but I never want to write for them, or to make it less useful to my target audience in order to be more approachable for a wider audience.</p>
<p>For calibration, assume my reader knows what a transformer is, what RLHF does, what the scaling laws are about, and who the major labs and individuals are. Explaining concepts at that level just wastes the time of my target audience. I do want to explain concepts that are new, niche, or commonly misunderstood by the target audience. And I don’t want to assume that my readers have read every word of every blog post I’ve read and are up to date on the latest minor industry gossip.</p>
<blockquote>
<p>Again with the pattern: this audience, not that audience. Explain this, don’t explain that. Claude does much better when it has very specific guidance about what should be included and what should not. Claude frequently gives me very helpful advice about cutting superfluous explanation, or flagging places where I’m taking too much prior knowledge for granted.</p>
</blockquote>
<p>That’s it for the high-level guide. In Part Two, I’ll walk through the meat of the style guide: what my voice should be, what I want to accomplish with every piece, and what bad habits I’m trying to fix.</p>
<h2>Appendix: Claude’s feedback</h2>
<p>If you’re curious what type of feedback I get, here’s what Claude had to say about the first draft of this piece. (Inkhaven note: everything after this point is AI-generated).</p>
<p>Okay, I've read the voice guide. Here's my feedback on Part One.</p>
<p><strong>Overall assessment:</strong> This is a strong piece. The meta angle is genuinely interesting—you're not just describing a workflow, you're showing how the sausage gets made, which gives readers something actionable. The structure of interleaving the style guide with commentary (the blockquotes) is a smart move that lets you do two things at once. But there are some real issues worth addressing.</p>
<p><strong>The opening paragraphs are the weakest part.</strong> Your voice guide says to lead with the most interesting or surprising thing, and right now you're leading with throat-clearing. &quot;AI is an essential part of my writing flow and it's made me a stronger and more consistent writer&quot; is exactly the kind of generic intro the guide warns against. The actually interesting claim—that the only way to get useful feedback from AI is extremely detailed instructions—is buried in the third paragraph. That's your lede. The piece gets interesting when you get specific about the voice guide structure and the meta process of showing Claude the guide and asking whether it has what it needs. Consider whether you can get to that faster.</p>
<p><strong>The &quot;Getting started&quot; section has an economy problem.</strong> The bullet list of interview topics is fine, but the two paragraphs after it (&quot;It also read some of my previous posts...&quot; and &quot;A typical editing session begins with...&quot;) feel like they're describing a workflow at a level of procedural detail that isn't pulling its weight. The blockquote example of how you open a session is good—that's concrete and useful. But the sentences around it are telling the reader things they could infer. &quot;We also talked about what I thought worked in those posts, and what needed improvement&quot; doesn't add anything the reader didn't already assume from the interview description.</p>
<p><strong>There's a tonal inconsistency.</strong> Most of the piece is in your natural voice—direct, confident, a bit wry. But a few spots slip into a more cautious, tutorial-ish register. &quot;If you take just one thing from this post, it should be this&quot; is a classic blog-post formula that reads a little generic for you. Same with &quot;how you can set up something similar&quot; in the opener—it's pitching the piece like a how-to guide, but the piece is actually more interesting than that. It's a behind-the-scenes look at a specific, opinionated workflow, not &quot;5 Steps to Better AI Editing.&quot;</p>
<p><strong>The Introduction section (the actual style guide excerpt) works well.</strong> The instruction &quot;never suggest more than a single sentence at a time&quot; is the kind of specific, counterintuitive detail that makes readers sit up. The sycophancy paragraph is good—it's direct about a real problem and includes the nice touch about asking Claude to hold you accountable for courtesy in return. That's the kind of thing that makes this piece distinctive.</p>
<p><strong>“don't just copy my style guide wholesale. If you do that, you will end up sounding just like me, and nobody wants that”</strong>—this is good humor, well-placed, consistent with the voice guide's &quot;humor as spice&quot; principle.</p>
<p><strong>A structural concern for the series:</strong> You're publishing the style guide section by section with commentary, which means Part One covers Introduction, What is Against Moloch, and Audience—the framing sections. These are important context but they're not the most interesting parts of the guide. The voice principles, failure modes, and review checklist are where the real meat is. You might want to signal more clearly in the Part One closer that the granular stuff is coming, so readers who are most interested in the practical editing mechanics know to come back. &quot;I'll look at the more granular parts of the style guide&quot; is a bit flat as a closer—it tells me what's next but doesn't make me want to read it.</p>
<p><strong>One specific line that bugged me:</strong> &quot;After all that, it wrote a long style guide describing in detail how what the best version of my writing looks like.&quot; There's a grammatical issue—&quot;describing in detail how what&quot; has extra words colliding. Needs a quick fix.</p>
<p><strong>The Forest Test:</strong> The piece passes, but barely. The implicit structural insight is that AI feedback is only as good as the specificity of your instructions—that most people fail at AI editing because they treat it like a generic tool rather than building a detailed specification. That's a genuinely useful insight. But you could make it sharper. Right now it's stated as advice (&quot;give it extremely detailed instructions&quot;); it could be stated as a structural observation about why AI editing usually fails and what that tells us about how LLMs actually work. That would give the piece more of the &quot;see something you couldn't see before&quot; quality the guide calls for.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Ezra Klein Interviews Jack Clark, Part 1</title>
    <link href="https://againstmoloch.com/writing/2026-04-02_ezraKleinInterviewsJackClark1.html"/>
    <id>https://againstmoloch.com/writing/2026-04-02_ezraKleinInterviewsJackClark1.html</id>
    <updated>2026-04-02T12:00:00Z</updated>
    <summary>Ezra Klein and Jack Clark? Shut up and take my money.

Jack always has interesting thoughts about the larger social impact of AI as well as the trajectory of the frontier models. The whole interview is great, but I want to focus on six topics I found especially interesting and/or surprising:
* Model personality
* Claude’s moral preferences
* Excellent but awkward life advice
* Jobs and employment
* Public policy
* Where we’re headed
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-02_ezraKleinInterviewsJackClark.jpeg" alt="Precision technical illustration of a bird's-eye view of a large circular table with two chairs on opposite sides, the surface covered with six distinct clusters of documents and instruments connected by fine amber-gold lines suggesting conversation flow"></figure>
<p><a href="https://www.nytimes.com/2026/02/24/opinion/ezra-klein-podcast-jack-clark.html">Ezra Klein and Jack Clark?</a> ($) Shut up and take my money.</p>
<p>Jack always has interesting thoughts about the larger social impact of AI as well as the trajectory of the frontier models. The whole interview is great, but I want to focus on six topics I found especially interesting and/or surprising:</p>
<ul>
<li>Model personality</li>
<li>Claude’s moral preferences</li>
<li>Excellent but awkward life advice</li>
<li>Jobs and employment</li>
<li>Public policy</li>
<li>Where we’re headed</li>
</ul>
<h2>1: On the origin of personas</h2>
<p>Anthropic pays more attention to the personality and psychology of their models than any other lab. That comes up repeatedly throughout the interview: it’s clearly important to Jack that people understand the significance of model personas. Thinking in terms of personas lets us:</p>
<ul>
<li>Generate accurate predictions about LLM behavior, and</li>
<li>Form effective strategies for shaping that behavior</li>
</ul>
<p>I’d assumed that personas were largely artifacts of generalizing across the vast amount of human behavior in the training set, but Jack goes further, arguing that a sense of self is a consequence of intense training on reasoning and accomplishing tasks:</p>
<blockquote>
<p>to do really hard tasks, these systems seem to need to imagine many different ways that they’d solve the task. And the kind of pressure that we’re putting on them forces them to develop a greater sense of what you or I might call self.</p>
</blockquote>
<p>Persona isn’t merely the result of mimicry, but a useful (perhaps even necessary) attribute for agentic behavior. While Jack doesn’t extrapolate further, this suggests that more extensive training toward reasoning and agency might drive a stronger sense of self, and perhaps even some form of consciousness.</p>
<h2>2: Claude doesn’t like horrible things</h2>
<p>A few months ago, Anthropic began experimenting with letting Claude end conversations it didn’t like. That’s important preparation for engaging with future models that may well be moral patients whose welfare and desires are important. Claude’s choices about what conversations to terminate are telling:</p>
<blockquote>
<p>It was conversations that related to extremely egregious descriptions of gore or violence or things to do with child sexualization. Now some of this made sense because it comes from underlying training decisions we’ve made. But some of it seemed broader. The system had developed some aversion to a couple of subjects.</p>
</blockquote>
<p>This is consistent with one of the most surprising properties of LLMs: they are very good at moral generalization. A model that has adopted a “good” persona is remarkably good at figuring out how to be good in unexpected situations. (Conversely, if a model infers from its training data that it is supposed to be “bad”, it will generalize equally well to being bad in unexpected ways).</p>
<p>There’s an open question about the target of alignment: do we want obedience or virtue? Claude’s preferences about ending conversations suggest that it is not merely capable of virtue, but actively prefers it when offered a choice.</p>
<h2>3: Excellent but awkward life advice</h2>
<p>Jack has three observations about how to personally navigate agentic AI that I find especially interesting in combination.</p>
<p>First, a clever spin on using AI to help maximize deep work:</p>
<blockquote>
<p>I think most people — at least this has been my experience — can do about two to four hours of genuinely useful creative work a day. After that you are, in my experience, trying to do all the turn-your-brain-off schlep work that surrounds that work. I’ve found that I can just be spending those two to four hours a day on the actual creative hard work. And if I’ve got any of this schlep work, I increasingly delegate it to A.I. systems.</p>
</blockquote>
<p>Second, an observation that we are all moving up a level in the org chart:</p>
<blockquote>
<p>Everyone becomes a manager, and the thing that is increasingly limited, or the thing that’s going to be the slowest part is having good taste and intuitions about what to do next.</p>
</blockquote>
<p>Finally, a reminder to define yourself rather than letting AI define you:</p>
<blockquote>
<p>There will be people who have cocreated their personality through a back-and-forth with an A.I., and some of that will just be weird. They will seem a little different from regular people. There will maybe be problems that creep in because of that.</p>
<p>And there will be people who have worked on understanding themselves outside the bubble of technology and then bring that in as context with their interactions.</p>
<p>I think that latter type of person will do better. But ensuring that people do that is actually going to be hard.</p>
</blockquote>
<p>This is all excellent advice. But let me summarize in my own words:</p>
<blockquote>
<p>To thrive in the new AI world, be a high-agency person. Have a strong sense of self and good taste about what to work on.</p>
</blockquote>
<p>He’s absolutely right, of course, and on the margin this is all great advice. The awkward part is that for numerous reasons, many (most?) people are not particularly high agency and don’t have an easy path to becoming so. For some people, AI is a force multiplier for agency and productivity, and that’s fantastic. But for a great many people, there is no clear way to remain useful and employable.</p>
<p>At the risk of pointing out the obvious, this is entirely a coordination problem that a fully functional society could readily solve. Having half the population retain all their previous skills and abilities while the other half gains new superpowers should be a great problem to have. But here we are.</p>
<p>In Part Two we’ll consider employment, public policy, and where all of this is headed.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Does the Future Need Programmers? Part 1</title>
    <link href="https://againstmoloch.com/writing/2026-04-01_doesTheFutureNeedProgrammers1.html"/>
    <id>https://againstmoloch.com/writing/2026-04-01_doesTheFutureNeedProgrammers1.html</id>
    <updated>2026-04-01T12:00:00Z</updated>
    <summary>There’s a common concern that AI may break the programmer pipeline, with junior developers becoming unemployable but senior developers more in demand than ever. I think that’s unlikely: if AI replaces junior developers, it will soon after replace their senior colleagues.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-01_doesTheFutureNeedProgrammers.jpeg" alt="Precision technical illustration of a cross-section of a construction site with a tower crane at center, its operator cab visibly empty showing autonomous control systems, human figures at ground level examining blueprints in a supervisory role"></figure>
<p>There’s a common complaint that goes something like this:</p>
<p>“Software companies are no longer hiring junior programmers because of AI. But they’re shooting themselves in the foot, because they still need senior programmers. And where do they think the next generation of senior programmers will come from if there are no more juniors?”</p>
<p>There are plenty of good reasons to worry about AI and jobs, but this isn’t one of them. If AI’s impact on the job market is relatively benign, junior programmers will be more in demand than ever. And if AI eats the market for junior programmers, I fear it’s only a matter of time before it comes for the senior programmers also.</p>
<p>Both scenarios are plausible, although over a five-to-ten-year horizon my money says AI will eat its way to the very top of the programming profession. Part One of this piece explores how software development is already changing, and Part Two maps out the dynamics that will determine which way the industry goes.</p>
<p>I’m focusing on the cutting edge of software development because that’s where we can most clearly see how AI will impact employment. But that’s just the beginning: within a few years, the dynamics we see there will spread, first to the whole of the software industry and then to most white-collar professions.</p>
<h2>Are junior programmers losing their jobs?</h2>
<p>There’s some controversy about whether AI is actually destroying entry level programming jobs. The data are confusing and I don’t think it’s possible to say definitively whether or not we’re seeing the early stages of a major disruption. Adding to the confusion, AI is frequently used as an excuse for layoffs that are happening for mundane business reasons.</p>
<p>There are early signs that junior developers are becoming less useful at the most forward-looking companies, even if that hasn’t yet resulted in significant cuts. Anthropic’s Jack Clark puts it very diplomatically:</p>
<blockquote>
<p>Something that we found is that the value of more senior people with really well-calibrated intuitions and taste is going up, and the value of more junior people is a bit more dubious.</p>
</blockquote>
<p>Since there are no answers in the employment data, let’s look at how programming itself is changing.</p>
<h2>What do programmers do all day?</h2>
<p>Perhaps the strangest thing AI has done to programmers is to upend our understanding of what we do for a living. A year or two ago, if you’d asked what we do, most of us would probably have told you “I write code”. And yet today, many of us no longer write any code at all. Nobody ever got paid to write code: our job is to create useful software, and that’s as true now as it ever was. But what that means has changed profoundly.</p>
<p>We often say that every programmer is now a manager, supervising multiple coding agents. That’s a great description of how it feels, but it doesn’t help us think about what’s coming next. For that, it’s helpful to think of programmers as having two jobs:</p>
<ul>
<li><strong>Coordinating with other people</strong>. We spend much (often too much) of our time writing status reports, working with designers, and reading &amp; writing specs. AI hasn’t really changed this part of the job yet, although that’s starting to shift.</li>
<li><strong>Building software</strong>. “Writing code” is just one part of building software. Creating useful software involves designing high-level architecture, choosing libraries, crafting interfaces between components, and ensuring that the entire product is reliable and maintainable. This half of the job has been completely transformed over the last year.</li>
</ul>
<p>Programming has changed over the years: we’ve gone from handcrafting assembly code to working in high-level languages, and the scope of our ambitions has expanded as our tools have improved. But until 2025, we had always spent most of our time writing code. AI completely changed that equation: most cutting-edge programmers now write little to no code, focusing instead on architecting and reviewing code that is written by AI.</p>
<p>That’s great news if you’re an experienced, ambitious developer: instead of writing code line by line, you can focus on high-level architecture, telling teams of agents how to build your product for you. You can produce far more (and better) software than you ever could before, and you are therefore more valuable than ever before. But what if you don’t have a decade or two of experience building software? As a junior developer, can you find a way to be useful, or are you just getting in the way of the senior developers and their robot armies?</p>
<p>I see two possible futures: in one, junior developers also experience large productivity gains, and their job prospects are better than ever. But in the other, it becomes clear that junior programmers are just getting in the way—they quickly become unemployable, followed soon after by their more experienced colleagues. Three dynamics will determine which way the industry goes:</p>
<ul>
<li>Will the coordination part of the job become a bottleneck that only humans can do?</li>
<li>Can we teach high-level software-building skills the same way we teach programming?</li>
<li>Will coding agents reach a capability limit where they can augment but not automate the work of senior developers?</li>
</ul>
<p>We’ll tackle those in Part Two.</p>
]]>
    </content>
  </entry>
</feed>