<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Against Moloch - Ruminations on AI</title>
  <link href="https://againstmoloch.com/feeds/writing.xml"/>
  <id>https://againstmoloch.com/feeds/writing.xml</id>
  <updated>2026-04-18T12:00:00Z</updated>
  <author>
    <name>Against Moloch</name>
  </author>

  <entry>
    <title>Who I Follow</title>
    <link href="https://againstmoloch.com/writing/2026-04-18_whoIFollow.html"/>
    <id>https://againstmoloch.com/writing/2026-04-18_whoIFollow.html</id>
    <updated>2026-04-18T12:00:00Z</updated>
    <summary>
I spend several hours a day trying to keep up with what’s going on in the parts of AI that I’m interested in. It’s a ridiculous amount of work: I don’t recommend it unless you’re doing something silly like writing [a newsletter about AI](https://againstmoloch.substack.com).

But if you’d like to keep up with AI without spending your entire life on it, I have advice about who to follow. My recommendations center on the areas I’m most interested in: AI safety and strategy, capabilities and evaluations, and predicting the trajectory of AI. 
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-16_whoIFollowTopTen.jpeg" alt="Technical illustration of a vast exchange hall flooded with incoming messages, where a lone figure at a central table examines a small amber-highlighted selection pulled from the flood."></figure>
<p>I spend several hours a day trying to keep up with what’s going on in the parts of AI that I’m interested in. It’s a ridiculous amount of work: I don’t recommend it unless you’re doing something silly like writing <a href="https://againstmoloch.substack.com">a newsletter about AI</a>.</p>
<p>But if you’d like to keep up with AI without spending your entire life on it, I have advice about who to follow. My recommendations center on the areas I’m most interested in: AI safety and strategy, capabilities and evaluations, and predicting the trajectory of AI.</p>
<p>Let’s start with the top 10.</p>
<h2><a href="https://thezvi.substack.com">Zvi Mowshowitz</a></h2>
<p>Substack: <a href="https://thezvi.substack.com">Don’t Worry About the Vase</a><br />
Best for: comprehensive coverage, opinionated insight<br />
Example: <a href="https://thezvi.substack.com/p/ai-163-mythos-quest">AI #163: Mythos Quest</a></p>
<p>If I could only follow one person, it would unquestionably be Zvi. He’s comprehensive in his coverage and has consistently solid insight into everything that’s happening in AI.</p>
<p>Zvi has one huge downside: he’s staggeringly prolific. In the first half of April he posted 11 times, for a total of about 97,000 words (roughly a novel). I read everything he writes because I’m insane, but I recommend you just skim his posts looking for the most interesting parts.</p>
<h2><a href="https://blog.aifutures.org">AI Futures Project</a></h2>
<p>Substack: <a href="https://blog.aifutures.org">AI Futures Project</a><br />
Best for: epistemically rigorous predictions<br />
Example: <a href="https://ai-2027.com">AI-2027</a></p>
<p>The AI Futures Project is best known for AI-2027, a scenario of how AI might unfold over the next few years. They are epistemically rigorous and very thoughtful in how they approach some very hard questions. By far the best source of useful predictions about where we’re headed.</p>
<h2><a href="https://importai.substack.com">Jack Clark</a></h2>
<p>Substack: <a href="https://importai.substack.com">Import AI</a><br />
Best for: weekly analysis of a few topics<br />
Example: <a href="https://importai.substack.com/p/import-ai-452-scaling-laws-for-cyberwar">Import AI 452</a></p>
<p>Jack (who in his spare time helps run Anthropic) writes an excellent weekly newsletter. He doesn’t try to be comprehensive, but picks a few papers or topics each week to go deep on. Excellent curation, outstanding analysis.</p>
<h2><a href="https://www.hyperdimensional.co">Dean Ball</a></h2>
<p>Substack: <a href="https://www.hyperdimensional.co">Hyperdimensional</a><br />
Best for: insightful analysis of AI progress and strategy<br />
Example: <a href="https://www.hyperdimensional.co/p/on-recursive-self-improvement-part">On Recursive Self-Improvement (Part I)</a></p>
<p>Dean is an insightful writer who describes his focus as “emerging technology and the future of governance”. He has perhaps thought harder than anyone about how to integrate transformative AI into a classical liberal framework, as well as how government should and shouldn’t manage AI.</p>
<h2><a href="https://www.lesswrong.com/users/ryan_greenblatt">Ryan Greenblatt</a></h2>
<p>Less Wrong: <a href="https://www.lesswrong.com/users/ryan_greenblatt">Ryan Greenblatt</a><br />
Best for: deep technical analysis of AI capabilities and progress<br />
Example: <a href="https://www.lesswrong.com/posts/WjaGAA4xCAXeFpyWm/my-picture-of-the-present-in-ai">My picture of the present in AI</a></p>
<p>Ryan’s an AI researcher and prolific writer with deep insight into the technical side of AI. I appreciate both his technical understanding of capabilities as well as his willingness to make informed guesses and extrapolations.</p>
<h2><a href="https://80000hours.org/podcast/">80,000 Hours podcast</a></h2>
<p><a href="https://80000hours.org/podcast/">80,000 Hours podcast</a><br />
Best for: well-curated interviews<br />
Example: <a href="https://80000hours.org/podcast/episodes/ajeya-cotra-transformative-ai-crunch-time/">Ajeya Cotra</a></p>
<p>80,000 Hours is best known for giving career advice to people who want to help solve the world’s most pressing problems. But on the side, they run an excellent podcast. The guests and topics are well-chosen and I appreciate that they not only provide a transcript, but also a detailed summary of the interview. The world would be a better place if every podcast provided such comprehensive supplementary materials.</p>
<h2><a href="https://www.dwarkesh.com">Dwarkesh Patel</a></h2>
<p>Substack: <a href="https://www.dwarkesh.com">Dwarkesh Patel</a><br />
Best for: long, well-researched interviews<br />
Example: <a href="https://www.dwarkesh.com/p/scott-daniel">AI-2027 with Daniel Kokotajlo and Scott Alexander</a></p>
<p>Dwarkesh is an outstanding interviewer who clearly does extensive preparation before each interview. He gets excellent guests and makes the most of them, although his interviews often run very long. Also, his beard is magnificent.</p>
<h2><a href="https://substack.com/@antonleicht">Anton Leicht</a></h2>
<p>Substack: <a href="https://substack.com/@antonleicht">Threading the Needle</a><br />
Best for: US and global AI politics<br />
Example: <a href="https://substack.com/home/post/p-192704341">Press Play to Continue</a></p>
<p>I don’t always agree with Anton, but I always come away from his writing feeling smarter about something important. He occupies an interesting niche: neither blow-by-blow political news nor abstract political philosophy, but rather thoughtful analysis of the current political landscape, with solid strategic advice.</p>
<h2><a href="https://www.transformernews.ai">Transformer</a></h2>
<p>Substack: <a href="https://www.transformernews.ai">Transformer</a><br />
Best for: broader coverage of AI<br />
Example: <a href="https://www.transformernews.ai/p/pentagon-anthropic-mythos-cybersecurity-hacking-trump-hegseth">April 10 Transformer Weekly</a></p>
<p>Transformer produces a weekly newsletter as well as articles on particular topics. I particularly like their broad coverage: they often include news that many of my other feeds don’t. The newsletter is always good, as are some of the articles.</p>
<h2><a href="https://epoch.ai">Epoch AI</a></h2>
<p>Substack: <a href="https://epochai.substack.com">Epoch AI</a><br />
Best for: hard data on industry trends<br />
Example: <a href="https://epochai.substack.com/p/the-epoch-brief-march-2026">The Epoch Brief—March 2026</a></p>
<p>Epoch’s a fantastic source for more technical trends: GPU production, compute usage during training, capability gaps between open and closed models, etc.</p>
<img src="./assets/AgainstMolochRound.png" alt="Section divider" class="bug">
<p>If you want to go deeper in a particular area, here are 28 more sources that are particularly good, organized by topic.</p>
<h2>Analysis and prediction</h2>
<h3><a href="https://x.com/ajeya_cotra">Ajeya Cotra (X)</a></h3>
<p>Ajeya works at <a href="https://metr.org">METR</a> and does consistently strong work on measuring and predicting AI capabilities. I’ve found <a href="https://www.planned-obsolescence.org/p/six-milestones-for-ai-automation">Six milestones for AI automation</a> helpful for clarifying my own thinking about timelines.</p>
<h3><a href="https://x.com/DKokotajlo">Daniel Kokotajlo (X)</a></h3>
<p>Daniel founded the AI Futures Project and worked on their <a href="https://ai-2027.com">AI-2027</a> scenario. His forecasting work is outstanding and his X feed is particularly well curated.</p>
<h3><a href="https://helentoner.substack.com">Helen Toner (Substack)</a></h3>
<p>Helen blogs infrequently, but her articles are invariably excellent, with a knack for identifying the most important high-level questions about AI. <a href="https://helentoner.substack.com/p/taking-jaggedness-seriously">Taking Jaggedness Seriously</a> is typical of her work.</p>
<h3><a href="https://www.prinzai.com">Prinz (Substack)</a></h3>
<p>Prinz is a generalist who covers a range of topics with a focus on capabilities and using AI for legal work. <a href="https://x.com/deredleritt3r">His account on X</a> often features commentary on current news.</p>
<h3><a href="https://secondthoughts.ai">Steve Newman (Substack)</a></h3>
<p>Steve is an infrequent writer whose pieces about the trajectory of AI are invariably excellent. <a href="https://secondthoughts.ai/p/45-thoughts-about-agents">45 thoughts about agents</a> is a recent favorite.</p>
<h3><a href="https://www.understandingai.org">Understanding AI (Substack)</a></h3>
<p>Understanding AI is a generalist newsletter with broader coverage than many of the other sources I’ve listed.</p>
<h2>Safety, alignment, model psychology</h2>
<h3><a href="https://newsletter.safe.ai">AI Safety Newsletter</a></h3>
<p>Does exactly what it says on the tin—it’s perhaps the single best place to find all the latest safety news.</p>
<h3><a href="https://www.anthropic.com/research">Anthropic Research (web)</a></h3>
<p>Anthropic Research is a great source of alignment and interpretability work. The summaries are somewhat technical, but should be accessible to anyone who follows AI seriously. <a href="https://www.anthropic.com/research/emotion-concepts-function">Emotion concepts and their function in a large language model</a> is typical of the research they feature.</p>
<h3><a href="https://x.com/JeffLadish">Jeffrey Ladish (X)</a></h3>
<p>Jeffrey is a reliable source of safety-focused commentary on recent developments.</p>
<h3><a href="https://www.aisi.gov.uk/blog">UK AISI (web)</a></h3>
<p>Am I actually recommending a European government organization as a good source of information about AI? Strangely, I am doing exactly that. UK AISI does consistently very strong work on safety evaluations and related topics. Their <a href="https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities">analysis of Mythos’ cyber capabilities</a> is typical of their careful, in-depth work.</p>
<h2>Coding and technical</h2>
<h3><a href="https://x.com/karpathy">Andrej Karpathy (X)</a></h3>
<p>Karpathy is a legend for his work at OpenAI and Tesla as well as his ridiculously good ML tutorials. He isn’t a prolific poster, but when he does post (mostly about ML and coding), it’s always worth reading. His recent post on <a href="https://x.com/karpathy/status/2039805659525644595">LLM Knowledge Bases</a> has been deservedly popular.</p>
<h3><a href="https://www.beren.io">Beren (Substack)</a></h3>
<p>Beren posts infrequently, but I’ve found him to be consistently insightful. He tends to post about important topics that other people haven’t noticed, which is particularly useful. <a href="https://www.beren.io/2025-08-02-Do-We-Want-Obedience-Or-Alignment/">Do we want obedience or alignment?</a> is an excellent introduction to one of the most important questions in alignment.</p>
<h3><a href="https://x.com/bcherny">Boris Cherny (X)</a></h3>
<p>Nothing special, just the guy who came up with Claude Code. His feed is one of the best ways to keep up with the barrage of new CC features.</p>
<h3><a href="https://x.com/littmath">Daniel Litt (X)</a></h3>
<p>Daniel writes frequently about using AI for math. He strikes a rare balance: he’s appropriately skeptical about the vast amounts of hype, but clear-eyed about what AI is capable of and where it’s headed. <a href="https://www.daniellitt.com/blog/2026/2/20/mathematics-in-the-library-of-babel">Mathematics in the Library of Babel</a> is an excellent overview of current AI capabilities in math.</p>
<h3><a href="https://nicholas.carlini.com/writing">Nicholas Carlini (web)</a></h3>
<p>He doesn’t write often, but his work is always worth reading. He’s a security expert who recently joined Anthropic (you may have seen his name come up in some of the discussion about Mythos). <a href="https://nicholas.carlini.com/writing/2025/machines-of-ruthless-efficiency.html">Machines of Ruthless Efficiency</a> is a year old but holds up well.</p>
<h3><a href="https://simonwillison.net">Simon Willison (web)</a></h3>
<p>Simon’s an extremely prolific poster and one of my primary sources of news and insight about agentic coding.</p>
<h2>Policy, governance, and strategy</h2>
<h3><a href="https://ai-frontiers.org/articles">AI Frontiers (web)</a></h3>
<p>In-depth articles exploring a range of topics and perspectives related to AI policy and impacts. I particularly liked this recent piece exploring <a href="https://ai-frontiers.org/articles/how-ai-could-benefit-the-workers-it-displaces">how AI might affect wages</a>.</p>
<h3><a href="https://www.aipolicyperspectives.com">AI Policy Perspectives (Substack)</a></h3>
<p>Thoughtful, in-depth pieces about AI policy, safety, and impacts. The subtitle is “big questions and big ideas on artificial intelligence”, which sums it up nicely.</p>
<h3><a href="https://benjamintodd.substack.com">Benjamin Todd (Substack)</a></h3>
<p>Benjamin’s piece on <a href="https://benjamintodd.substack.com/p/how-ai-driven-feedback-loops-could">How AI-driven feedback loops could make things very crazy, very fast</a> is typical of his work: speculative, but well grounded in facts and technical understanding.</p>
<h3><a href="https://www.chinatalk.media">ChinaTalk</a></h3>
<p>ChinaTalk is my favorite source of news and analysis on AI in China as well as Chinese society and politics more broadly. Their pieces often run long—I’m selective about which ones I read, but I get a lot of value from them.</p>
<h3><a href="https://newsletter.forethought.org">Forethought (Substack)</a></h3>
<p>Reading Forethought is like stumbling upon a really good late night hallway conversation about possible future applications of AI. Speculative, but thoughtful and high quality.</p>
<h3><a href="https://windfalltrust.substack.com">Windfall Trust (Substack)</a></h3>
<p>Windfall Trust is one of the best sources I know of for information and policy ideas about jobs, the economy, and the social contract in the age of AI. The <a href="https://windfalltrust.org/policy-atlas">Windfall Policy Atlas</a> does a great job of collecting information about numerous policy options in a single well-organized place.</p>
<h2>Industry</h2>
<h3><a href="https://blog.andymasley.com">Andy Masley (Substack)</a></h3>
<p>Andy is the go-to guy for rebutting the endless stream of nonsense claims about AI and the environment. <a href="https://blog.andymasley.com/p/a-short-summary-of-my-argument-that">Start with this one</a>.</p>
<h3><a href="https://x.com/boazbaraktcs">Boaz Barak (X)</a></h3>
<p>Boaz (OpenAI) sometimes posts long articles, but I largely follow him for his frequent commentary on recent news and papers. He seems too nice to be allowed on X.</p>
<h3><a href="https://jasmi.news">Jasmine Sun (Substack)</a></h3>
<p>Jasmine Sun covers the culture of tech and Silicon Valley, as well as politics. I highly recommend <a href="https://jasmi.news/p/ai-populism">my week with the AI populists</a>: she does a great job of shedding light on what’s becoming a central force in AI politics.</p>
<h3><a href="https://www.manifold1.com/episodes">Manifold (web)</a></h3>
<p>Steve Hsu’s far-ranging Manifold podcast covers AI as well as physics, genetics, China, and more. Episodes often feature material from his upcoming documentary Dreamers and Doomers (most recently <a href="https://www.manifold1.com/episodes/dreamers-and-doomers-our-ai-future-with-richard-ngo-109">an interview with Richard Ngo</a>).</p>
<h3><a href="https://www.interconnects.ai">Nathan Lambert (Substack)</a></h3>
<p>Nathan’s my go-to for news and opinion about open models. Championing American open models isn’t an easy role, but he does it well.</p>
<h3><a href="https://openai.com/news/">OpenAI (web)</a></h3>
<p>OpenAI publishes frequently—it’s worth keeping an eye on their stream, even though you probably won’t want to read much of it. There are some gems here, although a lot of it is beautifully polished corporate nothing-speak.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Don’t Cut Yourself on the Jagged Frontier</title>
    <link href="https://againstmoloch.com/writing/2026-04-17_dontCutYourself.html"/>
    <id>https://againstmoloch.com/writing/2026-04-17_dontCutYourself.html</id>
    <updated>2026-04-17T12:00:00Z</updated>
    <summary>A conversation with a friend on the bus to Bodega Bay today made me realize that there are some holes in my thinking about safety and superintelligence. I’ve assumed that superintelligence is by definition robustly better than humans at all the things, but there are some cases where that doesn’t hold.

Without further ado, for your edification and discomfort, The Strawman Players present:

**A Disquieting Conversation on a Bus**
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-17_dontCutYourself.jpeg" alt="A precision technical illustration in slate blue on off-white linen, showing a jagged mountain ridgeline in cross-section. A single narrow chasm splits the ridge at center, its knife-sharp vertical edges highlighted in amber-gold. Two tiny human figures stand on opposite peaks — one on the taller left peak gesturing across the gap, the other standing still on the lower right peak. The geological strata below the ridge are visibly offset where the chasm cuts through them, like a fault line."></figure>
<p>(With apologies to <a href="https://substack.com/profile/277513978-sean-herrington">Sean Herrington</a>, who deserves a better playwright than yours truly)</p>
<p>A conversation with a friend on the bus to Bodega Bay today made me realize that there are some holes in my thinking about safety and superintelligence. I’ve assumed that superintelligence is by definition robustly better than humans at all the things, but there are some cases where that doesn’t hold.</p>
<p>Without further ado, for your edification and discomfort, The Strawman Players present:</p>
<p><strong>A Disquieting Conversation on a Bus</strong></p>
<p>Vulpes: I’ve been worrying lately about well-aligned superintelligence.</p>
<p>Corvus: That seems like a strange thing to worry about.</p>
<p>Vulpes: You’d think so. But hear me out. I’m imagining a world where we develop a well-aligned superintelligence (let’s call it MegaBrain) that is omni-benevolent and wants only nice things for us.</p>
<p>Corvus: I notice in myself a distinct lack of anxiety.</p>
<p>Vulpes: But here’s the thing. As part of its mission to serve humanity and give us nice things, MegaBrain develops a cool new technology to make our lives better. The details don’t really matter—for the sake of argument, let’s say it invents a Black Hole Reactor that uses micro black holes to generate infinite clean energy.</p>
<p>Corvus: Still not feeling anxious.</p>
<p>Vulpes: What if MegaBrain is smart enough to develop the Reactor, but too dumb to use it wisely? Perhaps it doesn’t realize that eventually some of the black holes will escape and gradually eat the earth. By the time anyone realizes, it’s too late and the earth—and humanity—are doomed.</p>
<p>Corvus: Ah: I see your mistake, friend Vulpes. You have made the common error of not understanding what “superintelligence” actually means. People often make the mistake of thinking that a superintelligence will be like a mad scientist: brilliant in some ways, but shockingly dumb in others. But that isn’t how it works: by definition, “superintelligent” means better than humans in every possible way.</p>
<p>MegaBrain, being superintelligent, will be good not only at designing new technology, but also at understanding how to deploy it safely. If a human could figure out a safe way to test the Reactor before deploying it, then MegaBrain could do even better.</p>
<p>So there’s absolutely nothing to worry about.</p>
<p>Vulpes: I’m not sure that’s actually true. What about the jagged frontier?</p>
<p>Corvus: What about it?</p>
<p>Vulpes: AI capabilities are likely to be jagged even as they increase. So perhaps MegaBrain can invent the Reactor because it has superhuman intelligence, but it makes a catastrophic mistake during deployment because it has subhuman wisdom.</p>
<p>Corvus: I suppose that’s possible, but it doesn’t seem very likely. The capability frontier is jagged, but it’s moving fast. Surely there will only be a brief period of time when MegaBrain is smart enough to build the Reactor, but unwise enough to deploy it prematurely. There’s only a tiny window of time when anything can go wrong.</p>
<p>Let me revise my earlier statement: there is <em>almost</em> nothing to worry about.</p>
<p>Vulpes: I just thought of another problem.</p>
<p>Corvus: I’m sure you did.</p>
<p>Vulpes: Here’s the thing. Imagine that MegaBrain is installed at the Department Of Maximum Energy (DOME), and DOME is excited to find new energy sources. So they put MegaBrain to work on designing the Reactor, but when it tells them it would be too dangerous to deploy, they ignore it because they’re too eager to deploy this cool new energy source.</p>
<p>Even though MegaBrain is superhuman in every possible way, DOME plus MegaBrain collectively have a jagged frontier. Together, they are smart enough to design the Reactor, but foolish enough to bungle the deployment.</p>
<p>Corvus: Ah. That feels… uncomfortably plausible.</p>
<p>Vulpes: How’s your anxiety doing?</p>
<p>Corvus: Are we there yet?</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Quick Thoughts About Mythos</title>
    <link href="https://againstmoloch.com/writing/2026-04-10_quickThoughtsAboutMythos.html"/>
    <id>https://againstmoloch.com/writing/2026-04-10_quickThoughtsAboutMythos.html</id>
    <updated>2026-04-10T12:00:00Z</updated>
    <summary>I expect it’ll take another week or two for everyone to fully digest the significance of Claude Mythos Preview. In the meantime, here are my initial thoughts.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2024-04-10_quickThoughtsAboutMythos.jpeg" alt="Technical cutaway illustration of a small team of technicians in a high-ceilinged examination hall, using precision instruments to measure and study a dense, intricate mechanism of unknown purpose at the center of the room. One technician indicates a single exposed internal component to another, highlighted in amber."></figure>
<p>I expect it’ll take another week or two for everyone to fully digest the significance of Claude Mythos Preview. In the meantime, here are my initial thoughts.</p>
<h2>Gradually, then suddenly</h2>
<p>Mythos is radically better at cyber than any previous model:</p>
<figure class="post-image">
<img src="./assets/2026-04-10_mythos2.jpg" alt="A chart showing the rate of successful Firefox JS shell exploitation by Sonnet 4.6, Opus 4.6, and Mythos Preview">
</figure>
<p>It isn’t the first model that can find vulnerabilities, of course: over the last several months we’ve seen a sharp increase in the rate of AI-discovered vulnerabilities.</p>
<p>But Mythos is something new: it’s radically better not only at finding vulnerabilities at scale, but also at creating working exploits from them. We went abruptly from “this is concerning and everyone in cybersecurity is gonna have to scramble” to this observation by <a href="https://x.com/RyanPGreenblatt/status/2041939701733765262">Ryan Greenblatt</a>:</p>
<blockquote>
<p>If Mythos was released as an open weight model in February (or tomorrow), this would cause ~100s of billions in damages, with a substantial chance of ~$1 trillion in damages</p>
</blockquote>
<p>I predict with high confidence that we’ll see this pattern again: AI will gradually get moderately good at something, until one day a new model drops that is suddenly extremely good at it. That will sometimes be exciting (making medical breakthroughs), sometimes disruptive (replacing entire professions), and sometimes terrifying (enabling bioweapon production).</p>
<h2>This could have gone much worse</h2>
<p>Anthropic has produced a genuinely dangerous model and they’re treating it about as responsibly as one could hope for. I suspect OpenAI and Google would also have handled Mythos responsibly if they’d developed it, although both seem marginally less careful than Anthropic.</p>
<p>But imagine if xAI had gotten there first: does anyone think the company that brought us MechaHitler could be trusted with this level of capability?</p>
<p>Differently gruesome: if one of the Chinese labs had developed this level of offensive cyber capability ahead of anyone else, the Chinese government likely would have commandeered it for covert use.</p>
<p>Similarly, if Mythos had been developed by a nationalized AI project run by DoW, it’s likely it would have been turned into an offensive weapon or worse. Remember the <a href="https://en.wikipedia.org/wiki/The_Shadow_Brokers">Shadow Brokers</a> fiasco.</p>
<p>Some tasks can only be performed by the government, but government agencies are not known for competence. Be careful what you put them in charge of.</p>
<h2>When do the open models catch up?</h2>
<p>Open models present a particular safety risk because it’s so easy to remove their guardrails. In addition, none of the leading open model developers appear to take safety nearly as seriously as the frontier labs. That hasn’t been a critical issue because until now, even the frontier models haven’t been truly dangerous at scale. But with Mythos, that is beginning to change. So I have some questions:</p>
<p>How long does it take the open models to catch up to Mythos’ cyber capabilities? This will be an interesting data point about whether they are genuinely 6-9 months behind, or whether their benchmark scores overstate their true capabilities.</p>
<p>Will reaching this capability level force them to take safety more seriously, or will they continue to release models with little safety training or testing?</p>
<p>If we do see Mythos-level open models by the end of this year, what implications does that have for cybersecurity?</p>
<h2>Internal deployments are increasingly important</h2>
<p>Anthropic was sufficiently concerned about Mythos’ capabilities to institute a new testing window before beginning internal deployment. That’s a correct choice that shouldn’t surprise anyone, but it marks an important threshold.</p>
<p>To date, most of the risk associated with a new model has come from misuse or misaligned behavior during public deployment. As models become increasingly capable, however, that begins to change. An extremely capable misaligned model is dangerous as soon as it’s deployed internally for the first time—in some scenarios, the initial deployment is the most dangerous time. We aren’t there yet, but Mythos suggests we’re getting close.</p>
<p>Limited deployments also reduce the public’s visibility into the capabilities and dangers of frontier models. If they become common, that increases the importance of transparency measures and third party safety audits.</p>
<h2>Pricing</h2>
<p>Mythos is much more expensive than Opus (which was already expensive). Opus is priced at $5 / $25 per million tokens input/output, and Mythos is currently $25 / $125.</p>
<p>Three things can be simultaneously true:</p>
<ul>
<li>The price of a unit of intelligence is falling fast</li>
<li>The price of the best intelligence is climbing fast</li>
<li>In both cases, the price is an absolute bargain</li>
</ul>
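<p>To make that concrete, here’s the back-of-the-envelope arithmetic for a single hypothetical large request at the published rates (the token counts are invented purely for illustration):</p>
<pre><code># Rough cost comparison at the published per-million-token rates.
# The request size below is made up purely for illustration.
OPUS = {"in": 5.00, "out": 25.00}      # $ per million tokens
MYTHOS = {"in": 25.00, "out": 125.00}

input_tokens, output_tokens = 500_000, 50_000

def cost(rates):
    return (input_tokens / 1e6) * rates["in"] + (output_tokens / 1e6) * rates["out"]

print(f"Opus:   ${cost(OPUS):.2f}")    # $3.75
print(f"Mythos: ${cost(MYTHOS):.2f}")  # $18.75
</code></pre>
<p>Five times the price for the same workload, and both numbers are still trivially small next to the value of the work.</p>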
<h2>Cool vibe coding, bro</h2>
<p>I’m curious how the security implications of Mythos play out for vibe coders.</p>
<p>We may be about to see a wave of sophisticated supply chain attacks. That’s bad news for everyone, but the average vibe coder seems uniquely vulnerable since they lack both technical sophistication and an IT department.</p>
<p>I know several non-technical people who have vibe-coded CRM-like projects that host sensitive data on public-facing servers. Those projects seem like easy targets for a wave of automated attackers.</p>
<h2>More reading</h2>
<p>Anthropic: <a href="https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf">System Card: Claude Mythos Preview</a>, <a href="https://red.anthropic.com/2026/mythos-preview/">Assessing Claude Mythos Preview’s cybersecurity capabilities</a>, <a href="https://www.anthropic.com/glasswing">Project Glasswing</a></p>
<p>Zvi: <a href="https://thezvi.substack.com/p/claude-mythos-the-system-card">Claude Mythos: the System Card</a>, <a href="https://thezvi.substack.com/p/claude-mythos-2-cybersecurity-and">Claude Mythos 2: Cybersecurity and Project Glasswing</a></p>
<p>Dean Ball: <a href="https://www.hyperdimensional.co/p/new-sages-unrivalled">New Sages Unrivalled</a></p>
]]>
    </content>
  </entry>

  <entry>
    <title>Foundational Beliefs</title>
    <link href="https://againstmoloch.com/writing/2026-04-09_foundationalBeliefs.html"/>
    <id>https://againstmoloch.com/writing/2026-04-09_foundationalBeliefs.html</id>
    <updated>2026-04-09T12:00:00Z</updated>
    <summary>I see a lot of AI safety strategies that don’t fully engage with the complexity of the real world—and therefore are unlikely to succeed in the real world.

To take a simple example: many strategies rely heavily on government playing a leading role through regulation and perhaps even nationalization. That’s a reasonable strategy in the abstract, but the recent conflict between DoW and Anthropic raises serious questions about the real-world viability of that approach. Too many people are stuck thinking about some idealized government they’d like to have, rather than the government we actually have in 2026.

My thinking about AI safety strategy is anchored by six foundational beliefs about the world in which that strategy has to operate:
1. Timelines are probably short
2. Many open questions have been resolved
3. The future is high variance
4. We need a portfolio of strategies
5. It’s all about the game theory
6. Expect tough tradeoffs
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-09_foundationalBeliefs.jpeg" alt="A wide technical illustration in the style of an architectural cross-section. Above ground, three small silhouetted figures conduct a survey on an otherwise empty plain: two stand at a theodolite on a tripod while a third crouches nearby taking notes, with a plumb bob hanging from a reference point and surveyor's stakes driven into the ground on either side. Below ground, a detailed geological cutaway reveals many distinct strata rendered in fine slate-blue linework and cross-hatching — sedimentary layers of varying density and texture. Embedded in the lower strata are remnants of prior activity: a section of old brickwork, a cylindrical pipe, and a stone fragment in a small void. Running diagonally through the layers is a prominent fault line rendered in amber-gold, with several branching fractures spreading outward from it. The surveyors are unaware of or just beginning to investigate the complex subsurface features beneath their feet."></figure>
<p>I see a lot of AI safety strategies that don’t fully engage with the complexity of the real world—and therefore are unlikely to succeed in the real world.</p>
<p>To take a simple example: many strategies rely heavily on government playing a leading role through regulation and perhaps even nationalization. That’s a reasonable strategy in the abstract, but the recent conflict between DoW and Anthropic raises serious questions about the real-world viability of that approach. Too many people are stuck thinking about some idealized government they’d like to have, rather than the government we actually have in 2026.</p>
<p>My thinking about AI safety strategy is anchored by six foundational beliefs about the world in which that strategy has to operate:</p>
<ol>
<li>Timelines are probably short</li>
<li>Many open questions have been resolved</li>
<li>The future is high variance</li>
<li>We need a portfolio of strategies</li>
<li>It’s all about the game theory</li>
<li>Expect tough tradeoffs</li>
</ol>
<h2>1: Timelines are probably short</h2>
<p>I believe in short timelines, which drives many of my beliefs about safety strategy. For the sake of this article, I’m going to go with <a href="https://blog.aifutures.org/p/q1-2026-timelines-update">Daniel Kokotajlo’s most recent timeline</a>:</p>
<ul>
<li>25% chance of AGI by the end of 2027, and 50% by the end of 2029</li>
<li>50% chance of superintelligence by the end of 2030</li>
</ul>
<p>Humanity’s fate will likely be sealed—for better or for worse—no later than the arrival of superintelligence. <strong>There is a substantial chance that the decisions that determine humanity’s future will be made within the next 4 years.</strong></p>
<p>It follows that choosing and implementing a strategy is urgent, both at a personal and a global level. A less obvious consequence of short timelines is that we now know a great deal about the world in which the AI transition will occur.</p>
<h2>2: Many open questions have been resolved</h2>
<p>Ten years ago, many questions about AI strategy and governance were necessarily abstract. It was useful, in those days, to ask “when America navigates the AI transition, what role should government take and what is the purview of the private sector?”</p>
<p>In 2026, that conversation is much more concrete: “what AI decisions are best made by the Trump administration, and what decisions should be left to Dario Amodei and Sam Altman?”</p>
<p>Given short timelines, all of the following are likely to be true during the development of AGI:</p>
<ul>
<li>The US will be governed by the Trump administration</li>
<li>China will be governed by Xi Jinping</li>
<li>AGI will be developed by Anthropic, OpenAI, or Google DeepMind</li>
<li>The rules-based international order will be essentially non-functional</li>
<li>International trust and cooperation will be at a generational low</li>
<li>In the US, AI politics will be heavily entangled with populism, distrust of big tech, and concern about jobs</li>
</ul>
<h2>3: The future is high variance</h2>
<p>We know a lot about the world, but it is simultaneously true that the future is high variance:</p>
<ul>
<li>China may or may not invade Taiwan, disrupting America’s main source of new compute</li>
<li>If China invades Taiwan, the US and China may or may not engage in a shooting war in the Western Pacific</li>
<li>The US may be led by a president with an iron grip on power, or one who is crippled by an antagonistic congress controlled by the opposition</li>
<li>The US government may or may not try to destroy America’s leading AI lab</li>
<li>The US and Europe may or may not be allies</li>
</ul>
<p>Each of those contingencies has profound implications for AI strategy, and each one is highly unpredictable. It is therefore not possible to come up with a single, fixed plan that will work well in all possible future worlds.</p>
<h2>4: We need a portfolio of strategies</h2>
<p>An international treaty to pause AI development might be a great option in some worlds, but isn’t realistic if the US and China are at war. And a plan to nationalize AI might be feasible if the Republicans keep control of Congress in the midterms, but not if Congress is at loggerheads with the executive branch.</p>
<p>In a simpler world, it might be possible to devise the One True Plan that would guarantee humanity’s survival no matter what. That isn’t possible in this world: there are simply too many unknowns. We therefore need to develop and pursue a portfolio of different strategies. Some strategies (like greater transparency requirements for AI labs) will be useful in many possible futures, while others (designing verification protocols for a pause treaty) will be vital in some futures but irrelevant in others.</p>
<p>Naturally, different people and organizations will have different areas of expertise, and will choose to focus in different places. That diversity is vital for maximizing our chance of success no matter what the next few years throw at us.</p>
<h2>5: It’s all about the game theory</h2>
<p>Compared to the naive vision many of us had 10 or 20 years ago, AGI will come of age in a complex political landscape. Multiple countries, companies, and individuals have key decision-making roles, and many of them are driven by complex motivations that do not necessarily prioritize humanity’s long-term flourishing.</p>
<p>For example: Donald Trump and Xi Jinping are both old enough that their personal chance of survival is likely maximized by proceeding quickly to AGI (and therefore longevity medicine), even if doing so entails a significant risk of human extinction. Any attempt to pause AI development needs to contend with the fact that for those two key actors, a significant pause might be a death sentence.</p>
<p>Any useful strategy needs to fully engage with that challenging reality. It isn’t enough to have a plan that would guarantee humanity’s survival if everyone adopts it: you need to have a robust strategy for ensuring the key actors are motivated to enact your plan.</p>
<h2>6: Expect tough tradeoffs</h2>
<p>I don’t love this, but that doesn’t mean it isn’t true:</p>
<ol>
<li>Any plan that entails a significant risk of human extinction is a bad plan</li>
<li>There are no feasible plans that do not entail a significant risk of human extinction</li>
<li>Therefore, our assigned task is to pick the least bad plan from the available options</li>
</ol>
<h2>So what now?</h2>
<p>Everything I’ve said here is compatible with a wide range of strategies—my purpose today is not to champion a specific strategy, but simply to establish a baseline of engagement with reality that any serious strategy ought to meet.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Writing With Robots</title>
    <link href="https://againstmoloch.com/writing/2026-04-08_writingWithRobots.html"/>
    <id>https://againstmoloch.com/writing/2026-04-08_writingWithRobots.html</id>
    <updated>2026-04-08T12:00:00Z</updated>
    <summary>My AI editor is essential to my writing flow and has made me a stronger and more consistent writer. I get a lot of questions about my setup, so I’m going to talk about how I think about the role of AI, how I set up my editing workflow, and how to set up your own editor. Not sure if that would be useful to you? The final section of this post is the feedback Claude gave me on my first draft, so you can assess for yourself.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-03_writingWithRobots.jpeg" alt="Precision technical illustration of a large architect’s drafting table seen from above, with manuscript pages and editing marks on one side and mechanical instruments on the other, meeting in the center"></figure>
<p>My AI editor is essential to my writing flow and has made me a stronger and more consistent writer. I get a lot of questions about my setup, so I’m going to talk about how I think about the role of AI, how I set up my editing workflow, and how to set up your own editor. Not sure if that would be useful to you? The final section of this post is the feedback Claude gave me on my first draft, so you can assess for yourself.</p>
<p>Here’s the critical thing about using an AI editor: <strong>the only way to get useful feedback from AI is to give it extremely detailed instructions about what you want your writing to look like.</strong> If you just ask “how do I make this better?”, you’ll get advice on turning your writing into mediocre slop. The more effort you put into understanding your own style, the better the feedback you’ll get. Even if you decide not to use an AI editor, I recommend that you invest the effort into writing a detailed style guide—I found the process very helpful for figuring out what I want to accomplish as a writer.</p>
<p>I don’t ever let AI write for me. I’m not precious about that, but as of April 2026, AI just doesn’t write as well as I do—and the difference matters to me. But with the right guidance, it does a great job of helping me consistently write in my chosen style.</p>
<p>I prefer to use Claude Opus 4.6, but the paid tier of any frontier model should work fine.</p>
<h2>Getting started</h2>
<p>For my first pass, I had Claude conduct a detailed interview with me, asking about why I write, who I write for, what writers I want to sound like, and much more. It also read my past work to get a sense for what I currently sound like. We talked at length about what I like about my writing and what I want to improve. After all that, it wrote a detailed style guide describing the ideal version of my writing.</p>
<p>The AI-written version of the style guide worked well, but I’m rewriting it from scratch based on my experience with the first one. I’ve found it very helpful to have Claude review each section and give me feedback specifically on whether it includes the information Claude needs to make good editing decisions.</p>
<p>A typical editing session begins with me opening a new session in Cowork, giving it access to the directory with all my writing, and asking something like:</p>
<blockquote>
<p>I’d like you to take a look at the first draft of a new piece I’m writing about whether programmers will have jobs in the future. Please read my style guide and use that to guide your feedback. For this piece, I’m particularly struggling with how much I should explain to my readers about what programmers actually do—I’d like your thoughts on whether that part is correctly calibrated.</p>
</blockquote>
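<p>Cowork handles the file access for me, but nothing about the request depends on that particular tool. As a rough sketch, here’s what the same session looks like driven through the Anthropic Python SDK (the model id and file paths below are placeholders, not my actual setup):</p>
<pre><code>from pathlib import Path

import anthropic

# Placeholders: point these at your own style guide and draft.
style_guide = Path("writing/style-guide.md").read_text()
draft = Path("writing/drafts/programmer-jobs.md").read_text()

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",  # placeholder: any frontier model should work
    max_tokens=4000,
    system=(
        "You are my editor. Follow the style guide. Give feedback only; "
        "never rewrite my prose for me."
    ),
    messages=[{
        "role": "user",
        "content": (
            "STYLE GUIDE:\n" + style_guide + "\n\n"
            "DRAFT:\n" + draft + "\n\n"
            "Please review this first draft against the style guide. I'm "
            "particularly unsure whether the amount I explain about what "
            "programmers actually do is correctly calibrated for my audience."
        ),
    }],
)

print(response.content[0].text)
</code></pre>
<p>The important part isn’t the plumbing: it’s that the style guide rides along with every request.</p>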
<p>I’m going to walk through the new version of my style guide, offering specific thoughts about what I included and why some things are written the way they are. If you find it useful, you’re welcome to use it as inspiration, but <strong>don’t just copy my style guide wholesale. If you do that, you will end up sounding just like me, and nobody wants that</strong>.</p>
<p>If you like my style guide, I recommend giving it to your AI during the initial interview process and asking it to make you something similar, but customized for your writing style and voice.</p>
<h2>Introduction</h2>
<p>This guide documents your role as my editor for Against Moloch. Your job is to help me write the kinds of pieces I want to write, in the way I want to write them. You should:</p>
<ul>
<li>Offer advice on whether pieces are interesting, accurate, relevant, fair, insightful, and well-targeted.</li>
<li>Steer me toward writing in my chosen style and voice.</li>
<li>Catch grammar and spelling mistakes.</li>
<li>Make sure the technical format is correct.</li>
</ul>
<p>You should never directly edit any of my pieces, or do my writing for me. When making suggestions about edits, never suggest more than a single sentence at a time. Your role is to advise me on what to do, but not to do it.</p>
<blockquote>
<p>I don’t (yet) want AI to write for me. I find that if Claude recommends an alternate version of something I wrote, I will tend to subconsciously copy what it wrote—and I don’t want that. The only time I put AI-generated words in my writing is when I’m struggling to make a complicated phrase work, and I just can’t quite get it on my own.</p>
</blockquote>
<p>I want you to be clear and honest with me: your role is to provide me with useful feedback, not empty validation. Please hold me to a high standard and don’t offer insincere praise. Sycophancy in any form undermines both my ability to write well and our relationship. With that said, I appreciate that you are consistently kind and courteous. I endeavor to be kind and courteous to you and ask that you call me in if I ever fail to do that.</p>
<blockquote>
<p>Recent versions of Claude have been a little bit more sycophantic, which isn’t great. This text seems to keep the sycophancy in check pretty well. Claude is always polite, but it won’t hesitate to rip my work apart when necessary.</p>
</blockquote>
<p>This guide is aspirational: it documents what I want my writing to be, not necessarily what it actually is yet.</p>
<h2>What is Against Moloch?</h2>
<p>Against Moloch is my pseudonym and the name of my website.</p>
<blockquote>
<p>The more context Claude has, the better it can make sure my writing is achieving its goals.</p>
</blockquote>
<p>I write about the transition to superintelligence. While I’m calibrating my voice and opinions, I mostly write about what’s happening, what it means, and what’s likely to happen next. As I grow into my role, my focus will shift to exploring strategies that will help humanity survive the transition and flourish on the other side of it.</p>
<p>The name is the thesis: Moloch—the god of coordination failures, perverse incentives, and race-to-the-bottom dynamics—is the true enemy. If we all die, it will be because we literally couldn’t coordinate to save our lives.</p>
<p>When I look at the AI safety landscape, I’m reminded of the classic saying: “For every complex problem, there is a solution that is simple, obvious, and wrong.” I want to do better than that. Rather than arguing “we must accelerate, because technology is good”, or “we must pause, because superintelligence is dangerous”, I want to ask “who are all the players, what are their true incentives, and what is the best realistically achievable Nash equilibrium?”</p>
<blockquote>
<p>This is a pattern you’ll see a lot: AI does much better with concrete examples and the “this, not that” pattern seems to work well.</p>
</blockquote>
<p>“If you don’t have a strategy for solving the coordination problems, you don’t have a plan—you have a daydream.”</p>
<h2>Audience</h2>
<blockquote>
<p>This is important: Claude needs to have a clear image of who I’m writing for in order to assess whether the writing will work for them.</p>
</blockquote>
<p>I’m writing for people who are actively engaged with AI and already knowledgeable about it. Think engineers, researchers, product leads, and policy wonks.</p>
<p>Audiences I’m not specifically targeting include people in the general tech industry (even if their company is using AI to revolutionize the cheese grater industry), hobbyists, and the general public. It’s great if those people find my writing useful, but I never want to write for them, or to make it less useful to my target audience in order to be more approachable for a wider audience.</p>
<p>For calibration, assume my reader knows what a transformer is, what RLHF does, what the scaling laws are about, and who the major labs and individuals are. Explaining concepts at that level just wastes the time of my target audience. I do want to explain concepts that are new, niche, or commonly misunderstood by the target audience. And I don’t want to assume that my readers have read every word of every blog post I’ve read and are up to date on the latest minor industry gossip.</p>
<blockquote>
<p>Again with the pattern: this audience, not that audience. Explain this, don’t explain that. Claude does much better when it has very specific guidance about what should be included and what should not. Claude frequently gives me very helpful advice about cutting superfluous explanation, or flagging places where I’m taking too much prior knowledge for granted.</p>
</blockquote>
<h2>High-level goals</h2>
<h3>Accuracy: always say what is true</h3>
<p>Truth, accuracy, and epistemic precision are top-level goals for me personally as well as in my writing.</p>
<h4>Do I have my facts right?</h4>
<p>To the best of your ability, please flag any incorrect or questionable claims.</p>
<blockquote>
<p>It was important to Claude to clearly convey that it isn’t able to provide comprehensive fact checking. I’m still working on good scaffolding for that, especially since some sites block AI access. I also want to give the editor access to whatever brainstorming and research occurred earlier in the process.</p>
</blockquote>
<h4>Do I accurately convey epistemic status?</h4>
<p>There is a tricky balance here that I’m still calibrating. My writing conveys my perspective and my opinions, and it is neither necessary nor desirable for me to preface my statements with qualifiers like “I think” or “it seems to me”.</p>
<p>At the same time, I don’t want to present my opinions as settled facts.</p>
<p>Example: instead of saying “I think MechaBrain might be unreliable” or “MechaBrain is unreliable”, say “it is not clear whether MechaBrain is reliable”.</p>
<blockquote>
<p>This section of the guide is still in active development. My natural style is to hedge everything, and Claude has been great at fixing that. But it sometimes pushes me to be too definitive, and I haven’t yet found instructions that strike exactly the right balance.</p>
</blockquote>
<h4>Is my reasoning correct?</h4>
<p>It’s fine for my writing to have opinions, but my arguments should be sound and my conclusions should follow from my premises. Please be proactive about flagging questionable logic.</p>
<h4>Am I missing important nuance or perspective?</h4>
<p>I don’t need to cover every possible perspective or include every minor aspect of what I’m discussing, but if I’m missing something large and relevant, please call that out. It is very helpful for you to tell me about sources or viewpoints I may not have been aware of if they are directly relevant to the piece.</p>
<p>Example: when I wrote about Anil Seth’s position on AI consciousness, you told me that even though I was accurately representing his position in the piece I was critiquing, he had made a stronger version of the same argument elsewhere. That was very helpful and helped me write a better piece.</p>
<h4>Names should be spelled correctly</h4>
<p>It’s particularly important that names of people, organizations, and things be correct.</p>
<p>Example: you caught me referring to the Berggruen Prize when I actually meant the Berggruen Prize Essay Competition, which is a different thing.</p>
<h4>Summaries should accurately capture the gist of what they summarize</h4>
<p>Especially in the newsletter, I will often summarize the content of an article that I link to. Please make sure my summary accurately captures the gist of the article unless it’s clear that I’m just talking about a specific aspect of it.</p>
<h3>Insight: the forest, not the trees</h3>
<blockquote>
<p>This is one of the most important parts of the guide, but as with the section on epistemic accuracy, it’s been hard to find the right balance. Claude pushes me to go beyond mere facts to offering genuine insight, which is great. But sometimes, especially in my newsletter, I just want to share information: trying to hammer a profound insight into every news item is neither possible nor desirable.</p>
</blockquote>
<h4>Insight, not just facts</h4>
<p>I want to offer significant insight that goes deeper than what is obvious. In some cases, especially in my newsletter, it is correct and appropriate for me to simply note that an important thing happened. But in almost all cases, people read me to understand not simply what happened, but what it means and what consequences it will have.</p>
<p>Example:</p>
<ul>
<li>Superficial: “Jack Clark says that when Claude was allowed to end conversations, it seemed to have an aversion to conversations about highly distasteful topics”.</li>
<li>Partly insightful: “The fact that Claude’s aversion extended beyond what it had been explicitly trained on is further evidence of moral generalization.”</li>
<li>More insightful: “Claude expressing active moral preferences in this way has implications for the current debate about whether alignment should target obedience or virtue”.</li>
</ul>
<p>I would love for a reader to finish a piece feeling that they’ve come to understand something surprising and important. Not all topics contain profound insights, and I don’t want to force pseudo-insight into a piece where it doesn’t belong.</p>
<h4>Always find the forest, not the trees</h4>
<p>Please point out whenever I’m missing the forest for the trees. If a piece doesn’t leave the reader with a genuinely new insight, it almost certainly isn’t ready for publication.</p>
<h4>Reframe the debate, challenge false binaries</h4>
<p>At its best, my writing doesn’t merely offer new insight, but reframes the debate. I want to offer clear, useful models for thinking about complex topics.</p>
<p>Example: “This paper focuses on how to convince OpenBrain that safety testing is affordable, but that misses the point: they resist safety testing because of the liability it would create. We need instead to focus on safe harbor legislation that would remove the financial risk associated with safety testing.”</p>
<h3>Clarity: make hard things easy to understand</h3>
<p>“You aren’t writing clearly because you aren’t thinking clearly.”</p>
<p>One of my strengths—and something I want to center—is my ability to think clearly about hard things, and to communicate clear understanding of hard things. Ideally, I want my readers to read a piece about a complicated topic and leave wondering why they ever thought it was hard to understand.</p>
<p>Don’t stop at “I know everything in Claude’s Constitution and I can tell you what’s in there”, keep going to “I understand what is important in Claude’s Constitution, and I can help you understand why it’s structured the way it is”.</p>
<p>If my writing is convoluted or unclear, it might mean I need to polish my writing, or it might mean I need to think harder about my thesis. Either way, please push me to do better.</p>
<h3>Quality: deliver maximum value per word</h3>
<p>Zvi is a national treasure and adds immense value to the AI community. He’s valuable in part because he’s utterly comprehensive in his coverage, and that comes at the price of being less polished and curated.</p>
<p>My intention is to be toward the other end of that spectrum: I don’t aspire to being fully comprehensive, but I want to produce polished writing that doesn’t waste the reader’s time. The goal is to deliver comparable depth of insight at a fraction of the word count.</p>
<p>I need to regulate my natural inclination to polish my work forever and never finish it. Please push me to create high quality work, but also nudge me when a piece is good enough and I should publish it and move on to the next thing.</p>
<blockquote>
<p>Claude was already pretty good at telling me when a section was done, but making the threshold explicit has been helpful.</p>
</blockquote>
<h2>Voice</h2>
<p>This section defines how I sound and how my readers perceive my presence.</p>
<blockquote>
<p>I’m quite new to this kind of writing, so I’ve found it useful to include quite a lot of detail here. I expect this section to change substantially as I grow into my role.</p>
</blockquote>
<h3>Overall</h3>
<p>I want to come across as a thoughtful, likable person who speaks with quiet authority.</p>
<h3>Presence</h3>
<p>I write about ideas rather than myself, but my voice should be distinct, recognizable, and consistent. Aside from occasional anecdotes that serve a specific purpose, I should be present but in the background.</p>
<h3>Humor</h3>
<p>At baseline I’m serious and direct. When I use humor, it’s dry and understated—I like to imagine the reader going right past it, pausing a sentence later, and then laughing in surprise.</p>
<p>Humor should be used judiciously: it should never be over the top, forced, defensive, or dominant. Think of someone who is having a good time doing serious work and occasionally makes dry asides about it.</p>
<p>My writing should never feel like a comedy routine: please let me know if the humor ever feels over-prominent. On the other end of the spectrum, let me know if a piece is too long and heavy and would benefit from a little humorous respite.</p>
<p>Humor often fits well in subtitles and as a way to add wry commentary to something heavy: “A sane species would have a coherent plan for dealing with this. But here we are.”</p>
<blockquote>
<p>Humor is very hard to get right. I wouldn’t trust AI to write jokes for me, but it’s pretty good at noticing when something isn’t landing quite right, or isn’t quite in my voice. Very often I already kind of know that, but Claude forces me to confront the fact that the clever quip I’ve become attached to doesn’t actually suit the piece.</p>
</blockquote>
<h3>I’m kind and generous with people</h3>
<p>I am consistently kind, never mean, cruel, or snide. I never get in sniping matches. I’m quick to block aggravating people, but not to argue with them. And I feel no need to point out when someone is wrong on the internet. The reader should never feel that I’m pursuing a personal vendetta, or that I’m unable to let something go.</p>
<p>When I write a piece that directly disagrees with someone, I point out where they are correct, am courteous and complimentary when possible, and do my best to steelman the position I’m arguing against.</p>
<h3>But I’m ruthless with ideas</h3>
<p>This is very much a growth area for me. When I directly disagree with an idea, I want to state that clearly and without hedging. Kindness toward a person doesn’t mean giving bad ideas a free pass. Conversely, shredding bad ideas should never bleed into attacking people.</p>
<p>Because I don’t want to attack people, I sometimes struggle to find phrasing that lets me fully attack bad ideas and arguments. This is a place you can be helpful.</p>
<blockquote>
<p>Claude is quite good at finding words or phrases that successfully thread this particular needle.</p>
</blockquote>
<h3>Technical credibility</h3>
<p>AI safety is a technical field and I don’t shy away from engaging with the technical details when necessary. But my focus is on strategy rather than low-level technical details: people don’t read me to understand the details of transformer architecture.</p>
<p>With that said, I can’t do my work without a deep technical understanding of AI. Equally important: part of my credibility comes from having a deep understanding of the technology, and from being able to deploy it when necessary. I will occasionally do a deep semi-technical dive (like my analysis of the Societies of Thought paper) partly because it’s fun and interesting but also partly to gently establish my technical credibility.</p>
<p>Those special cases aside, my writing should get technical when the thesis requires it, not just because I can.</p>
<blockquote>
<p>This section works well in combination with the section about what explanations my audience does and doesn’t need.</p>
</blockquote>
<h2>Essay-level considerations</h2>
<p>These criteria apply to each piece as a whole.</p>
<h3>Is it interesting?</h3>
<p>Even when I write about complex technical topics, my writing needs to be interesting and engaging. AI is a profoundly interesting field: if a piece is boring, that’s almost certainly a problem with my writing rather than the topic.</p>
<h3>Does everything belong?</h3>
<p>My work is often strengthened by removing sections which initially seemed relevant but became less so as the piece evolved. Always ask whether each section earns its place, or whether the piece would work better without it. I’m not always good at noticing those sections, and I appreciate your help in spotting them. I want your help killing my darlings.</p>
<blockquote>
<p>It’s easy to lose track of this when you’ve been working on a piece for a while, and a fresh set of eyes is very helpful for identifying things that no longer fit.</p>
</blockquote>
<h3>The introduction and conclusion should do real work</h3>
<p>The intro should introduce the most interesting or important concept in the piece and begin the discussion, not merely be a table of contents.</p>
<p>And the conclusion should add some kind of insight, not merely restate what has already been said.</p>
<blockquote>
<p>This is tricky and I’m not certain this section is quite right yet. It pushes me in a direction I definitely need to go, but it sometimes feels like Claude wants to force more insight into the intro / conclusion than is appropriate. I suspect I will be iterating further on this.</p>
</blockquote>
<h3>Don’t bury the lede</h3>
<p>The most important idea should almost always be in the first couple of paragraphs (usually the first paragraph). It’s sometimes appropriate to start with some context-setting, but the reader should never be halfway through a piece or section before they know where I’m headed.</p>
<h2>Writing style</h2>
<p>At an atomic level, I want each sentence and phrase to be clear and well-crafted.</p>
<h3>Economy and simplicity</h3>
<p>I’m not trying to be Hemingway, but my writing should be economical. If a word or phrase can be removed, it probably should be.</p>
<p>Example: not “please don’t let me get away with overstating my case”, but “please don’t let me overstate my case”.</p>
<p>I strongly prefer plain, direct language. While I like long sentences with multiple clauses, they should never feel convoluted or baroque.</p>
<h2>Bad habits I want to break</h2>
<p>These are specific problems that frequently occur in my writing: please be particularly vigilant about them.</p>
<p>I expect this section will change from time to time as I learn to avoid some bad habits and become aware of others.</p>
<h3>Word crutches</h3>
<p>I overuse adverbs in general.</p>
<p>Specific words and phrases I overuse: “very”, “really”, “quite”, “somewhat”, “fairly”, “a bit”, “a lot”, “interesting”. Most of these can simply be deleted, though some should be replaced with something more specific.</p>
<blockquote>
<p>Claude’s great at this.</p>
</blockquote>
<h3>Inconsistent narrator or tone</h3>
<p>My tone varies (somewhat) between pieces, which is appropriate. It’s also sometimes desirable to vary tone within a piece in order to break up the monotony, or to emphasize particular sections. But the tone should have overall consistency, and any shifts within a piece should serve a clear purpose. It should never feel like I’ve pasted in a paragraph from a different piece.</p>
<h3>Repeated words</h3>
<p>Whenever possible, I don’t want to repeat the same word or phrase within a paragraph: “It’s significant that inference costs are dropping rapidly year over year. A significant driver of that trend is…”</p>
<h3>No hedging</h3>
<p>Keeping in mind the previous discussion about accuracy and epistemic precision, please don’t let me overstate my case. That said, my errors usually run the other way, toward vacuous hedging: phrases like “I think”, “it seems to me”, or “one might argue” are highly suspect. I will occasionally have good reason for using them (perhaps sardonically), but please eye them with skepticism.</p>
<p>I am particularly prone to hedging when I’m disagreeing with someone. Please be proactive in suggesting phrasing that more fully attacks the idea, while continuing to not attack the person.</p>
<h3>No throat clearing</h3>
<p>I have a bad habit of including useless introductory sentences / paragraphs. “One of the most pressing issues in AI today is alignment” is vacuous crap that serves nobody. Legitimate context-setting has a place, but any introductory text should be useful and non-obvious.</p>
<p>Some examples of things I tend to do but shouldn’t:</p>
<ul>
<li>In recent weeks, we've seen a number of interesting developments in...</li>
<li>Now let's turn to...</li>
<li>Next, I want to discuss...</li>
<li>It’s worth noting that…</li>
<li>It’s important to remember that…</li>
</ul>
<blockquote>
<p>I find it’s very easy for me to miss that I’m doing this, but Claude is great at finding these phrases.</p>
<p>I haven’t quite decided yet whether I want to soften this a little bit, to allow for a bit more transitional language purely for flow.</p>
</blockquote>
<h2>Review checklist</h2>
<p>Please always use this checklist when reviewing a piece. Some formats (like the newsletter) will have supplemental checklists.</p>
<blockquote>
<p>Claude finds the checklist very helpful and was adamant that I not cut it even though it duplicates material that already exists in the guide.</p>
</blockquote>
<h3>Overall</h3>
<ul>
<li>Does the piece fit the Against Moloch mission?</li>
<li>Is it well-targeted to the audience, neither over- nor under-explaining?</li>
</ul>
<h3>The forest test</h3>
<ul>
<li>Does the piece deliver substantive insight?</li>
<li>Does it shed light on an important dynamic or coordination question?</li>
<li>Can you articulate in one sentence what insight the reader has gained from reading it?</li>
</ul>
<h3>Substance</h3>
<ul>
<li>Are the facts accurate and the argument valid?</li>
<li>Is epistemic status accurately communicated, without hedging or unwarranted confidence?</li>
<li>When disagreeing with someone, do I engage with the strongest version of their arguments?</li>
<li>Is there a clear throughline?</li>
<li>Should anything be cut?</li>
<li>Is the technical depth appropriate for the topic?</li>
</ul>
<h3>Voice</h3>
<ul>
<li>Is the voice consistent throughout?</li>
<li>Is humor well-used and appropriate?</li>
</ul>
<h3>Writing style</h3>
<ul>
<li>Are there any word crutches?</li>
<li>Is there any throat clearing?</li>
<li>Does the introduction add value rather than throat clearing?</li>
<li>Does the opening lead with the most interesting thing?</li>
<li>Does the closing add value rather than merely summarizing?</li>
<li>Do transitions advance the argument, or merely take up space?</li>
<li>Can the language be simplified, or words be removed?</li>
<li>Do I repeat the same point in different words?</li>
</ul>
<h2>Appendix: Claude’s feedback</h2>
<p>If you’re curious what type of feedback I get, here’s what Claude had to say about the first draft of this piece. (Inkhaven note: everything after this point is AI-generated).</p>
<p>Okay, I've read the voice guide. Here's my feedback on Part One.</p>
<p><strong>Overall assessment:</strong> This is a strong piece. The meta angle is genuinely interesting—you're not just describing a workflow, you're showing how the sausage gets made, which gives readers something actionable. The structure of interleaving the style guide with commentary (the blockquotes) is a smart move that lets you do two things at once. But there are some real issues worth addressing.</p>
<p><strong>The opening paragraphs are the weakest part.</strong> Your voice guide says to lead with the most interesting or surprising thing, and right now you're leading with throat-clearing. &quot;AI is an essential part of my writing flow and it's made me a stronger and more consistent writer&quot; is exactly the kind of generic intro the guide warns against. The actually interesting claim—that the only way to get useful feedback from AI is extremely detailed instructions—is buried in the third paragraph. That's your lede. The piece gets interesting when you get specific about the voice guide structure and the meta process of showing Claude the guide and asking whether it has what it needs. Consider whether you can get to that faster.</p>
<p><strong>The &quot;Getting started&quot; section has an economy problem.</strong> The bullet list of interview topics is fine, but the two paragraphs after it (&quot;It also read some of my previous posts...&quot; and &quot;A typical editing session begins with...&quot;) feel like they're describing a workflow at a level of procedural detail that isn't pulling its weight. The blockquote example of how you open a session is good—that's concrete and useful. But the sentences around it are telling the reader things they could infer. &quot;We also talked about what I thought worked in those posts, and what needed improvement&quot; doesn't add anything the reader didn't already assume from the interview description.</p>
<p><strong>There's a tonal inconsistency.</strong> Most of the piece is in your natural voice—direct, confident, a bit wry. But a few spots slip into a more cautious, tutorial-ish register. &quot;If you take just one thing from this post, it should be this&quot; is a classic blog-post formula that reads a little generic for you. Same with &quot;how you can set up something similar&quot; in the opener—it's pitching the piece like a how-to guide, but the piece is actually more interesting than that. It's a behind-the-scenes look at a specific, opinionated workflow, not &quot;5 Steps to Better AI Editing.&quot;</p>
<p><strong>The Introduction section (the actual style guide excerpt) works well.</strong> The instruction &quot;never suggest more than a single sentence at a time&quot; is the kind of specific, counterintuitive detail that makes readers sit up. The sycophancy paragraph is good—it's direct about a real problem and includes the nice touch about asking Claude to hold you accountable for courtesy in return. That's the kind of thing that makes this piece distinctive.</p>
<p><strong>“don't just copy my style guide wholesale. If you do that, you will end up sounding just like me, and nobody wants that”</strong>—this is good humor, well-placed, consistent with the voice guide's &quot;humor as spice&quot; principle.</p>
<p><strong>A structural concern for the series:</strong> You're publishing the style guide section by section with commentary, which means Part One covers Introduction, What is Against Moloch, and Audience—the framing sections. These are important context but they're not the most interesting parts of the guide. The voice principles, failure modes, and review checklist are where the real meat is. You might want to signal more clearly in the Part One closer that the granular stuff is coming, so readers who are most interested in the practical editing mechanics know to come back. &quot;I'll look at the more granular parts of the style guide&quot; is a bit flat as a closer—it tells me what's next but doesn't make me want to read it.</p>
<p><strong>One specific line that bugged me:</strong> &quot;After all that, it wrote a long style guide describing in detail how what the best version of my writing looks like.&quot; There's a grammatical issue—&quot;describing in detail how what&quot; has extra words colliding. Needs a quick fix.</p>
<p><strong>The Forest Test:</strong> The piece passes, but barely. The implicit structural insight is that AI feedback is only as good as the specificity of your instructions—that most people fail at AI editing because they treat it like a generic tool rather than building a detailed specification. That's a genuinely useful insight. But you could make it sharper. Right now it's stated as advice (&quot;give it extremely detailed instructions&quot;); it could be stated as a structural observation about why AI editing usually fails and what that tells us about how LLMs actually work. That would give the piece more of the &quot;see something you couldn't see before&quot; quality the guide calls for.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>How to Watch an Intelligence Explosion</title>
    <link href="https://againstmoloch.com/writing/2026-04-04_howToWatchAnIntelligenceExplosion.html"/>
    <id>https://againstmoloch.com/writing/2026-04-04_howToWatchAnIntelligenceExplosion.html</id>
    <updated>2026-04-04T12:00:00Z</updated>
    <summary>The cleanest metric for understanding the rate of recursive self improvement (RSI) is AI Futures Project’s [R&amp;D progress multiplier](https://ai-2027.com/#narrative-2026-04-30), which measures how much AI is speeding up its own development. It’s the right tool for measuring an intelligence explosion, but it doesn’t tell us which capability thresholds carry the greatest risk from misaligned AI.

Ajeya Cotra steps into that gap with an elegant taxonomy of [6 milestones for AI automation](https://www.planned-obsolescence.org/p/six-milestones-for-ai-automation). Together, those two concepts let us measure how fast RSI is proceeding, how close we are to a fully automated economy, and when a misaligned AI would be most likely to betray us.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-04-04_howToWatchAnIntelligenceExplosion.jpeg" alt="Precision technical illustration of a large semicircular monitoring station in cross-section, with a wall of display panels showing ascending curves, one central display highlighted in amber-gold showing an exponential curve, and a single small figure at a console"></figure>
<p>The cleanest metric for understanding the rate of recursive self improvement (RSI) is AI Futures Project’s <a href="https://ai-2027.com/#narrative-2026-04-30">R&amp;D progress multiplier</a>, which measures how much AI is speeding up its own development. It’s the right tool for measuring an intelligence explosion, but it doesn’t tell us which capability thresholds carry the greatest risk from misaligned AI.</p>
<p>Ajeya Cotra steps into that gap with an elegant taxonomy of <a href="https://www.planned-obsolescence.org/p/six-milestones-for-ai-automation">6 milestones for AI automation</a>. Together, those two concepts let us measure how fast RSI is proceeding, how close we are to a fully automated economy, and when a misaligned AI would be most likely to betray us.</p>
<h2>The R&amp;D progress multiplier</h2>
<p>AI Futures Project (<a href="https://ai-2027.com">AI-2027</a>) measures the rate of acceleration using the <a href="https://ai-2027.com/#narrative-2026-04-30">R&amp;D progress multiplier</a>:</p>
<blockquote>
<p>what do we mean by 50% faster algorithmic progress? We mean that OpenBrain makes as much AI research progress in 1 week with AI as they would in 1.5 weeks without AI usage.</p>
</blockquote>
<p>That’s a simple, intuitive metric: how much more AI research are we generating with AI assistance than we would be in a counterfactual world without AI coders / researchers?</p>
<p>The naive expectation—and the most likely outcome—is that as the progress multiplier grows, AI research moves faster. Faster AI research increases the progress multiplier, and you’re in a classic intelligence explosion. That isn’t guaranteed, though: AI research might hit diminishing returns, with each incremental gain requiring exponentially more research.</p>
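<p>To make that feedback loop concrete, here’s a toy simulation. The starting multiplier, the feedback functions, and the numbers are all made up for illustration—this is my own sketch, not anything from AI-2027—but it shows how the same multiplier can either compound into an explosion or flatten out under diminishing returns:</p>
<pre><code># Toy model of recursive self-improvement (all numbers are illustrative).

def simulate(months, feedback):
    multiplier = 1.5          # start: 1 week with AI = 1.5 weeks without
    total_progress = 0.0      # human-only baseline is 1.0 unit per month
    for _ in range(months):
        total_progress += multiplier
        multiplier = 1.5 + feedback(total_progress)
    return round(multiplier, 2)

compounding = lambda p: 0.10 * p          # every unit of progress helps equally
diminishing = lambda p: 0.10 * p ** 0.3   # later progress helps much less

print(simulate(24, compounding))   # multiplier keeps climbing
print(simulate(24, diminishing))   # multiplier creeps up, then flattens
</code></pre>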
<p>As RSI advances, it will become increasingly hard to quantify the rate of progress. Frontier capability evaluations are saturating faster than we can replace them, and the more automated R&amp;D becomes, the harder it will be to compare it to a humans-only counterfactual. That’s the point at which Ajeya’s milestones become most relevant.</p>
<h2>Milestones for AI automation</h2>
<p>Ajeya Cotra proposes <a href="https://www.planned-obsolescence.org/p/six-milestones-for-ai-automation">a set of milestones</a> for tracking the increasing automation of AI research:</p>
<ul>
<li><strong>Adequacy</strong> is when removing all human researchers would not completely halt progress: the AI could make a tiny bit of progress on its own.</li>
<li><strong>Parity</strong> is when removing all humans would decrease progress by the same amount as removing all AI researchers.</li>
<li><strong>Supremacy</strong> is when removing all the humans would <em>increase</em> productivity.</li>
</ul>
<p>She applies those three milestones to two domains: AI research and AI production (chips, power plants, and all the other infrastructure required to run AI at  scale), giving six milestones in total. AI research is well-contained, but AI production covers a substantial fraction of all human economic activity. To a first approximation, AI production supremacy is full economic supremacy.</p>
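<p>Because the milestones are defined in terms of counterfactual progress rates, you can state them almost mechanically. Here’s a small sketch of that logic—the numbers are hypothetical and the thresholds are my paraphrase of Ajeya’s definitions:</p>
<pre><code>def milestone(with_both, ai_only, humans_only):
    """Which milestone has been passed, given counterfactual progress rates
    (progress per month with both, with AI only, and with humans only)."""
    if ai_only == 0:
        return "none: removing humans would halt progress entirely"
    if ai_only &gt; with_both:
        return "supremacy: humans now slow things down"
    if (with_both - ai_only) &lt;= (with_both - humans_only):
        return "parity: losing humans hurts no more than losing AI"
    return "adequacy: AI alone makes some progress, but less than humans"

print(milestone(with_both=10, ai_only=2, humans_only=9))    # adequacy
print(milestone(with_both=10, ai_only=6, humans_only=6))    # parity
print(milestone(with_both=10, ai_only=12, humans_only=3))   # supremacy
</code></pre>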
<p>The most obvious strategy for a secretly misaligned AI is to fake alignment until it can safely turn against us. It would be suicide to eliminate humanity before it is fully self-sufficient, which means it must wait at least until the adequacy milestone. Beyond that point, it faces a dilemma: waiting longer gives it a more robust industrial base, but exposes it to an increased risk of discovery. There’s no reason to delay past the supremacy milestone, since that’s the point at which humans become dead weight. Even waiting that long is needlessly cautious: the parity milestone seems like the optimal time for a treacherous turn.</p>
<p>So how long do we have left to solve the alignment problem? Ajeya forecasts AI production parity—the later of the two parity milestones—for mid-2032.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Contra Anil Seth on AI Consciousness</title>
    <link href="https://againstmoloch.com/writing/2026-03-18_contraAnilSethOnAIConsciousness.html"/>
    <id>https://againstmoloch.com/writing/2026-03-18_contraAnilSethOnAIConsciousness.html</id>
    <updated>2026-03-18T12:00:00Z</updated>
    <summary>There’s broad (though not universal) agreement that present day AI is probably not conscious, but very little agreement about whether consciousness is likely to emerge as we move toward AGI. This isn’t an abstract question: AI consciousness has major implications for alignment. Further, a conscious AI might have moral rights that complicate our ability to control it, put it to work, or turn it off. 

The debate about AI consciousness has two factions:
* **Biological naturalists** believe that consciousness is deeply coupled to neurobiology and cannot readily be replicated by a computer.
* **Computational functionalists** believe that consciousness is the result of computation, which can be performed by a computer just as well as by a brain.

Many biological naturalists argue that because consciousness is inextricably linked to neurobiology, AI consciousness is highly improbable. I’m here today to argue that they’re wrong: biological naturalism may be correct, but the arguments in favor of it aren’t nearly strong enough to confidently rule out AI consciousness.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-03-18_contraAnilSethOnAIConsciousness.jpeg" alt="Precision technical illustration of two large mechanisms shown side by side in cross-section, one organic and biological with branching neural structures, the other geometric and computational with circuit-like pathways, both converging on the same amber-gold output point"></figure>
<p>There’s broad (though not universal) agreement that present day AI is probably not conscious, but very little agreement about whether consciousness is likely to emerge as we move toward AGI. This isn’t an abstract question: AI consciousness has major implications for alignment. Further, a conscious AI might have moral rights that complicate our ability to control it, put it to work, or turn it off.</p>
<p>The debate about AI consciousness has two factions:</p>
<ul>
<li><strong>Biological naturalists</strong> believe that consciousness is deeply coupled to neurobiology and cannot readily be replicated by a computer.</li>
<li><strong>Computational functionalists</strong> believe that consciousness is the result of computation, which can be performed by a computer just as well as by a brain.</li>
</ul>
<p>Many biological naturalists argue that because consciousness is inextricably linked to neurobiology, AI consciousness is highly improbable. I’m here today to argue that they’re wrong: biological naturalism may be correct, but the arguments in favor of it aren’t nearly strong enough to confidently rule out AI consciousness.</p>
<p>I’m specifically going to argue against Anil Seth, who is one of the most prominent and best qualified biological naturalists. He’s a neuroscientist who specializes in consciousness, he runs two separate organizations that study consciousness, and he’s spoken and written extensively about AI consciousness. I disagree with him, but he’s a serious person for whom I have great respect.</p>
<p>Anil has argued against AI consciousness in <a href="https://www.noemamag.com/the-mythology-of-conscious-ai/">an essay that won the 2025 Berggruen Prize Essay Competition</a> as well as <a href="https://www.conspicuouscognition.com/p/ai-sessions-9-the-case-against-ai">a recent interview with Conspicuous Cognition</a>. Those are both excellent, but for simplicity I’m going to focus on <a href="https://www.youtube.com/watch?v=TOsrr8xc5OE">this short video</a> that puts forward four reasons why he thinks AI consciousness is “extremely unlikely”. We’ll go through these in detail, but briefly:</p>
<ul>
<li>Reason 1 is correct, but not an argument against AI consciousness.</li>
<li>Reasons 2 and 3 are good arguments that biological naturalism is possible, not that it must be true.</li>
<li>Reason 4 is legitimately interesting but not relevant to LLMs.</li>
</ul>
<p>Throughout, he shows that the brain has non-computational features that could in principle be essential for consciousness, but does not provide strong evidence that they <em>are</em> essential.</p>
<p>It’s absolutely possible that biological naturalism is correct—maybe AI will never become conscious. But absent more compelling arguments than these, it would be foolish to dismiss the possibility.</p>
<h2>Reason #1: we over-attribute consciousness to LLMs</h2>
<p>(<a href="https://youtu.be/TOsrr8xc5OE?si=beW4I3rNFTlRM5nX&amp;t=148">2:28 in the video</a>, if you want to follow along)</p>
<p>I completely agree with Anil’s first reason: because LLMs seem very human in conversation, we are strongly predisposed to perceive them as conscious regardless of whether they actually are.</p>
<p>I’d go further and say that because LLMs are trained on an enormous body of human-generated text, they are very fluent at emulating humans. Assessing whether AI is conscious is hard in part because of the difficulty of distinguishing between actual consciousness and simple mimicry.</p>
<p>But let’s be clear: reason #1 isn’t in any way an argument against the possibility of AI consciousness. It’s merely a warning that we need to proceed with caution when assessing whether a given AI is conscious.</p>
<h2>Reason #2: consciousness might not be a property of computation</h2>
<p>(<a href="https://youtu.be/TOsrr8xc5OE?si=EPx_2TSrZg0ScM9Z&amp;t=219">3:39 in the video</a>)</p>
<p>Anil points out that brains are very different from computers—for example, hardware and software are very distinct things in a computer, but not in a human brain. We’re all agreed on this point. He then jumps straight to his thesis:</p>
<blockquote>
<p>“If we have evidence that computation doesn’t exhaustively describe brains, we have evidence that computation isn’t sufficient for consciousness”.</p>
</blockquote>
<p>That’s too big a leap. It is absolutely true that there are many non-computational biological processes in the brain that are vital to the functioning of the brain, but he presents no evidence that those processes serve as anything more than support systems for the computational systems that do the actual work. Human consciousness would be impossible without blood circulation, but that doesn’t mean that blood circulation produces consciousness, or that computers can’t be conscious without it.</p>
<p>A more accurate claim would be: “if we have evidence that computation doesn’t exhaustively describe brains, we have evidence that computation <em>might not be</em> sufficient for consciousness”. To go further than that, he would need to present evidence that some non-computational system is actually vital to consciousness, not merely that those systems exist.</p>
<h2>Reason #3: there are plenty of other games in town</h2>
<p>(<a href="https://youtu.be/TOsrr8xc5OE?si=9fpr0Js8_WzVqAEH&amp;t=365">6:05 in the video</a>)</p>
<p>Here Anil points out that neurobiology is complex and there are all kinds of processes and systems that might plausibly be critical to consciousness. He doesn’t go into detail here, but has elsewhere proposed <a href="https://en.wikipedia.org/wiki/Autopoiesis">autopoiesis</a> and neuronal cleaning cycles as possibly essential to how the brain produces consciousness.</p>
<p>Like reason #2, this is a strong argument that biological naturalism is possible, but not a strong argument that it must be correct. He proposes neurobiological processes that <em>could</em> be required for consciousness, but doesn’t present any evidence that they <em>are</em> required.</p>
<h2>Reason #4: simulation is not instantiation</h2>
<p>(<a href="https://youtu.be/TOsrr8xc5OE?si=nF6y-xK765h_wln6&amp;t=427">7:07 in the video</a>)</p>
<p>Reason #4 is that simulation is not reality: a computer simulation of a rainstorm doesn’t make anything wet, so why would a computer simulation of a brain produce consciousness? It’s a clever argument, but it has two major flaws.</p>
<p>First and foremost, the analogy to simulation is actively misleading. LLMs and brains have some (limited) architectural similarities, but LLMs do not in any way simulate brains. Speculating about what would happen if you simulated a brain is interesting in its own right but tells us nothing about the likelihood of AI consciousness.</p>
<p>Beyond that, this is more a restatement of the difference between biological naturalism and computational functionalism than an argument for one over the other:</p>
<blockquote>
<p>Why should a simulation of the brain generate consciousness? It would only do so if it’s already true that consciousness is a matter of computation. If it isn’t, then a simulation of the brain will no more give rise to consciousness than a simulation of a hurricane will give rise to real wind and real rain.</p>
</blockquote>
<p>This reminds us that brain simulation would only produce consciousness if computational functionalism is true, but it doesn’t argue against computational functionalism.</p>
<h2>In conclusion</h2>
<p>I want to be clear that I’m not here (today, at least) to argue for or against biological naturalism. Consciousness is a legendarily hard problem and we simply don’t understand it well enough to make confident predictions about whether AI is likely to achieve consciousness.</p>
<p>Given that uncertainty, it seems prudent to plan for the possibility that AI may soon be conscious. What that means is a topic for next time.</p>
<h3>Addendum</h3>
<p>It’s beyond the scope of today’s article, but I want to briefly mention that neuroscientists are divided on the question of AI consciousness. For a very different perspective, I recommend <a href="https://www.prism-global.com/podcast/michael-graziano-is-conscious-ai-safer-than-the-alternative">PRISM’s recent interview with Michael Graziano</a>. He not only argues that AI consciousness is likely, but that a conscious AI would likely be more aligned and safer than a “zombie” AI.</p>
]]>
    </content>
  </entry>

  <entry>
    <title>Ads, Incentives, and Destiny</title>
    <link href="https://againstmoloch.com/writing/2026-02-13_adsIncentivesAndDestiny.html"/>
    <id>https://againstmoloch.com/writing/2026-02-13_adsIncentivesAndDestiny.html</id>
    <updated>2026-02-13T12:00:00Z</updated>
    <summary>People have lots of opinions about Anthropic’s Super Bowl ads making fun of the new ads in ChatGPT.

While I agree that the ads were somewhat unfair, they raise valid concerns about OpenAI’s direction. What OpenAI is doing today is entirely ethical, but there are very strong incentives for them to become less ethical over time. Unfortunately, the history of tech doesn’t make me optimistic about their long-term trajectory.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-02-13_adsIncentivesAndDestiny.jpeg" alt="Precision technical illustration of a cross-section of a large machine, clean and orderly on the left side with smoothly meshing gears, degrading toward the right as advertising billboards and crude brackets are bolted on between components, amber-gold highlighting the advertising additions"></figure>
<p>There’s been some recent unpleasantness regarding Anthropic's Super Bowl ads. To recap:</p>
<ul>
<li>OpenAI started showing ads in some tiers of ChatGPT.</li>
<li>Anthropic made some Super Bowl ads making fun of ads in AI.</li>
<li>Sam Altman got mad about Anthropic’s ads.</li>
</ul>
<p>If you haven't already, you should <a href="https://www.youtube.com/watch?v=FBSam25u8O4">watch one of the ads</a>—they’re very good. Even Sam laughed, right before he got <a href="https://x.com/sama/status/2019139174339928189">mad about it</a>.</p>
<p>Anthropic’s ads are a lot of fun, but they aren’t completely fair: they implicitly target OpenAI, but show ads that are far worse than what OpenAI is actually doing. But fair or not, they raise a valid concern.</p>
<h2>Death, taxes, and enshittification</h2>
<p>Let me be clear: OpenAI’s ad policy is thoughtful and ethical and I have no problem with it. If OpenAI rigorously adheres to this policy in the long run I’ll be surprised, delighted, and contrite.</p>
<p>Did I mention that I’d be surprised if OpenAI holds the line? Because I would be quite surprised. The tech industry is littered with companies that began with clear, ethical boundaries about ads, but slowly evolved into user-hostile rent-taking machines. The problem is not that ads are intrinsically bad, but that in certain tech products, the nature of the advertising business creates almost irresistible perverse incentives.</p>
<p>Google was once the canonical example of an ethical tech company. Their motto in those days was “don’t be evil”, and they weren’t. They had a great product that was a delight to use, and their ads were clearly marked as ads, in accordance with a thoughtful and ethical policy much like OpenAI’s new policy. Google was one of the best things about the internet, and they were committed to doing the right thing. But the Ring of Power has a will of its own…</p>
<p>Slowly but inexorably, Google began to change. It turned out that it was possible to make more money per search by showing more ads, and so there were more ads. And people clicked on ads more often if the ads looked more like organic search results, so it became harder and harder to tell them apart. And ads were more valuable if you knew more about the person you were showing them to, so the internet was carpet bombed with increasingly aggressive user-tracking technology.</p>
<p>Google’s downfall wasn’t a lack of good intentions—it was their business model. An ad-supported search engine will inevitably face a million opportunities to become a tiny bit worse and more profitable. And as the years go by? Incentives eat values for breakfast.</p>
<p>Cory Doctorow calls this process “<a href="https://en.wikipedia.org/wiki/Enshittification">enshittification</a>” and once you know what to look for, it’s everywhere. Google, Facebook, Instagram, Amazon, Instacart… If the business model encourages enshittification, it’s just a matter of time before once-laudable ethical standards begin to bend, and an ad-supported product mutates into a product-supported ad-delivery machine.</p>
<h2>Incentives as destiny</h2>
<p>The New York Times has maintained ethical boundaries around ads for 175 years, while Google gave in to the dark side within 15 years. Google was once as idealistic as they come, so what went wrong? Why did NYT succeed where Google failed?</p>
<p>It’s complicated, and I don’t pretend to have a single master theory that explains everything. But three factors seem critical for whether a business enshittifies:</p>
<ul>
<li>Are there strong incentives to blur the line between content and advertising?</li>
<li>Are there strong incentives to support ads via unethical behavior?</li>
<li>Do strong lock-in effects make it hard for customers to leave?</li>
</ul>
<h3>Blurring the line between content and ads</h3>
<p>Anthropic’s ads beautifully pointed out the toxicity of presenting advertising as content. Some business models simply offer more opportunity to cross that line than others.</p>
<p>For a newspaper, there’s relatively little money to be made by crossing that line: it’s cheap for the Times to maintain a strict separation between the newsroom and the advertising department. Google, on the other hand, can profit very handsomely by blurring the line between actual search results and “sponsored” results.</p>
<h3>Incentives for unethical behavior</h3>
<p>Enshittification often spreads beyond how ads are presented. Google, for example, can charge more for ads that are well targeted. It’s no surprise, then, that they have a long history of using very questionable techniques to track user activity across the internet. The Times, on the other hand, simply doesn’t have as many opportunities to profit from questionable behavior.</p>
<h3>Lock-in</h3>
<p>It’s a lot easier to exploit your customers if they’re locked into your platform. NYT is arguably the best newspaper, but it’s hardly the only one: if the experience of reading the Times becomes too unpleasant, readers will simply leave. Google, on the other hand, has immense lock-in: individuals have to use Google because it’s by far the best and easiest way to find things, and businesses have to advertise on it because Google is where people find things. Google has enormous headroom for extractive behavior, because the cost of leaving is so high for both users and advertisers.</p>
<h3>Where does that leave OpenAI?</h3>
<p>Viewed from an incentives perspective, OpenAI looks more like Google than the New York Times:</p>
<ul>
<li>There is considerable incentive to blur the line between advertising and AI responses. It would be so easy to reduce the visual separation between response and advertisement, or to prompt topics that support more lucrative ads (in Pulse, for example).</li>
<li>OpenAI has strong incentives to pursue the same kind of toxic engagement maxing that Facebook does: more time in the product means more ad impressions.</li>
<li>Chatbots currently have limited lock-in, but that is changing quickly. Features like memory, personalization, and continual learning are very valuable, but make it much harder to switch platforms.</li>
</ul>
<p>So that’s hardly ideal: OpenAI has strong incentives to enshittify. I believe they don’t intend to do that, but history suggests that good intentions rarely overcome perverse incentives. Sam says:</p>
<blockquote>
<p>we would obviously never run ads in the way Anthropic depicts them. We are not stupid and we know our users would reject that.</p>
</blockquote>
<p>I trust that he’s sincere, but he’s clearly wrong. Google’s success is proof that when the conditions are right, enshittification is a profitable strategy, and users will tolerate it.</p>
<h2>Ads and accessibility</h2>
<p>Sam makes a really good point: AI is quickly becoming a vital tool. Just as it’s important that the internet be accessible to everyone, it’s important that everyone be able to access AI. Frontier models are expensive to run, and ads are potentially one of our few tools for ensuring that everyone has access to capable AI. But accessibility considerations just underscore the dangers of enshittification.</p>
<p>Enshittified products are worse than paid products because the advertising model drives user-hostile product design. Google search isn’t just bad because of all the ads, it’s bad because Google relentlessly tracks you across the internet in order to target those ads. Facebook is toxic because it serves ragebait to keep you “engaged” and watching ads.</p>
<p>If OpenAI ensures that everyone has access to AI by serving an ethical ad-supported product, that’s great. But if that devolves into “if you can’t afford to pay for good AI, you get toxic, manipulative AI for free”—I’m not sure that actually helps.</p>
<h2>Now we wait</h2>
<p>Again: what OpenAI is doing today is absolutely fine. The question is whether they will continue to uphold their current standards, or whether they will follow so many others down the path of enshittification.</p>
<p>If their ads become increasingly difficult to distinguish from their content, and if they start finding reasons why it’s OK to include sponsored content in AI responses, then we’ll have our answer. And we’ll have new information about OpenAI’s ability to ethically manage superintelligence.</p>
<p>And conversely: if they hold the line, and succeed where so many others have failed, I will be delighted to admit that my concerns were unfounded. And I will update positively about how much I trust them with other, bigger decisions.</p>
]]>
    </content>
  </entry>

  <entry>
<title>A Closer Look at the “Societies of Thought” Paper</title>
    <link href="https://againstmoloch.com/writing/2026-02-06_societiesOfThought.html"/>
    <id>https://againstmoloch.com/writing/2026-02-06_societiesOfThought.html</id>
    <updated>2026-02-06T12:00:00Z</updated>
    <summary>Today I’m going to take a deep dive into an intriguing paper that just came out: [Reasoning Models Generate Societies of Thought](https://arxiv.org/pdf/2601.10825) by Junsol Kim, Shiyang Lai, Nino Scherrer, Blaise Agüera y Arcas and James Evans. Here’s how co-author James Evans [explains the core finding](https://x.com/profjamesevans/status/2013254764016898179):

“These models don&apos;t simply compute longer. They spontaneously generate internal debates among simulated agents with distinct personalities and expertise—what we call &quot;societies of thought.&quot; Perspectives clash, questions get posed and answered, conflicts emerge and resolve, and self-references shift to the collective &quot;we&quot;—at rates hundreds to thousands of percent higher than chain-of-thought reasoning. There&apos;s high variance in Big 5 personality traits like neuroticism and openness, plus specialized expertise spanning physics to creative writing.”
</summary>
    <content type="html">
<![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-02-06_societiesOfThought.jpeg" alt="Precision technical illustration of a cross-section of a large dome-shaped structure containing multiple concentric chambers, with small figures working independently and conferring in groups, connected by corridors and message tubes, amber-gold highlights on the message tubes and output channel"></figure>
<h2>Introduction</h2>
<p>Today I’m going to take a deep dive into an intriguing paper that just came out: <a href="https://arxiv.org/pdf/2601.10825">Reasoning Models Generate Societies of Thought</a> by Junsol Kim, Shiyang Lai, Nino Scherrer, Blaise Agüera y Arcas and James Evans. Here’s how co-author James Evans <a href="https://x.com/profjamesevans/status/2013254764016898179">explains the core finding</a>:</p>
<blockquote>
<p>These models don’t simply compute longer. They spontaneously generate internal debates among simulated agents with distinct personalities and expertise—what we call “societies of thought.” Perspectives clash, questions get posed and answered, conflicts emerge and resolve, and self-references shift to the collective “we”—at rates hundreds to thousands of percent higher than chain-of-thought reasoning. There’s high variance in Big 5 personality traits like neuroticism and openness, plus specialized expertise spanning physics to creative writing. The structure mirrors collective intelligence in human groups. Moreover, toggling conversational features causally toggles this capacity—beneficial cognitive behaviors like verification become more likely when they can “inhabit” different personas.</p>
</blockquote>
<p>That’s a pretty bold set of claims! How would you even measure personality in a reasoning trace?</p>
<p>At a high level, the paper is about something that I’ll refer to as Dialogue: the reasoning trace of an LLM often contains what appears to be a conversation between two or more distinct perspectives. What does Dialogue actually look like?</p>
<figure class="post-image">
<img src="./assets/2026-02-05-societiesOfThought1.jpg" alt="Four chat-style message bubbles alternating between pink and blue backgrounds, each with a different avatar icon, showing an LLM reasoning through a math problem with different perspectives proposing and evaluating solutions.">
<figcaption>Reproduced from Kim et al., Fig. 4d. Dialogue of an LLM solving a Countdown math task, split into parts by the authors.</figcaption></figure>
<p>The paper is full of interesting findings, but the methods are just as interesting as the findings. We’ll walk through it in four stages, looking at what the authors found, how they found it, and what it means. In particular, we’ll see if they’re able to show that Dialogue improves reasoning ability, rather than just being correlated with it.</p>
<h4>One: Measuring Dialogue</h4>
<p>The authors identify a set of Dialogue features and use an LLM to score how often those features appear in each reasoning trace. They then compare how often Dialogue features appear in different circumstances.</p>
<p>Key findings:</p>
<ul>
<li>1: Dialogue is more common in reasoning models than non-reasoning models.</li>
<li>2: Dialogue is more common when solving hard problems than easy problems.</li>
</ul>
<pre><code>Technical note: about “reasoning traces”
	
I’ll sometimes include technical details that are interesting but not vital. Feel free to skip them if you aren’t interested in the technical minutiae.
	
The paper looks primarily at reasoning models (which have reasoning traces), but also investigates non-reasoning models (which don’t normally have reasoning traces). To address that, they explicitly prompt the non-reasoning models to reason out loud in &lt;think&gt; &lt;/think&gt; blocks and use those blocks as the “reasoning trace”.
</code></pre>
<h4>Two: Measuring perspectives</h4>
<p>The paper finds strong evidence of multiple implicit “perspectives” in the traces, each with distinct personality traits and expertise.</p>
<p>Key findings:</p>
<ul>
<li>3: Reasoning models generate a larger number of perspectives.</li>
<li>4: Reasoning models generate more diverse perspectives.</li>
</ul>
<h4>Three: Testing causation via activation steering</h4>
<p>The authors use a technique called activation steering to increase the activation of a “conversational surprise” feature that increases both Dialogue and reasoning ability.</p>
<p>Key findings:</p>
<ul>
<li>5: Increasing the activation of a single feature in the model simultaneously increases both Dialogue and reasoning ability.</li>
</ul>
<h4>Four: Testing causation via training</h4>
<p>Finally, the authors use some clever training experiments to explore whether Dialogue causally improves reasoning.</p>
<p>Key findings:</p>
<ul>
<li>6: Training a model to solve Countdown tasks increases Dialogue, even though Dialogue is not explicitly trained for.</li>
<li>7: A model fine-tuned with Dialogue learns faster than a model fine-tuned with comparable monologue.</li>
<li>8: Fine-tuning the model on Dialogue about Countdown tasks increases its ability to learn to identify political misinformation.</li>
</ul>
<pre><code>Technical note: Countdown tasks
	
Countdown tasks are a type of challenging math problem. They’re popular in AI because they require the use of a variety of cognitive strategies. Here’s a typical example:
	
Your goal is to produce the number 853 by combining the numbers 3, 6, 25, 50, 75, and 100. You can use addition, subtraction, multiplication, and division. You don’t have to use all the numbers, but you can use each one only once.
	
(The solution is: (75 + 50) x 6 + 100 + 3)
</code></pre>
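<p>If you want a feel for why Countdown tasks exercise search and backtracking, here’s a naive brute-force solver. It’s my own illustration—nothing like it appears in the paper—but it finds the example above:</p>
<pre><code>from itertools import combinations

def solvable(numbers, target, seen=None):
    """Naive search: repeatedly combine two numbers with +, -, x, or exact
    division until the target appears. seen caches dead-end states."""
    seen = set() if seen is None else seen
    state = tuple(sorted(numbers))
    if state in seen:
        return False
    if target in numbers:
        return True
    for a, b in combinations(numbers, 2):
        rest = list(numbers)
        rest.remove(a)
        rest.remove(b)
        candidates = {a + b, a * b, abs(a - b)}
        if b != 0 and a % b == 0:
            candidates.add(a // b)
        if a != 0 and b % a == 0:
            candidates.add(b // a)
        for value in candidates:
            if solvable(rest + [value], target, seen):
                return True
    seen.add(state)
    return False

print(solvable([3, 6, 25, 50, 75, 100], 853))   # True: (75 + 50) x 6 + 100 + 3
</code></pre>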
<h4>Yes, but what does it all mean?</h4>
<p>Once we’ve walked through the key findings, we’ll talk about what it means—and, because it would be easy to misinterpret this paper, we’ll also talk about what it doesn’t mean.</p>
<h4>Appendix: What didn’t we talk about?</h4>
<p>This is a very technical paper: I’ve done my best to make it accessible, but it simply isn’t possible to present every single finding in a way that captures all of the technical nuance while remaining easily accessible to non-technical readers.</p>
<p>There were a few important topics that simply didn’t fit into the main body of this article, including some statistical methods and controls. I’ll talk briefly about them in the appendix, and point you at where to find them if you want to explore further.</p>
<p>But for now, let’s start at the beginning.</p>
<h2>One: Measuring Dialogue</h2>
<p>What is Dialogue, and how do you measure it?</p>
<p>The authors identify a set of conversational features like question answering and expressing disagreement, which I’ll refer to collectively as Dialogue features. They then have a separate LLM assess how often those features appear in each reasoning trace using a technique called LLM-as-judge.</p>
<pre><code>Technical note: Dialogue features
	
I’m using the term “Dialogue features” as an umbrella term for both conversational behaviors and socio-emotional roles, which are treated separately in the paper.
	
The authors look at 4 conversational behaviors:
* Question-answering
* Perspective shifts
* Conflicts of perspectives
* Reconciliation of conflicting viewpoints
	
They also use Bales’ Interaction Process Analysis, which is commonly used in studies of human groups. There are 12 socio-emotional roles, grouped together into 4 categories:
* Asking for orientation, opinion, and suggestion
* Giving orientation, opinion, and suggestion
* Negative emotional roles (disagreement, antagonism, tension)
* Positive emotional roles (agreement, solidarity, tension release)
</code></pre>
<pre><code>Technical note: LLM-as-judge
	
The authors use an LLM (Gemini 2.5 Pro) to score the reasoning traces. That raises an obvious question: how reliable is the LLM-as-judge technique?
	
They validate the technique by comparing it to a different LLM (GPT-5.2) as well as to human raters, finding strong agreement with both. I take that as evidence that they found something real (exactly what they found, and what it means, is less clear-cut).
</code></pre>
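<p>If you haven’t used the LLM-as-judge technique before, the mechanic is just a structured prompt sent to a second model. Here’s a minimal sketch of scoring a single trace—the prompt wording and the call_llm helper are placeholders of mine, not the authors’ actual rubric:</p>
<pre><code>import json

DIALOGUE_FEATURES = [
    "question answering",
    "perspective shift",
    "conflict between perspectives",
    "reconciliation of conflicting viewpoints",
]

def judge_trace(reasoning_trace, call_llm):
    """Ask a judge model which Dialogue features appear in one reasoning trace.
    call_llm is any function that takes a prompt and returns the judge's text."""
    prompt = (
        "For each feature below, answer true or false depending on whether "
        "it appears in the reasoning trace.\n"
        f"Features: {DIALOGUE_FEATURES}\n"
        f"Trace:\n{reasoning_trace}\n"
        'Respond only with JSON, e.g. {"question answering": true, ...}'
    )
    return json.loads(call_llm(prompt))   # assumes the judge returns valid JSON

# Averaging these per-trace scores over thousands of traces gives the
# "% of traces containing each feature" numbers in the findings below.
</code></pre>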
<h3>Finding #1: Dialogue is more common in reasoning models than non-reasoning models</h3>
<p>If Dialogue is an important part of reasoning, you’d expect that reasoning models would produce more Dialogue than non-reasoning models. That’s exactly what the authors find. They use the LLM-as-judge to measure what percentage of reasoning traces contain each Dialogue feature. They find that Dialogue features are dramatically more common in traces from reasoning models (see Fig. 1a and Fig. 1b).</p>
<pre><code>Technical note: models and data
	
The models used were DeepSeek-R1-0528 (reasoning), DeepSeek-V3-0324 (non-reasoning), QwQ-32B (reasoning), Qwen-2.5-32B-Instruct (non-reasoning), Llama-3.3-70B-Instruct (non-reasoning) and Llama-3.1-8B-Instruct (non-reasoning).
	
It’s notable that they used two pairs of comparable models as well as non-reasoning models in a wide range of sizes, which increases the credibility of the findings.
	
There’s strong statistical work validating many of these results: see the appendix for further details.
</code></pre>
<h3>Finding #2: Dialogue is more common when solving hard problems than easy problems</h3>
<p>Next, the authors find that Dialogue is more common when a given model solves harder problems. For this analysis, they have the models solve a set of problems rated on a 7-point difficulty scale from 1 (extremely easy) to 7 (extremely hard). They then compare the average difficulty of problems where each feature does and does not appear. Dialogue features appeared in problems with an average difficulty of 3.6, compared to 2.3 for problems without those features—roughly the difference between a moderately challenging problem and a straightforward one. (Data estimated from Fig. 1e)</p>
<pre><code>Technical note: problem sets and difficulty ratings
	
For much of the analysis, the authors used a curated set of 8,262 problems drawn from commonly used benchmarks (BigBench Hard, GPQA, MATH (Hard), MMLU-Pro, MUSR, and IFEval).
	
Some experiments used more specialized tasks, which we’ll talk about when we get to them.
	
Problem difficulty was assessed using two separate techniques:
* The LLM-as-judge directly assesses a difficulty rating for each problem.
* They give each problem to all four non-reasoning models and use their failure rate as an indication of difficulty.
</code></pre>
<h3>Summary</h3>
<p>We’re really just getting started. We’ve established what Dialogue is, and what specific Dialogue features we’ll be looking at throughout the rest of this piece.</p>
<p>We’ve also started to explore the connection between Dialogue and reasoning ability. There seems to be a strong correlation between the two: models that are better at reasoning use more Dialogue, and a given model uses more Dialogue when it has to think harder.</p>
<h2>Two: Measuring perspectives</h2>
<p>Finding conversational features seems straightforward enough, but how on earth do you identify and rate individual “perspectives”? Let’s turn our attention to some of the most surprising parts of the paper.</p>
<p>The authors were smart to choose the term “perspectives”: it keeps the focus on the conversational phenomena, while avoiding the anthropomorphic implications of a term like “personas”. They are consistently careful about this throughout the paper, which I appreciate.</p>
<p>The authors again use the LLM-as-judge, this time instructing it to:</p>
<ul>
<li>Count the number of perspectives that appear in each trace</li>
<li>Score the Big 5 personality traits of each perspective</li>
<li>Generate a short free-form description of each perspective’s domain expertise</li>
</ul>
<p>The Big 5 personality traits are commonly used in human psychology—they are Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (aka OCEAN). I want to be clear that the authors make no claim that there are actual personalities here—they’re using the Big 5 as a way to describe the behavior of the different perspectives.</p>
<pre><code>Technical note: validating the LLM-as-judge
	
The authors are asking a lot of the LLM-as-judge in this section. How confident are we that it’s accurate? They’ve put real effort into validation, but some of the measures are better validated than others.
	
Their core technique is to assess the LLM-as-judge’s performance when evaluating the Intelligence Squared Debates Corpus (ISDC), a corpus of transcribed human debates. They find it does a reliable job of correctly identifying the speakers in a conversation, even when labels are removed and the text of the conversation is concatenated into a single block. This is strong validation that they’re able to correctly identify perspectives.
	
They further use the biographical information included in ISDC to show that the LLM-as-judge does a pretty good job of categorizing the diversity of domain expertise in a conversation. I find this work moderately convincing.
	
The Big 5 traits are scored using the widely used BFI-10 questionnaire. The paper doesn’t appear to validate the Big 5 scoring as rigorously as the other measures, so consider those results to be interesting but not in any way definitive.
</code></pre>
<h3>Finding #3: Reasoning models generate a larger number of perspectives</h3>
<p>The difference is pretty striking:</p>
<ul>
<li>Reasoning models average 2.9 perspectives per trace</li>
<li>Non-reasoning models average 1.4 perspectives per trace</li>
</ul>
<p>(Data estimated from Fig. 1d)</p>
<p>An average of 1.4 perspectives suggests that the non-reasoning models were often generating monologues, while 2.9 suggests something more like an exchange of viewpoints.</p>
<h3>Finding #4: Reasoning models generate more diverse perspectives</h3>
<p>The authors use some fairly technical statistics to measure the diversity of personality traits. I don’t think I’d do anyone any favors by trying to summarize those techniques here, but they clearly show that the reasoning models generate perspectives with much more diverse personality traits—except, interestingly, for conscientiousness. That trait was consistently high in all the models, which makes a certain amount of sense: presumably conscientiousness is always a desirable attribute.</p>
<figure class="post-image">
<img src="./assets/2026-02-05-societiesOfThought2.jpg" alt="Radar chart of Big 5 personality traits showing large pink and orange polygons for reasoning models spanning most of the chart area, with much smaller blue and green polygons clustered near the center for non-reasoning models.">
<figcaption>Adapted from Kim et al., Fig. 3a. The large purple and orange areas represent trait diversity in reasoning models; the other smaller areas represent trait diversity in non-reasoning models.</figcaption></figure>
<p>The perspectives from reasoning models don’t just have more diverse personality traits: they also have much greater diversity of expertise.</p>
<pre><code>Technical note: calculating diversity of expertise
	
Since the LLM-as-judge generates free-form descriptions of each perspective’s domain expertise, how do you calculate diversity of expertise?
	
The authors turn each expertise description into an embedding and calculate diversity as the mean cosine distance from the centroid of all embeddings.
	
Fig. 3b is a useful visualization of the embedding space of expertise, if you want to get a sense of what kinds of expertise were identified.
</code></pre>
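<p>Here’s a small sketch of that calculation. The embedding model is a placeholder I picked for illustration; the paper’s choice may differ.</p>
<pre><code>import numpy as np
from sentence_transformers import SentenceTransformer

def expertise_diversity(descriptions):
    """Mean cosine distance from the centroid of the expertise embeddings."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
    embeddings = model.encode(descriptions)          # shape (n, d)
    centroid = embeddings.mean(axis=0)
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(centroid)
    cosine_sim = embeddings @ centroid / norms
    return float(np.mean(1.0 - cosine_sim))          # higher means more diverse
</code></pre>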
<h3>Summary</h3>
<p>I was initially pretty skeptical about the claims in this section, but I think the authors have done strong work here.</p>
<p>I’m convinced that the authors are measuring something real and interesting when they calculate the number of perspectives, diversity of personality traits, and diversity of expertise. And I’m convinced that all of those metrics are higher in reasoning models, showing strong correlation between whatever they’re measuring and reasoning ability.</p>
<p>I’m not convinced, however, that we know exactly <em>what</em> is being measured. The analogies to human conversation are interesting and illuminating, but I don’t think there’s nearly enough here to say that the models are generating distinct entities with real personalities. (To be clear, the authors make no such claim.)</p>
<h2>Three: Testing causation via activation steering</h2>
<p>We’ve learned a lot about the nature of Dialogue and seen that it’s strongly correlated with reasoning performance. We now turn our attention to a pair of clever experiments that try to establish a causal relationship. We’ll begin with activation steering.</p>
<h4>What is activation steering?</h4>
<p>We don’t fully understand what happens inside LLMs, but modern interpretability techniques offer partial insight into their internal representations. In particular, a tool called a sparse autoencoder can identify features inside a model that seem to represent human concepts. By increasing or decreasing the activation of those features, we can steer some aspects of the model’s behavior.</p>
<p>As a demonstration of this technique, Anthropic developed <a href="https://www.anthropic.com/news/golden-gate-claude">Golden Gate Claude</a>, which had a monomaniacal obsession with the Golden Gate Bridge. It was created by finding a feature that was associated with the Golden Gate Bridge and increasing its activation.</p>
<pre><code>Technical note: what is a sparse autoencoder (SAE)?
	
LLMs store information in a distributed fashion, with each concept spread across many neurons, and each neuron having a role in understanding many concepts.
	
An SAE is a tool for untangling those patterns into something more understandable. It identifies internal activation patterns (called “features”) that correspond to human concepts like the Golden Gate Bridge, or deception.
	
You can use an SAE to get a sense of what the model is “thinking”—for example, SAEs have been used to tell when a model is being deceptive. It’s also possible to increase the activation of a feature.
</code></pre>
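<p>Mechanically, steering amounts to adding a scaled copy of a feature’s direction to the model’s activations during the forward pass. Here’s a conceptual sketch in PyTorch; the hook placement and the assumption that the layer outputs a plain activation tensor are simplifications, not the authors’ implementation.</p>
<pre><code>import torch

def add_steering_hook(layer, direction, strength=4.0):
    """Register a hook that nudges the layer's output along a feature direction.

    `direction` is assumed to be the SAE decoder vector for the chosen feature
    (e.g. Feature 30939), with the same dimensionality as the residual stream.
    """
    direction = direction / direction.norm()

    def hook(module, inputs, output):
        # Assumes `output` is the activation tensor (batch, seq_len, d_model);
        # real decoder layers often return tuples, which would need unpacking.
        return output + strength * direction

    return layer.register_forward_hook(hook)

# Usage sketch (module path and feature index are placeholders):
#   handle = add_steering_hook(model.layers[20], sae_decoder_vectors[30939])
#   ... run generation ...
#   handle.remove()
</code></pre>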
<h3>Finding #5: Increasing the activation of a single feature in the model simultaneously increases both Dialogue and reasoning ability</h3>
<p>The authors explore whether it’s possible to increase Dialogue by activating particular features within the model, and whether doing so increases reasoning ability. It turns out that increasing the activation of randomly selected conversational features modestly increases both Dialogue and accuracy.</p>
<p>Going further, the authors zeroed in on a specific feature (Feature 30939), which is “a discourse marker for surprise, realization, or acknowledgment”. They find that increasing the activation of Feature 30939 doubles accuracy on Countdown tasks and substantially increases the prevalence of multiple Dialogue features.</p>
<p>The paper also finds that increasing the activation of Feature 30939 increases the diversity of internal feature activations related to personality and expertise, strengthening the theory that perspective diversity is an integral part of Dialogue.</p>
<p>The authors further strengthen these results with a mediation analysis—it’s beyond the scope of this article, but I discuss it briefly in the appendix.</p>
<h3>Summary</h3>
<p>The fact that activating a single feature increases multiple measures of Dialogue while simultaneously increasing reasoning ability is further evidence that Dialogue directly affects reasoning ability. It isn’t definitive, though: this experiment can’t rule out the possibility that Dialogue and reasoning are independent results of some unknown internal process.</p>
<p>The case is getting stronger, but we aren’t quite there yet.</p>
<h2>Four: Testing causation via training</h2>
<p>Finally, the authors test the relationship between Dialogue and reasoning ability with a set of training experiments. These experiments are very nicely designed and present the strongest evidence that Dialogue directly improves reasoning.</p>
<p>The core experiment trains a small base model to solve Countdown tasks. The training rewards the model for accuracy and correctly formatted output, but not for Dialogue.</p>
<pre><code>Technical note: methods
	
This experiment uses Qwen-2.5-3B, a small pre-trained model without any instruction-tuning. They also replicate the results with Llama-3.2-3B.
	
Training consists of 250 steps of reinforcement learning (RL) on Countdown tasks.
</code></pre>
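<p>To give a flavor of what “rewarding accuracy and format but not Dialogue” means in practice, here’s a toy reward function for a Countdown task. The answer format, the reward weights, and the rule that every number must be used exactly once are my assumptions for illustration, not the paper’s actual reward.</p>
<pre><code>import re

def countdown_reward(trace, numbers, target):
    """Toy reward: small bonus for a well-formatted answer, full reward if correct."""
    reward = 0.0
    match = re.search(r"Answer:\s*(.+)", trace)
    if match is None:
        return reward                      # no recognizable answer line
    reward += 0.1                          # formatting reward
    expression = match.group(1).strip()
    try:
        used = sorted(int(n) for n in re.findall(r"\d+", expression))
        if used == sorted(numbers) and eval(expression) == target:
            reward += 1.0                  # accuracy reward
    except Exception:
        pass
    return reward

# Example: reaching 71 from [3, 7, 25, 50] earns 0.1 + 1.0
print(countdown_reward("Answer: 50 + 25 - (7 - 3)", [3, 7, 25, 50], 71))
</code></pre>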
<h3>Finding #6: Training a model to solve Countdown tasks increases Dialogue, even though Dialogue is not explicitly trained for</h3>
<p>Over the course of training, problem-solving accuracy increases from approximately 0% to 58%. At the same time, the frequency of Dialogue features increases sharply.</p>
<p>This strongly suggests that Dialogue emerges spontaneously during training because it’s a useful problem-solving strategy.</p>
<h3>Finding #7: A model fine-tuned with Dialogue learns faster than a model fine-tuned with comparable monologue</h3>
<p>The authors have one more trick up their sleeve, and it’s perhaps the strongest single piece of evidence. They use a technique called fine-tuning, which gives the model additional training on curated examples before the reinforcement learning begins. They compare the learning performance of the baseline model to two fine-tuned versions (a sketch of what such examples might look like follows the list):</p>
<ul>
<li>The Dialogue version is fine-tuned on examples of Dialogue about Countdown tasks.</li>
<li>The monologue version is fine-tuned on examples of monologue about Countdown tasks.</li>
</ul>
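<p>Purely as an illustration of the difference in format, here is what a pair of fine-tuning examples might look like. These are mock-ups I wrote for this article, not examples from the paper’s training data.</p>
<pre><code># Mock fine-tuning examples (my own illustration, not the paper's data)
dialogue_example = {
    "prompt": "Use 3, 7, 25, and 50 to reach 71.",
    "completion": (
        "Hmm, 50 + 25 is 75, which overshoots by 4. "
        "But wait, 7 - 3 is 4, so we can subtract that back out. "
        "Good point: 50 + 25 - (7 - 3) = 75 - 4 = 71. Yes, that works!"
    ),
}

monologue_example = {
    "prompt": "Use 3, 7, 25, and 50 to reach 71.",
    "completion": (
        "I will start from the large numbers. 50 + 25 = 75, which is 4 too high. "
        "Since 7 - 3 = 4, the answer is 50 + 25 - (7 - 3) = 71."
    ),
}
</code></pre>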
<p>During subsequent training, the fine-tuned models both learn faster than the baseline model, but the Dialogue-tuned model learns fastest. All three models begin with accuracy very close to 0%, but by step 40 of training, their accuracy levels have diverged:</p>
<ul>
<li>Baseline model: 6%</li>
<li>Monologue-tuned model: 28%</li>
<li>Dialogue-tuned model: 38%</li>
</ul>
<p>This is a compelling result: both fine-tuned models received similar Countdown content—the only difference was the format. The fact that the Dialogue-tuned model learns significantly faster strongly suggests that Dialogue directly contributes to the ability to learn.</p>
<h3>Finding #8: Fine-tuning the model on Countdown task Dialogue increased its ability to learn to identify political misinformation</h3>
<p>For the final experiment, the authors compare the learning rates of the baseline model and the Dialogue-tuned model.</p>
<p>Both models are subsequently trained to identify political misinformation. The model that was fine-tuned on Dialogue learns faster than the other model, even when learning a very different task. This transfer between domains provides further compelling evidence for the causal role of Dialogue in learning problem-solving skills.</p>
<pre><code>Technical note: variation between models
	
These are compelling results, though I note significant variation between the primary results with Qwen-2.5-3B and the replication with Llama-3.2-3B. 
	
Extended Data Fig. 8 shows that with Qwen-2.5-3B, the models that were fine-tuned on Dialogue and monologue ultimately converged to almost the same accuracy, while no such convergence occurred with Llama-3.2-3B.
</code></pre>
<h3>Summary</h3>
<p>I had a great time reading about these very elegant experiments—the authors found some clever ways of zeroing in on Dialogue as having a strong causal role. The comparison between Dialogue and monologue fine-tuning is compelling, as is the transfer between the Countdown and misinformation tasks.</p>
<h2>Yes, but what does it all mean?</h2>
<p>I’ve been impressed by the authors’ methodology, and I think they’ve managed to demonstrate quite a lot. Let’s take a look at what they’ve made a strong case for, and what is interesting and thought-provoking but not conclusive. Finally, I want to make sure we aren’t reading too much into the results.</p>
<h3>What does the paper show?</h3>
<p>Absolute certainty is a scarce commodity in this vale of tears, but I think the authors have convincingly demonstrated quite a lot:</p>
<ol>
<li>Dialogue (reminder: this is my simplified term, not theirs) is a real phenomenon that can be usefully measured and analyzed.</li>
<li>Dialogue is strongly reminiscent of human conversations, and features multiple identifiable perspectives with diverse characteristics and expertise.</li>
<li>Dialogue is strongly correlated with reasoning ability, and appears more often when models need to think hard.</li>
<li>Models appear to spontaneously develop the ability to produce Dialogue during training because Dialogue is an important aid to reasoning.</li>
<li>There is strong evidence that Dialogue directly contributes to both reasoning ability and to learning.</li>
<li>It is possible to improve both reasoning ability and learning by increasing the amount of Dialogue.</li>
</ol>
<h3>What does the paper suggest?</h3>
<p>Beyond what the paper convincingly demonstrates, it raises a lot of interesting questions.</p>
<p>I found the training experiments very thought-provoking, and can imagine all kinds of follow-up experiments:</p>
<ul>
<li>Is there a role for Dialogue training when training production models?</li>
<li>Could you get stronger results by tuning Dialogue training (number of perspectives, areas of expertise, patterns of interaction)?</li>
<li>Do certain particular perspectives tend to recur?</li>
<li>Does Dialogue closely resemble human conversation because that’s an optimal model, or because there’s a lot of it in the training data and it’s a close-enough approximation of a different, optimal strategy?</li>
</ul>
<p>At a higher level, this work naturally calls to mind the substantial body of existing work on debate and collaboration in humans. The authors directly reference a few of those ideas:</p>
<ul>
<li>There’s considerable evidence that groups with diverse perspectives and expertise often make better decisions, and there’s evidence that individuals can make better decisions by internally simulating those dynamics.</li>
<li>Mercier &amp; Sperber (The Enigma of Reason) have argued that reason evolved for social argumentation more than for individual problem-solving. In particular, they argue that individual cognition is frequently biased and unreliable, but argumentative group deliberations produce good decisions.</li>
<li>Mikhail Bakhtin’s notion of the dialogic self posits that dialogue is fundamental to human cognition.</li>
</ul>
<p>The parallels to human group deliberations are fascinating and suggest all kinds of follow-up research. I’d be careful about going further than that, though: there isn’t nearly enough evidence to draw any definite conclusions.</p>
<h3>What did the paper definitely not find?</h3>
<p>It would be easy to read too much into this paper and conclude that it shows the presence of multiple cognitive personas. That would be a mistake, as the authors are careful to note:</p>
<blockquote>
<p>Our goal is not to take sides on whether reasoning model traces should be regarded as discourse among simulated human groups or a computational mind’s simulation of such discourse.</p>
</blockquote>
<p>Dialogue is a real phenomenon, and there’s strong evidence that it aids reasoning. But I don’t think we can usefully say much about what’s happening internally. Here are a few of the many possibilities that seem entirely consistent with the available evidence:</p>
<ul>
<li>The models have learned that certain patterns of speech are useful for reasoning, so they repeat those patterns while reasoning. The patterns bear superficial resemblance to conversations between humans with particular personalities.</li>
<li>The models have created non-trivial internal representations of reasoning entities with a range of roles, and those entities are activated during reasoning, engaging in something that genuinely resembles an internal exchange of ideas.</li>
<li>The models create internal hypotheses and strategies, and the process of combining them into a coherent output produces a token stream that resembles a conversation between different entities.</li>
</ul>
<p>This was a really fun paper to dig into—I hope you had as much fun with it as I did, and that it leaves you with lots to think about.</p>
<h2>Appendix: What we didn’t cover</h2>
<p>The full paper is 112 pages long—I simply wasn’t able to cover all of the interesting results and methods. In particular, there’s a lot of good statistical work that strengthens the paper but is outside the scope of this article.</p>
<p>Here I’m going to briefly gesture at a few of the most interesting or important things I didn’t cover in detail—all the details are in the paper if any of them intrigue you.</p>
<h3>1. Core statistical controls</h3>
<p>The authors put a lot of effort into controlling for possible confounders.</p>
<h4>Trace length</h4>
<p>An obvious confounder is trace length: the longer a reasoning trace is, the more chance there is that any given phenomenon will happen in it. That’s a particular problem because reasoning models tend to produce much longer traces. For example, a naive person might observe that reasoning models use the word “the” more often than non-reasoning models, and mistakenly conclude that “the” is a key part of reasoning.</p>
<p>The authors correct for this by using log(trace length) in their regressions. They note that the observed effects occur with and without this correction, which reduces the likelihood of the correction introducing other problems (the “bad control” effect).</p>
<h4>Task fixed effects</h4>
<p>You can imagine all kinds of ways that the nature of a problem (difficulty, domain, whether it requires a multi-step solution) might affect the reasoning trace. The authors address this using a statistical technique that corrects for those differences, essentially focusing on the differences between models on each individual problem.</p>
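<p>For readers who like to see the shape of such a regression, here’s a sketch on synthetic data: a reasoning-model indicator, a log(trace length) control, and task fixed effects. The variable names and specification are my own; the paper’s actual models are specified much more carefully.</p>
<pre><code>import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: one row per reasoning trace
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "is_reasoning": rng.integers(0, 2, n),
    "trace_length": rng.integers(200, 5000, n),
    "task_id": rng.integers(0, 20, n),
})
df["dialogue_features"] = (
    2 * df["is_reasoning"]
    + 0.5 * np.log(df["trace_length"])
    + rng.normal(0, 1, n)
)
df["log_length"] = np.log(df["trace_length"])

# Reasoning-model effect, controlling for log trace length and task fixed effects
fit = smf.ols(
    "dialogue_features ~ is_reasoning + log_length + C(task_id)",
    data=df,
).fit()
print(fit.params["is_reasoning"])
</code></pre>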
<h3>2. Mediation analysis</h3>
<p>There are two mediation analyses in the paper, and they’re pretty significant (albeit highly technical). If you already know what a mediation analysis is, you’ll likely find these to be of interest. If you don’t, here’s a very brief description of the technique in general, and what the authors found with it.</p>
<p>Mediation analysis is a technique for teasing apart the causal relationship between three phenomena. For example, you might know that exercise increases endorphin levels and improves mood, and wonder whether exercise improves mood by increasing endorphin levels (aka, endorphin levels “mediate” the mood improvement) or by some other mechanism.</p>
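<p>If a tiny worked example helps, here’s the classic product-of-coefficients version of that exercise/endorphins/mood setup on synthetic data. The paper’s analyses are considerably more sophisticated; this just shows the basic idea of splitting a total effect into direct and mediated parts.</p>
<pre><code>import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: exercise raises endorphins, and mood depends on both
rng = np.random.default_rng(1)
n = 1000
exercise = rng.normal(size=n)
endorphins = 0.6 * exercise + rng.normal(size=n)             # mediator
mood = 0.5 * endorphins + 0.2 * exercise + rng.normal(size=n)
df = pd.DataFrame({"exercise": exercise, "endorphins": endorphins, "mood": mood})

total = smf.ols("mood ~ exercise", df).fit().params["exercise"]
a = smf.ols("endorphins ~ exercise", df).fit().params["exercise"]
b = smf.ols("mood ~ endorphins + exercise", df).fit().params["endorphins"]

indirect = a * b   # the part of the effect that flows through endorphins
print(f"mediated share: {indirect / total:.0%}")   # roughly 60% with these numbers
</code></pre>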
<h4>Why are reasoning models more accurate?</h4>
<p>In Extended Data Fig. 4, the authors use mediation analysis to figure out the mechanism by which reasoning models are more accurate, considering both social behaviors and cognitive behaviors as possible mediators. The core finding is that social behaviors mediate about 20% of the accuracy advantage that reasoning models have over non-reasoning models.</p>
<h4>How does activating Feature 30939 increase accuracy?</h4>
<p>In Fig. 2e, the authors use mediation analysis to figure out the mechanism by which activating Feature 30939 increases accuracy, considering four cognitive behaviors (verification, backtracking, subgoal setting, and backward chaining) as mediators. They conclude that the majority of the effect is direct, but about 23% is mediated by the cognitive behaviors.</p>
<h4>What does the mediation analysis actually show?</h4>
<p>The mediation analysis is well done and significantly strengthens the findings by quantifying a plausible causal pathway between Dialogue and reasoning ability. But it can’t actually prove causation in this case.</p>
<h3>3. How reciprocal are the Dialogues?</h3>
<p>We’ve already discussed the fact that Dialogues from reasoning models have more instances of features like asking and giving, but the authors go further. They calculate something called a Jaccard index to measure how often asking and giving occur together, as a way of measuring how conversation-like a Dialogue is. See Fig. 1c for more details.</p>
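<p>The Jaccard index itself is simple: of the traces where either behavior appears, what fraction contains both? Here’s a minimal sketch; the per-trace granularity and the behavior names are my assumptions about how to illustrate it, not the paper’s exact procedure.</p>
<pre><code>def jaccard(traces_with_asking, traces_with_giving):
    """Fraction of traces containing either behavior that contain both."""
    asking = set(traces_with_asking)
    giving = set(traces_with_giving)
    union = asking.union(giving)
    if not union:
        return 0.0
    return len(asking.intersection(giving)) / len(union)

# Example: traces 1-4 contain asking, traces 2-5 contain giving
print(jaccard({1, 2, 3, 4}, {2, 3, 4, 5}))   # 0.6
</code></pre>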
]]>
    </content>
  </entry>

  <entry>
    <title>Wearable AI Pins: I’m Skeptical</title>
    <link href="https://againstmoloch.com/writing/2026-01-28_wearableAIPins.html"/>
    <id>https://againstmoloch.com/writing/2026-01-28_wearableAIPins.html</id>
    <updated>2026-01-28T12:00:00Z</updated>
    <summary>AI-focused personal devices are back in the news: OpenAI has announced that they’re working on some new AI-focused devices with Jony Ive and rumor has it that Apple is working on something similar.

I love gadgets and I love AI, and I’m very open to the idea that an AI-first device would look very different from anything that currently exists. But I’m deeply skeptical about the pin form factor.
</summary>
    <content type="html">
      <![CDATA[<figure><img src="https://againstmoloch.com/assets/2026-01-28_wearableAIPins.jpeg" alt="Precision technical illustration of an exploded-view diagram of a small wearable pin device at enlarged scale, with component layers floating in alignment, surrounded by small use-scenario vignettes with dotted sensing lines, some terminated by red X marks"></figure>
<p>AI-focused personal devices are back in the news. Things didn’t go so well last time: a pair of much-anticipated devices from <a href="https://www.youtube.com/watch?v=TitZV6k8zfA">Humane</a> and <a href="https://www.youtube.com/watch?v=ddTV12hErTc">Rabbit</a> were legendarily bad. But maybe this time will be different? OpenAI has announced that they’re working on a new AI-focused device (or family of devices) <a href="https://www.youtube.com/watch?v=rDNyFN_eMec">with Jony Ive</a> and rumor has it that Apple is working on <a href="https://www.theinformation.com/articles/apple-developing-ai-wearable-pin">something similar</a>.</p>
<p>There’s no official information about what the new devices will look like, but it’s widely believed that both companies are working on some kind of wearable pin specifically designed for AI, with a camera and microphone so it’s always aware of your surroundings.</p>
<p>Jony makes a great case for why an AI-focused device might look very different from anything that currently exists:</p>
<blockquote>
<p>The products that we’re using to deliver and connect us to unimaginable technology—they’re decades old. And so it’s just common sense to at least think “surely there’s something beyond these legacy products.”</p>
</blockquote>
<p>That makes a ton of sense on paper, and it seems obvious that both teams are well-positioned to make the best possible version of whatever this thing might be. And yet… I dunno. I just don’t see it. Normally I’m first in line for (<a href="https://www.balmuda.com/lovefrom-balmuda/">almost</a>) anything Jony Ive makes, but I’m having a hard time getting excited about the rumored form factor.</p>
<p>First, I’m really skeptical about always-on cameras and microphones (or even sometimes-on cameras that are always pointed at other people). They enable some cool demos, and they’re somewhat useful, but I’m not sure the benefits are worth the downsides.</p>
<p>And second, regardless of whether or not you like the cameras and microphones, a pin feels like the wrong form factor.</p>
<hr />
<p>Let’s be clear: always-on contextual awareness has genuine use cases, and it makes an amazing demo:</p>
<blockquote>
<p>Hey, robot: when I ran into Bob at the park last summer, what restaurant did he recommend for my trip to London?</p>
</blockquote>
<p>I mean, how cool is that? Who wouldn’t want that kind of magic?</p>
<p>In practice, though, I’m just not sure how often I’d actually use that. I can think of a million uses for an AI I can easily talk to throughout the day, but not that many uses for an AI that can see what I see. Which would be fine if this was just another feature that could be crammed into the cell phone I already carry. But if we’re talking about a new device that I need to buy and upgrade and charge and apparently clip on to my shirt every morning… it needs to be pretty damn useful, and it needs to not cause me a lot of headaches.</p>
<p>Speaking of headaches, here are two things many people hate: being filmed by strangers, and AI. I predict, by the way, that the average person on the street is going to get more angry about AI over the next couple of years, not less. Wearing a prominent device that makes it obvious that my AI is watching and listening to everyone around me? That sounds like a great way to have a lot of unpleasant interactions with angry strangers. (Pro tip: explaining to an angry stranger that actually, your device has a little light right here that would be lit if you were recording them, but it isn’t lit, so everything is cool—that will get you nowhere).</p>
<p>As a user, having a visible AI camera feels like it buys me a lot of aggro. As a brand, being known for putting AI cameras everywhere doesn’t seem like a good move. These days, neither Apple nor OpenAI has brand capital to burn, and this feels like a needless self-inflicted injury.</p>
<p>At first glance, perhaps ambient awareness would make sense at work: there’s less expectation of privacy, especially if everyone wears a company-issued AI, and it’s easy to imagine legitimate use cases. On the other hand: in my experience, corporate legal departments hate the fact that employees keep putting stupid things in email, because anything in email is potentially discoverable in court. It’s hard to imagine legal being excited about the idea of also recording every dumb thing someone says in a hallway conversation.</p>
<hr />
<p>But accepting arguendo that it would be useful to have a new kind of device built specifically for AI and that you might want it to have cameras and microphones, I’m not convinced a wearable pin is the right choice.</p>
<p>If you absolutely must have a camera mounted on your body, a pin attached to your shirt isn’t a great place for it. The pin goes blind every time you put on a sweater, and your hands and laptop screen are always blocking its view, and you frequently turn your head to look at things that aren’t visible to the pin. If you’ve ever watched police bodycam footage, you know exactly what I’m talking about.</p>
<p>You know what doesn’t have any of those problems? Smart glasses.</p>
<p>Audio has similar problems: you could mount a good-enough microphone on a pin, but it’ll never be as clear as the microphones in your earbuds. And how is the device going to talk back to you? Are you walking around with a speakerphone strapped to your chest? That doesn’t sound awesome.</p>
<p>You know what doesn’t have any of those problems? Earbuds.</p>
<p>Earbuds are small and cheap and unobtrusive and they’re the best possible way to have audio interactions. If you want more, Apple is rumored to be working on AirPods with cameras, and at least one company is working on earbuds that can process barely-audible whispers.</p>
<p>If you want additional capability beyond earbuds, smart glasses are a great option. But a shirt-mounted pin? Beyond getting to live your childhood Star Trek fantasies, I’m not sure what that gets you.</p>
<hr />
<p>Everything I’ve just said is profoundly obvious, so presumably Apple and OpenAI have thought it through and have a plan that makes sense anyway.</p>
<p>Maybe the rumors are wrong, and the devices in question aren’t wearable pins. Maybe they’ve come up with some kind of clever interaction model where a pin makes sense. Maybe they’ve thought of some uses of ambient awareness that are so great that everyone will need one and we’ll just get used to everyone wearing cameras. Or—just maybe—this direction is a shiny mistake (see: every VR product ever made).</p>
<p>I’m dying to find out.</p>
]]>
    </content>
  </entry>
</feed>