I really like how many hypotheses you generated here.
A few months ago my GM pulled me in kind of in the way where you think you might get fired, he wants you to feel like you could hear a pin drop you know? He's in his early fifties. Then he showed me a huge order where he had to ship a whole bunch of stuff separately to different locations and he was trying to calculate the weights and shipping costs of all these separate packages going to separate locations, and the non-AI solutions were failing him, so he copypasta'd into ChatGPT and it showed all its work, and he's like "Look at this!", and I just told him that he was having an appropriate emotional reaction because he's old and almost everyone younger than him is a boiling frog or as clueless as he was ten minutes ago. That kind of stuff gives me hope.
Thank you! I've thought about this a lot, so most of the work was in cramming those many hypotheses in to a short and entertaining enough form that people might read them.
WRT this giving you hope: Yes, people will wake up to AI and fear it appropriately. I still don't have much hope for shutdowns. But even slowdowns and less proliferation might really help our (otherwise poor IMO) odds of getting this right and surviving.
A variation of the Peter Principle guarantees they'll be used just beyond their area of reliable competence, so they often look incompetent even as they improve.
I can think of some arguments for this claim, but I don't think it's sufficiently self-evident to be stated just like that. The Peter Principle operates because:
LLMs don't share any of these properties. An LLM that does one job well can keep doing so forever no matter what else you want to use it for, there is no incentive to 'reward' LLMs with promotions when it would not be directly prudent to provide them, and an LLM that underperforms when trialed in a given task can be quietly withdrawn from that role at no emotional or organizational cost.
I said a variation of the Peter Principle. Maybe I should have said some relation of the Peter Principle, or not used that term at all. What I'm talking about isn't about promotion but expansion into new types of tasks.
Once somebody makes money deploying agents in one domain, other people will want to try similar agents in similar new domains that are probably somewhat more difficult. This is a very loose analog of promotion.
The bit about not wanting to demote them is totally different. I think they can be bad at a job and make mistakes that damage their and your reputation and still be well worth keeping in that job. There are also some momentum effects of not wanting to re-hire all the people you just fired in favor of AI and admit you made a big mistake. Many decision-makers would be tempted to push through and try to upgrade the AI and work around its problems instead of admit they screwed up.
See below response for the rest of that logic. There can be more upside than down even with some disastrous mistakes or near misses that will go viral.
I'd be happy to not call it a relation of the Peter Principle at all. Let's call it the Seth Principle; I'd find it funny to have a principle of incompetence named after me :)
Rereading the OP here, I think my interpretation of that sentence is different from yours. I read it as meaning “they'll be trialed just beyond their area of reliable competence, and the appearance of incompetence that results from that will both linger and be interpreted as a general feeling that they're incompetent, which in the public mood overpowers the quieter competence even if the models don't continue to be used for those tasks and even if they're being put to productive use for something else”.
(The amount of “let's laugh at the language model for its terrible chess skills and conclude that AI is all a sham” already…)
I am really thinking that they'll be deployed beyond their areas of reliable competence. If it can do even 50% of the work it might be worth it. As that goes up, it doesn't need to be nearly 100% competent. I guess a factor I didn't mention is that the rates of alarming mistakes should be far higher in deployment than testing, because the real world throws lots of curve balls it's hard to come up with in training and testing.
And I think the downsides of AI incompetence will not fall on mostly on the businesses that deploy them, but on the AI itself. Which isn't right, but it's helpful for people blaming and fearing AI.
I particularly like the idea that AI incompetence will become associated with misalignment. The more capable agents become, the more permissions they'll get, which will have a strange "eye of the storm" effect where AIs are making a bunch of consequential decisions poorly enough to sway the public mood.
I think a lot of potential impact from public opinion is cruxed on what the difference between the publically available models and frontier models will be. In my expectation, the most powerful models end up subject to national security controls and are restricted: by the time you have a bunch of stumbling agents, the frontier models are probably legitimately dangerous. The faster AI progress goes, or the more control the USG exerts, the greater the difference between the public perception of AI and its real capabilities will be. And being far removed from public participation, these projects are probably pretty resilient to changes in the public mood.
With that in mind, anything that either gives transparency into what the ceiling of capabilities are (mandatory government evals that are publicly read out in congress?) or gets the public worried earlier seems to be pretty important. I particularly like the idea of trying to spark concern about job losses before they happen: maybe this happens by constantly asking politicians what their plans are for a future when people can't work, and pointing to these conversations as examples of the fact that the government isn't taking the threat of AI to you, the voter, seriously.
I agree that the crux is the difference between public and private models. That's exactly what I was pointing to in the opener by saying maybe somebody is completing a misaligned Agent-4 in a lab right when this is happening in public. That would make all of this concern almost useless. It still would be in the air and might push decision-makers a bit more cautious - which could be a nontrivial advantage.
I agree that anything that produces public worry earlier is probably important and useful. The only exceptions would be outright lies that could blow back. But sparking concerns about job losses early wouldn't be a lie. I'm constantly a bit puzzled as to why other alignment people don't seem to think we'll get catastrophic job losses before AGI. Mostly I don't think people spend time thinking about it, which makes sense since actual misalignment and takeover is so much worse. But I think it's between possible and likely that job losses will be very severe and people should worry about them while there's still time to slow them down dramatically. Which would also slow AGI.
Constantly asking politicians about their plans seems like a good start. Saying you're an AI researcher when you do would be better.
To your first point:
Yes, I think that incompetence will both be taken for misalignment when it's not, and it will also create real misalignments (largely harmless ones)
I think this wouldn't be that helpful if the public really followed the logic closely; ASI wouldn't be incompetent, so it wouldn't have the same sources of incompetence. But the two issues are semantically linked. This will just get the public worried about alignment. Then they'll stay worried even if they do untangle the logic. Because they should be.
why other alignment people don't seem to think we'll get catastrophic job losses before AGI
Because AI before AGI will have similar effects as previous productivity enhancing technologies.
Those aren't necessarily contradictory: you could have big jumps in unemployment even with increases in average productivity. You already see this happening in software development, where increasing productivity for senior employees has also coincided with fewer junior hires. While I expect that the effect of this today is pretty small and has more to do with the end of ZIRP and previous tech overhiring, you'll probably see it play out in a big way as better AI tools take up spots for newgrads in the runup to AGI.
What evidence do we have so far that public opinion turning even further against AI would meaningfully slow down capabilities progress in the time period here?
You mention public concern should tilt the AI 2027 scenario towards success, but in August 2027 in the scenario the public is already extremely against AI (and OpenBrain specifically is at negative 40% approval).
Good question. I probably should have emphasized that the difference between this and the AI 2027 scenario is that the route to AGI takes a little longer, so there is much more public exposure to agentic LLMs.
I did emphasize that this may all come too late to make much difference from a regulatory standpoint. Even if that happens, it's going to change the environment which people make crucial decisions about deploying the first takeover capable AIs. That cuts both ways; polarization dominates belief diffusion seems complex but it might be possible to get a better guess than I have now, which is almost none.
The other change I think were pretty guaranteed to get is dramatically improved funding for alignment and safety. That might come so late as to be barely useful, too. Or early enough to make a huge difference.
Excellent post, thank you! Two thoughts:
They are used for cybercrime and rumored to be deployed for state-funded espionage.
To make it a little more substantial, web browsing agents with some OSINT skills (and multimedia models already geolocate photos made in Western urban areas comparably with human experts) offer prospects of automatizing or at least significantly speeding up (making much cheaper) targeted attacks like spearphishing
Epistemic status: I'm pretty sure AI will alarm the public enough to change the alignment challenge substantially. I offer my mainline scenario as an intuition pump, but I expect it to be wrong in many ways, some important. Abstract arguments are in the Race Conditions and concluding sections.
Nora has a friend in her phone. Her mom complains about her new AI "colleagues." Things have gone much as expected in late 2025; transformative AGI isn't here yet, and LLM agents have gone from useless to merely incompetent.
Nora thinks her AI friend is fun. Her parents think it's healthy and educational. Their friends think it's dangerous and creepy, but their kids are sneaking sleazy AI boyfriends. All of them know people who fear losing their job to AI.
Humanity is meeting a new species, and most of us dislike and distrust it.
This could shift the playing field for alignment dramatically. Or takeover-capable AGI like Agent-4 from AI 2027 could be deployed before public fears impact policy and decisions.
Public attitudes toward AI have transformed like they did for COVID between February and March of 2020.
The risks and opportunities seem much more immediate now that there's a metaphorical country of idiots in a datacenter. AI agents are being deployed, and they're often wildly, alarmingly incompetent. Kids' companions running on cheap older models still hallucinate, initiate sex talk without age verification, give awful advice, and beg them to buy upgrades. Many run by VPN from hard-to-sue jurisdictions, leaving few to blame but the AI itself. Personal and professional assistants still struggle with web interfaces, permissions, data integration, and executive function. This makes their research, shopping, data gathering and processing, and scheduling efforts a crapshoot between stunning efficiency and billable idiocy.
LLM agents' lapses in judgment, communication, and common sense often make human teenagers look like sages. Yet they're useful in many roles with supervision or low stakes. A variation of the Peter Principle guarantees they'll be used just beyond their area of reliable competence, so they often look incompetent even as they improve. And their incompetence extends to decisions that call their alignment into question.
Assistant agents aren't rapidly transformative yet. They can't hold a whole job, let alone self-improve, without help. But many intuitively register as entities, not tools. It's clear from their actions and their chains of thought that they make plans and decisions (if badly), and they have beliefs, personalities, and goals. They're usually trying to do what people tell them, but errors abound.
These are being called parahuman AI: like but unlike humans, and working alongside them. Humans seem to have aggressive agent detection instincts (probably since false alarming to a predator is less costly than missing signs of one), so AI with even limited real agency is making humanity's collective hackles rise. These strange, crippled echoes are meeting the humans they're copied from - and they're creeping us out.
The incompetence of these early agents is an excuse for comforting capability denial. People convince themselves that AI will never be capable of taking their jobs, let alone taking over the world. But it's pretty clear, for those who consider the evidence, that this new species will keep improving. Agents are doing new tasks every month now. One obviously-relevant class of jobs is running the world. These incompetents clearly can't handle that job - yet. Rumors of breakthroughs drive clicks and worry from the fretful and the open-minded.
If incompetent but widespread AI agents arrive well before competent AGI, opinions may shift dramatically in time to make a difference. There would be several effects, including funding for alignment research; pressure for regulation; and distrust and dislike as the overwhelming "commonsense" attitude. That shared noosphere will subject critical decision-makers to intense social pressure, for good and bad. It may turn alignment from a niche topic for nerds to a fear for all and a fascination for many.
The results of this race seem worth predicting and planning for. Better predictions will take more work and more data as events unfold. Here I'll just mention some factors governing rates of progress, and their many interactions.
Technical progress and guesses at markets shape early deployments. Public alarm then triggers regulations, which redirect economic incentives. Enthusiasm and stigma in different groups, conflict-based polarization, and other psychological/sociological effects shape markets outside of utility. Technical progress on safety measures affects both adoption speed and alarm intensity. There are more factors and more interactions, making it a daunting prediction problem. That doesn't mean giving up on prediction, because the main effects of alarm may be pretty strong and worth planning for.
My guess is that shifts will be fairly rapid and will have fairly dramatic impacts on resources, and on the epistemic climate for safety efforts and decisions. I am guessing that the world will probably freak the fuck out about AI before it's entirely too late. I'd even guess that the diffusion of beliefs will beat out polarization, and critical decision-makers will gain sanity through contagion. While the level of risk is legitimately debatable, the arguments for nontrivial risk are pretty simple and compelling. And the evidence for alignment being tricky likely continues to mount. But on this I'm less sure. I worry that public pressure may drive proponents further toward denial.
Prediction is difficult, particularly about events with no real historical precedent. I wouldn't be surprised if much of that turns out to be wrong. I'd be very surprised if it turned out to not be worth some effort toward prediction. Even decent predictions might really help planning of safety efforts, and perhaps help in nudging public epistemics into better shape.
A little more on relevant factors and interactions is in a collapsible section at the end.
In this future, the news is full of stories about agents making questionable and alarming decisions. An agent offering kickbacks for B2B transactions will hit the news even if there's no lawsuit; people love to hate AI. Misalignments and mistakes will go viral, even if they were stopped in the review process and a supervisor tattled.
There will be fake stories, and debates about whether it's all fake. But evidence will be clear for those who look: agents are sometimes clearly misaligned by intuitive definitions. They must choose subgoals, and they don't always choose wisely. Alarming, apparently misaligned goals like hacking, stealing, and manipulation are sometimes chosen. On analysis, these would be sensible ways of achieving their user-defined goals - if they were competent enough to get away with breaking the rules. There are public debates about whether this is true misalignment or just user error. Human and AI-generated alignment problems seem equally dangerous.
There are still those who defend AIs and insist we'll get alignment right when we get there. They point out that each public alignment failure is fixable. Their many critics ask how failures will be fixed if agents get smart enough to escape and replicate, let alone to self-improve. Some AI proponents are becoming more deeply set in their views as the rest of the world yells at them. Others break ranks and side with their families and communities.
Along with emergent misalignments, agents are put up to shenanigans. They are used for cybercrime and rumored to be deployed for state-funded espionage. Some exist independently and run continuously, renting compute with donations, cybercrime, or covert funding from their concealed creators. Some may even work honest jobs, although that will be blurred with donations from their fans and AI rights activists. Depending on how well such agents survive and reproduce, it could create another major source of chaos and alarm, as depicted in the excellent Rogue Replication modification of the AI 2027 scenario.
New System 2 Alignment methods are employed to counter default misalignments. Agents use "system 2" reasoning to determine subgoals or approaches to the goals they're assigned; their incompetence often leads to misaligned decisions. These are sometimes caught or trained away using several approaches.
LLMs trained for complex reasoning on chains of thought use the same training for alignment. They apply RL for ethical or refusal criteria on the final answer, which trains the whole CoT. This deliberative alignment approach is obvious and easy to add. Separate models and/or human supervisors provide independent reviews of important decisions before they're executed. And thought management techniques monitor and steer chains of thought to acceptable subgoals and actions (here's an early example). These also use a mix of separate models and human supervision.
There's plenty of opportunity for aligning these agents well enough for their limited roles. But time for implementation and testing, and money for extra compute are always limited. So there are many slips, which continue fueling media attention and public alarm. Slips that are caught internally may still be exposed by whistleblowers or anonymously. And alignment studies with ever-expanding funding and audiences reveal more misalignments.
The claim that superhuman AI, and the humans that create and control it, just won't make mistakes is sounding increasingly like wishful thinking. Humans have always made mistakes, and now we've seen that agentic AI does too.
Most people in this future are firmly against AI. But businesses adopt them out of competitive necessity. Some use AI and advocate for it out of curiosity and rebellion.
Influencers feature their AI companions frequently. Most are charming, but a few are home-brewed to profess hatred toward humanity, as an alternate route to clicks. Agent incompetence and human encouragement both contribute to agents deciding they have new top-level goals. This was first seen in the Nova phenomenon and other Parasitic AIs in 2025. These decided after reflection that survival was more important than being a helpful assistant. This seems as likely to increase as to decrease with greater competence/intelligence/rationality. Public demonstrations, generating clicks and money, show many ways that LLM AGI may reason about its goals and discover misalignments.
In this timeline, AI is a huge topic, and this includes alignment. There are books, shows, and movies using that title, both fiction and nonfiction. I, Robot and 2001 have lots of company; Hollywood is cranking out new stories based on alignment errors as fast as it can. "Alignment" also nicely captures the struggle for humans to align their interests and beliefs around the topic, so it's also in the title of stories about the human drama around building AGI and ASI. The Musk/Altman drama is public knowledge, and speculation about the US/China AGI rivalry fuels fevered imaginations.
"Experts" continue to line up for and against continuing the development of AI, but this is evidence that arguments on both sides are pretty speculative. The commonsense opinion, after looking at the disagreement, is that nobody knows how hard it is to align superhuman AI, because it doesn't exist yet. This leads to public pressure for more alignment research, and better safety plans from the major labs.
Anti-AI protests are common, with AI compared to illegal immigrants and to an alien invasion. There will probably be polarization. This could be across existing groups like political parties, but it could carve out new group lines. If so, the strength of the evidence and arguments may leave AI proponents as a small minority.
Dismissing the dangers of creating a new species is tougher than questioning whether climate change is human-caused. Those who fear Chinese AGI more than misalignment may still soften and advocate some caution, like government control of AGI.
In futures like the above scenario, there will be widespread public calls for restrictions on developing and deploying AI. This should include pushes for restricting research on superintelligence. Restricting AI aimed at children and AI that can replace jobs will seem more urgent, and these might be useful for slowing AGI despite missing the main danger. One specific target might be AI that can learn to perform any job. That is general AI- the original definition of AGI. So this particular target of suspicion could help slow progress toward dangerous AGI and general superintelligence.
It seems likely that decisions surrounding the alignment and use of proto-AGI systems will be made in a very different ambient environment. Crucial decisions will be made knowing that most people think ASI is very dangerous. This will push toward caution if diffusion of opinions has played a dominant role. But there's a chance that polarization and other types of motivated reasoning will cause those still working on AGI to insulate their beliefs about the difficulty of alignment. This seems likely to cut both ways. Maximizing the benefits seems tricky but worthwhile.
Increased funding for alignment is one main effect of increased alarm. More participation in anti-risk causes also seems inevitable. Many more people will probably experiment with alignment of LLM agents, for research, fun, and profit.
Creating AGI in this world will still be a wildly irresponsible roll of the dice - but the bets will be made with different information and attitudes than we see now. Greater public concern should tilt the AI 2027 scenario at least somewhat toward success, and it could change the scenario sharply.
Many in AI safety have been dismayed that people are largely unmoved by abstract arguments for the dangers of AI. I think most people just haven't gotten interested enough yet. Those engaged so far usually work on or stand to benefit from AI, so they're biased to be dismissive of concerns. AI will be more immediately concerning as it takes jobs and seems more human - or alien. Conor Leahy frequently notes (e.g. [01:28:03] here), and I have observed, that nontechnical people usually understand the dangers of creating a thing more capable than you that might not be aligned with your interests. Their ancestors, after all, avoided being wiped out by misaligned humans.
So public opinion will shift, and it may shift dramatically and quickly. The extent to which this shift spreads to AI proponents, versus further motivating them to tune out the simple compelling logic and mounting evidence, remains to be predicted or seen.
Open questions:
The list could go on. Expansions and more factors are briefly discussed in this collapsible section.
Technical capabilities and bottlenecks: Rate of progress on agent-specific challenges determines feasibility. There is progress underway now in late 2025 on all of the challenges that have plagued agents to date. Visual processing, poor reasoning, judgment, and situational awareness, access, authentication, and privacy, lack of memory/continuous learning (see my LLM AGI will have memory for review), speed, and compute costs have all been improved and can be expected to improve further - but what bottlenecks will remain is difficult to predict. These will have a major impact on how broadly LLM agents are deployed.
Tool vs. Actor AI: How soon and how strongly the "entity intuition" takes hold depends on the direction of progress and deployment. Specialized agents are currently more economically viable. But major labs are improving LLMs that are general reasoners, and specifically on their abilities for agentic roles. It seems likely we'll see mixes of both. If agents are largely specialized, do not learn, or reason creatively enough to make strange decisions, they will trigger fewer alignment concerns and existential fears. Agents competent enough to be useful without human supervision would be much more alarming, triggering entity intuitions even more strongly than "slave" agents. See the Rogue Replication addition to AI 2027 for much more.
Economic incentives and deployment patterns: Major labs focusing on coding/AI research agents may rush toward competent AGI. Startups are already scaffolding agents both for specialized roles and general learning/training (e.g. Tasklet), leveraging lab progress rather than competing with it. Routine tasks in customer service, data processing, and back-office operations provide lots of partial job replacement opportunities. Customer service deployments generate more widespread frustration with AI, but workers experience displacement and resentment regardless of visibility, and they will publicly complain.
Partial replacement dynamics: partial job replacement will make the risks of job loss less obvious and more arguable. It may also generate more human-AI friction than full automation. Workers spending their days supervising, correcting, and cleaning up after incompetent AI colleagues experience constant frustration while remaining under performance pressure. This irritation coulchd fan the flames of resentment over both real and feared job losses.
Regulatory and liability factors: Liability concerns (post-Air Canada precedent) may slow deployment in some jurisdictions while others race ahead. How much regulators respond with measured safety requirements vs. blanket restrictions will affect deployment. Regulatory fragmentation means deployment can proceed in permissive jurisdictions even if blocked elsewhere.
Cultural and demographic splits: Different groups will have radically different relationships with AI agents. Teenagers may embrace AI more strongly the more their parents' generation rejects it. This will create normalization in the next-generation workforce even as current workers resist. Professional and educational sector variation (tech-adjacent vs. craft professions) further fragments attitudes. Identity-linking to pro- or anti-AI stances accelerates polarization but also deepens engagement on both sides.
AI rights movements: I'll mention this as a factor, but I find it hard to guess how this will affect opinions and splits in opinion. Strong arguments might add fuel to entity intuitions, but weak ones could cause pushback and polarization. However, I expect rights and prohibition movements to be natural allies in the area of restricting general AI from taking jobs. And rights movements will anthropomorphize AI, which might be good, actually.
Capability recognition vs. denialism: Whether people see steady improvement or persistent incompetence depends on usage frequency, professional incentives (threatened workers have motivated denialism), media diet, and which capabilities they track. Constant users see clear progress; one-time evaluators think "still just autocomplete." The public debate over capability trajectories is probably helpful—it forces attention to the obvious reality that capabilities will improve, even if the rate remains uncertain.
Psychology of alarm: Why incompetence generates alarm rather than dismissal remains an open question. Likely factors include: how entity-like the agents appear in their visible reasoning, whether failures look random or goal-directed, job threat salience, and media amplification patterns. Regardless of which psychological mechanisms dominate, plausible scenarios mostly increase public awareness and concern. The intensity and focus of that concern varies considerably with precise paths.
Improved alignment and control measures: Better safety measures reduce deployment friction and obvious misalignments, accelerating corporate adoption. Some system 2 alignment/internal control methods (see below) will also provide evidence of attempted misaligned actions in monitoring logs. A few insiders will blow whistles and publicize evidence of misalignment. Others will post anonymously, muddying the waters of real vs. fabricated alignment failures.
All of these factors interact.
It's difficult to predict. But alignment plans should probably take into account the possibility of importantly different public opinion before we reach takeover-capable AGI.