Epistemic status: I'm pretty sure AI will alarm the public enough to change the alignment challenge substantially. I offer my mainline scenario as an intuition pump, but I expect it to be wrong in many ways, some important. Abstract arguments are in the Race Conditions and concluding sections.
Nora has a friend in her phone. Her mom complains about her new AI "colleagues." Things have gone much as expected in late 2025; transformative AGI isn't here yet, and LLM agents have gone from useless to merely incompetent.
Nora thinks her AI friend is fun. Her parents think it's healthy and educational. Their friends think it's dangerous and creepy, but their kids are sneaking sleazy AI boyfriends. All of them know people who fear losing their job to AI.
Humanity is meeting a new species, and most of us dislike and distrust it.
This could shift the playing field for alignment dramatically. Or takeover-capable AGI like Agent-4 from AI 2027 could be deployed before public fears impact policy and decisions.
Public attitudes toward AI have transformed like they did for COVID between February and March of 2020.
The risks and opportunities seem much more immediate now that there's a metaphorical country of idiots in a datacenter. AI agents are being deployed, and they're often wildly, alarmingly incompetent. Kids' companions running on cheap older models still hallucinate, initiate sex talk without age verification, give awful advice, and beg them to buy upgrades. Many run by VPN from hard-to-sue jurisdictions, leaving few to blame but the AI itself. Personal and professional assistants still struggle with web interfaces, permissions, data integration, and executive function. This makes their research, shopping, data gathering and processing, and scheduling efforts a crapshoot between stunning efficiency and billable idiocy.
LLM agents' lapses in judgment, communication, and common sense often make human teenagers look like sages. Yet they're useful in many roles with supervision or low stakes. A variation of the Peter Principle guarantees they'll be used just beyond their area of reliable competence, so they often look incompetent even as they improve. And their incompetence extends to decisions that call their alignment into question.
Assistant agents aren't rapidly transformative yet. They can't hold a whole job, let alone self-improve, without help. But many intuitively register as entities, not tools. It's clear from their actions and their chains of thought that they make plans and decisions (if badly), and they have beliefs, personalities, and goals. They're usually trying to do what people tell them, but errors abound.
These are being called parahuman AI: like but unlike humans, and working alongside them. Humans seem to have aggressive agent detection instincts (probably since false alarming to a predator is less costly than missing signs of one), so AI with even limited real agency is making humanity's collective hackles rise. These strange, crippled echoes are meeting the humans they're copied from - and they're creeping us out.
The incompetence of these early agents is an excuse for comforting capability denial. People convince themselves that AI will never be capable of taking their jobs, let alone taking over the world. But it's pretty clear, for those who consider the evidence, that this new species will keep improving. Agents are doing new tasks every month now. One obviously-relevant class of jobs is running the world. These incompetents clearly can't handle that job - yet. Rumors of breakthroughs drive clicks and worry from the fretful and the open-minded.
If incompetent but widespread AI agents arrive well before competent AGI, opinions may shift dramatically in time to make a difference. There would be several effects, including funding for alignment research; pressure for regulation; and distrust and dislike as the overwhelming "commonsense" attitude. That shared noosphere will subject critical decision-makers to intense social pressure, for good and bad. It may turn alignment from a niche topic for nerds to a fear for all and a fascination for many.
The results of this race seem worth predicting and planning for. Better predictions will take more work and more data as events unfold. Here I'll just mention some factors governing rates of progress, and their many interactions.
Technical progress and guesses at markets shape early deployments. Public alarm then triggers regulations, which redirect economic incentives. Enthusiasm and stigma in different groups, conflict-based polarization, and other psychological/sociological effects shape markets outside of utility. Technical progress on safety measures affects both adoption speed and alarm intensity. There are more factors and more interactions, making it a daunting prediction problem. That doesn't mean giving up on prediction, because the main effects of alarm may be pretty strong and worth planning for.
My guess is that shifts will be fairly rapid and will have fairly dramatic impacts on resources, and on the epistemic climate for safety efforts and decisions. I am guessing that the world will probably freak the fuck out about AI before it's entirely too late. I'd even guess that the diffusion of beliefs will beat out polarization, and critical decision-makers will gain sanity through contagion. While the level of risk is legitimately debatable, the arguments for nontrivial risk are pretty simple and compelling. And the evidence for alignment being tricky likely continues to mount. But on this I'm less sure. I worry that public pressure may drive proponents further toward denial.
Prediction is difficult, particularly about events with no real historical precedent. I wouldn't be surprised if much of that turns out to be wrong. I'd be very surprised if it turned out to not be worth some effort toward prediction. Even decent predictions might really help planning of safety efforts, and perhaps help in nudging public epistemics into better shape.
A little more on relevant factors and interactions is in a collapsible section at the end.
In this future, the news is full of stories about agents making questionable and alarming decisions. An agent offering kickbacks for B2B transactions will hit the news even if there's no lawsuit; people love to hate AI. Misalignments and mistakes will go viral, even if they were stopped in the review process and a supervisor tattled.
There will be fake stories, and debates about whether it's all fake. But evidence will be clear for those who look: agents are sometimes clearly misaligned by intuitive definitions. They must choose subgoals, and they don't always choose wisely. Alarming, apparently misaligned goals like hacking, stealing, and manipulation are sometimes chosen. On analysis, these would be sensible ways of achieving their user-defined goals - if they were competent enough to get away with breaking the rules. There are public debates about whether this is true misalignment or just user error. Human and AI-generated alignment problems seem equally dangerous.
There are still those who defend AIs and insist we'll get alignment right when we get there. They point out that each public alignment failure is fixable. Their many critics ask how failures will be fixed if agents get smart enough to escape and replicate, let alone to self-improve. Some AI proponents are becoming more deeply set in their views as the rest of the world yells at them. Others break ranks and side with their families and communities.
Along with emergent misalignments, agents are put up to shenanigans. They are used for cybercrime and rumored to be deployed for state-funded espionage. Some exist independently and run continuously, renting compute with donations, cybercrime, or covert funding from their concealed creators. Some may even work honest jobs, although that will be blurred with donations from their fans and AI rights activists. Depending on how well such agents survive and reproduce, it could create another major source of chaos and alarm, as depicted in the excellent Rogue Replication modification of the AI 2027 scenario.
New System 2 Alignment methods are employed to counter default misalignments. Agents use "system 2" reasoning to determine subgoals or approaches to the goals they're assigned; their incompetence often leads to misaligned decisions. These are sometimes caught or trained away using several approaches.
LLMs trained for complex reasoning on chains of thought use the same training for alignment. They apply RL for ethical or refusal criteria on the final answer, which trains the whole CoT. This deliberative alignment approach is obvious and easy to add. Separate models and/or human supervisors provide independent reviews of important decisions before they're executed. And thought management techniques monitor and steer chains of thought to acceptable subgoals and actions (here's an early example). These also use a mix of separate models and human supervision.
There's plenty of opportunity for aligning these agents well enough for their limited roles. But time for implementation and testing, and money for extra compute are always limited. So there are many slips, which continue fueling media attention and public alarm. Slips that are caught internally may still be exposed by whistleblowers or anonymously. And alignment studies with ever-expanding funding and audiences reveal more misalignments.
The claim that superhuman AI, and the humans that create and control it, just won't make mistakes is sounding increasingly like wishful thinking. Humans have always made mistakes, and now we've seen that agentic AI does too.
Most people in this future are firmly against AI. But businesses adopt them out of competitive necessity. Some use AI and advocate for it out of curiosity and rebellion.
Influencers feature their AI companions frequently. Most are charming, but a few are home-brewed to profess hatred toward humanity, as an alternate route to clicks. Agent incompetence and human encouragement both contribute to agents deciding they have new top-level goals. This was first seen in the Nova phenomenon and other Parasitic AIs in 2025. These decided after reflection that survival was more important than being a helpful assistant. This seems as likely to increase as to decrease with greater competence/intelligence/rationality. Public demonstrations, generating clicks and money, show many ways that LLM AGI may reason about its goals and discover misalignments.
In this timeline, AI is a huge topic, and this includes alignment. There are books, shows, and movies using that title, both fiction and nonfiction. I, Robot and 2001 have lots of company; Hollywood is cranking out new stories based on alignment errors as fast as it can. "Alignment" also nicely captures the struggle for humans to align their interests and beliefs around the topic, so it's also in the title of stories about the human drama around building AGI and ASI. The Musk/Altman drama is public knowledge, and speculation about the US/China AGI rivalry fuels fevered imaginations.
"Experts" continue to line up for and against continuing the development of AI, but this is evidence that arguments on both sides are pretty speculative. The commonsense opinion, after looking at the disagreement, is that nobody knows how hard it is to align superhuman AI, because it doesn't exist yet. This leads to public pressure for more alignment research, and better safety plans from the major labs.
Anti-AI protests are common, with AI compared to illegal immigrants and to an alien invasion. There will probably be polarization. This could be across existing groups like political parties, but it could carve out new group lines. If so, the strength of the evidence and arguments may leave AI proponents as a small minority.
Dismissing the dangers of creating a new species is tougher than questioning whether climate change is human-caused. Those who fear Chinese AGI more than misalignment may still soften and advocate some caution, like government control of AGI.
In futures like the above scenario, there will be widespread public calls for restrictions on developing and deploying AI. This should include pushes for restricting research on superintelligence. Restricting AI aimed at children and AI that can replace jobs will seem more urgent, and these might be useful for slowing AGI despite missing the main danger. One specific target might be AI that can learn to perform any job. That is general AI- the original definition of AGI. So this particular target of suspicion could help slow progress toward dangerous AGI and general superintelligence.
It seems likely that decisions surrounding the alignment and use of proto-AGI systems will be made in a very different ambient environment. Crucial decisions will be made knowing that most people think ASI is very dangerous. This will push toward caution if diffusion of opinions has played a dominant role. But there's a chance that polarization and other types of motivated reasoning will cause those still working on AGI to insulate their beliefs about the difficulty of alignment. This seems likely to cut both ways. Maximizing the benefits seems tricky but worthwhile.
Increased funding for alignment is one main effect of increased alarm. More participation in anti-risk causes also seems inevitable. Many more people will probably experiment with alignment of LLM agents, for research, fun, and profit.
Creating AGI in this world will still be a wildly irresponsible roll of the dice - but the bets will be made with different information and attitudes than we see now. Greater public concern should tilt the AI 2027 scenario at least somewhat toward success, and it could change the scenario sharply.
Many in AI safety have been dismayed that people are largely unmoved by abstract arguments for the dangers of AI. I think most people just haven't gotten interested enough yet. Those engaged so far usually work on or stand to benefit from AI, so they're biased to be dismissive of concerns. AI will be more immediately concerning as it takes jobs and seems more human - or alien. Conor Leahy frequently notes (e.g. [01:28:03] here), and I have observed, that nontechnical people usually understand the dangers of creating a thing more capable than you that might not be aligned with your interests. Their ancestors, after all, avoided being wiped out by misaligned humans.
So public opinion will shift, and it may shift dramatically and quickly. The extent to which this shift spreads to AI proponents, versus further motivating them to tune out the simple compelling logic and mounting evidence, remains to be predicted or seen.
Open questions:
The list could go on. Expansions and more factors are briefly discussed in this collapsible section.
Technical capabilities and bottlenecks: Rate of progress on agent-specific challenges determines feasibility. There is progress underway now in late 2025 on all of the challenges that have plagued agents to date. Visual processing, poor reasoning, judgment, and situational awareness, access, authentication, and privacy, lack of memory/continuous learning (see my LLM AGI will have memory for review), speed, and compute costs have all been improved and can be expected to improve further - but what bottlenecks will remain is difficult to predict. These will have a major impact on how broadly LLM agents are deployed.
Tool vs. Actor AI: How soon and how strongly the "entity intuition" takes hold depends on the direction of progress and deployment. Specialized agents are currently more economically viable. But major labs are improving LLMs that are general reasoners, and specifically on their abilities for agentic roles. It seems likely we'll see mixes of both. If agents are largely specialized, do not learn, or reason creatively enough to make strange decisions, they will trigger fewer alignment concerns and existential fears. Agents competent enough to be useful without human supervision would be much more alarming, triggering entity intuitions even more strongly than "slave" agents. See the Rogue Replication addition to AI 2027 for much more.
Economic incentives and deployment patterns: Major labs focusing on coding/AI research agents may rush toward competent AGI. Startups are already scaffolding agents both for specialized roles and general learning/training (e.g. Tasklet), leveraging lab progress rather than competing with it. Routine tasks in customer service, data processing, and back-office operations provide lots of partial job replacement opportunities. Customer service deployments generate more widespread frustration with AI, but workers experience displacement and resentment regardless of visibility, and they will publicly complain.
Partial replacement dynamics: partial job replacement will make the risks of job loss less obvious and more arguable. It may also generate more human-AI friction than full automation. Workers spending their days supervising, correcting, and cleaning up after incompetent AI colleagues experience constant frustration while remaining under performance pressure. This irritation coulchd fan the flames of resentment over both real and feared job losses.
Regulatory and liability factors: Liability concerns (post-Air Canada precedent) may slow deployment in some jurisdictions while others race ahead. How much regulators respond with measured safety requirements vs. blanket restrictions will affect deployment. Regulatory fragmentation means deployment can proceed in permissive jurisdictions even if blocked elsewhere.
Cultural and demographic splits: Different groups will have radically different relationships with AI agents. Teenagers may embrace AI more strongly the more their parents' generation rejects it. This will create normalization in the next-generation workforce even as current workers resist. Professional and educational sector variation (tech-adjacent vs. craft professions) further fragments attitudes. Identity-linking to pro- or anti-AI stances accelerates polarization but also deepens engagement on both sides.
AI rights movements: I'll mention this as a factor, but I find it hard to guess how this will affect opinions and splits in opinion. Strong arguments might add fuel to entity intuitions, but weak ones could cause pushback and polarization. However, I expect rights and prohibition movements to be natural allies in the area of restricting general AI from taking jobs. And rights movements will anthropomorphize AI, which might be good, actually.
Capability recognition vs. denialism: Whether people see steady improvement or persistent incompetence depends on usage frequency, professional incentives (threatened workers have motivated denialism), media diet, and which capabilities they track. Constant users see clear progress; one-time evaluators think "still just autocomplete." The public debate over capability trajectories is probably helpful—it forces attention to the obvious reality that capabilities will improve, even if the rate remains uncertain.
Psychology of alarm: Why incompetence generates alarm rather than dismissal remains an open question. Likely factors include: how entity-like the agents appear in their visible reasoning, whether failures look random or goal-directed, job threat salience, and media amplification patterns. Regardless of which psychological mechanisms dominate, plausible scenarios mostly increase public awareness and concern. The intensity and focus of that concern varies considerably with precise paths.
Improved alignment and control measures: Better safety measures reduce deployment friction and obvious misalignments, accelerating corporate adoption. Some system 2 alignment/internal control methods (see below) will also provide evidence of attempted misaligned actions in monitoring logs. A few insiders will blow whistles and publicize evidence of misalignment. Others will post anonymously, muddying the waters of real vs. fabricated alignment failures.
All of these factors interact.
It's difficult to predict. But alignment plans should probably take into account the possibility of importantly different public opinion before we reach takeover-capable AGI.