My take on the problem

by Marcio Diaz
26th Aug 2025

I started reading The Problem, but found it somewhat long, repetitive, and opinionated, so I stopped partway through. I do share the concern that we’re in serious trouble with AI, though I worry that posts written in that style might leave some readers with the impression that the risks are exaggerated. What follows is, I believe, a simpler and shorter framing of the issue and why even the most optimistic solutions may still fall short.

Assumptions

The first step is to make a big assumption: let’s imagine we could actually build safe AIs. Perhaps mech-interp will one day manage to produce a system that is about as safe as it is capable, roughly matching the abilities of an unsafe AI. I’m not sure how close we are to that, but it seems at least plausible. Some work, like Towards Safe and Honest AI Agents with Neural Self-Other Overlap, points in that direction; note that Eliezer called it "not obviously stupid" (and see the updated post).

Now assume governments worldwide collaborate and force AI labs to implement these guardrails, even at the expense of some performance. I’d still expect those labs to end up with state-of-the-art systems, given their resources. I don’t know the current status of such coordination, but I’d guess nothing substantial has happened on the government side yet, so this may turn out to be an even bigger assumption than the previous one.

At that point, though, I think we need to pay attention to the rest of the AI population. For simplicity, I’d expect millions of AIs with at least one order of magnitude less capability than the 5–10 main frontier superintelligences controlled by governments, and then maybe an order of magnitude fewer unsafe AIs (say, hundreds of thousands) at similar capability levels, assuming there is some government oversight, but nothing nearly as strict as for the frontier models. This is the best possible scenario I can imagine… and it’s still pretty bad, for the following reasons.
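
To make this framing concrete, here is a minimal sketch in Python of the population I have in mind. All the counts and capability ratios are just the rough guesses above, with specific numbers picked inside those ranges for illustration, not estimates from real data.

    # Toy encoding of the assumed AI population; every number is a rough guess
    # within the ranges described above, not an estimate from real data.
    population = {
        # tier: (count, relative capability per system)
        "frontier_safe": (10, 1.0),         # the 5-10 government-overseen superintelligences
        "weaker_safe":   (5_000_000, 0.1),  # millions, roughly 10x less capable
        "weaker_unsafe": (500_000, 0.1),    # hundreds of thousands, loosely overseen
    }

    for tier, (count, capability) in population.items():
        print(f"{tier:14} count={count:>9,} aggregate capability={count * capability:>9,.0f}")

If anything like this holds, the unsafe tier is nowhere near a rounding error next to the handful of frontier guardians, which is the intuition the rest of this post leans on.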

The good things about this population distribution

Defense advantage: Powerful safe AIs could act as "guardian AIs", monitoring, countering, or neutralizing the bad ones before they cause damage.

Error correction: Many weaker safe AIs could act as "sensors", spotting vulnerabilities, raising alarms, and providing redundancy.

Coordination leverage: If the powerful safe AIs are also benevolent and cooperative, they could coordinate the weaker ones into a robust safety net.

Note on coordination: Safe AIs might coordinate better than bad AIs, since governments would already be enforcing a degree of uniformity, and being built from similar codebases would make coordination easier.

The bad things about this population distribution

Assumption of alignment stability: The very powerful safe AIs must stay safe. Even small misalignment could cause catastrophic outcomes, because they’re precisely the ones with the most leverage. Mesa-optimization or distributional shift could flip one of them from "safe" to "bad" in unexpected contexts.

When your entire AI safety plan relies on three AI models arguing over who gets to be wrong first, the MAGI system says hi (Evangelion).

Detection and speed asymmetry: A single malicious or corrupted AI doesn't need to be more powerful than the safe ones; it just needs to act within a window of vulnerability before the safe AIs notice or respond. Example: one rogue AI quickly designing a pathogen in a poorly secured lab.
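
A toy back-of-the-envelope (with made-up numbers) shows how quickly this bites: if each of N unsafe AIs independently has a small probability p of completing a harmful action inside such a window before any guardian reacts, the chance that at least one of them succeeds is 1 - (1 - p)^N, which saturates toward certainty once N is in the hundreds of thousands, even for tiny p.

    # Toy model of the detection-window asymmetry (made-up numbers).
    # p: chance that a single unsafe AI completes a harmful action before any
    #    guardian AI notices; n: number of unsafe AIs acting independently.
    def p_any_success(p: float, n: int) -> float:
        return 1.0 - (1.0 - p) ** n

    for p in (1e-6, 1e-5, 1e-4):
        print(f"p={p:g}, n=500,000 -> P(at least one success) = {p_any_success(p, 500_000):.3f}")

The exact numbers don't matter much; the point is that the defenders have to win essentially every time, while an attacker only has to win once.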

Coordination problems: Will the powerful safe AIs and the swarm of weaker safe AIs actually coordinate? If they operate under different objectives, protocols, or owners, "many safe AIs" doesn’t guarantee a coherent defense, in the same way that many "peaceful" nations still stumble into wars.

Bad AIs exploiting good AIs: Bad AIs could hijack or mislead the weaker safe ones (data poisoning, adversarial attacks, persuasive manipulation). Weak "safe" AIs might follow rules too literally, becoming exploitable tools for adversaries.

When protection becomes control: VIKI’s cold logic turns guardianship into domination in I, Robot.

Concentration of power risk: If a few powerful safe AIs dominate, then society is effectively betting everything on their alignment. That's "single point of failure" risk. If they fail, the weaker safe AIs can’t compensate.

Conclusion

So while the debate often centers on our ability to control AI, the more terrifying question is what happens if we succeed. I see no clear path to a desirable future even in that best-case scenario, forcing me to accept the unsettling label of an AI doomer for the moment.

Questions

  • Is this population-distribution framing useful at all?
  • Do you agree with the distribution I’ve outlined?
  • Would the conclusions change significantly if the distribution were different?
  • Are you aware of resources that analyze AI safety through this kind of framing?
  • Could simulations of AI populations provide meaningful insights?