
peterbarnett

Researcher at MIRI

https://peterbarnett.org/

Sequences

My AI Risk Model

Posts

peterbarnett's Shortform (3y)

Comments
A case for courage, when speaking of AI danger
peterbarnett · 7d

Maybe it’s hard to communicate nuance, but it seems like there's a crazy thing going on where many people in the AI x-risk community think something like “Well obviously I wish it would stop, and the current situation does seem crazy and unacceptable by any normal standards of risk management. But there’s a lot of nuance in what I actually think we should do, and I don’t want to advocate for a harmful stop.”

And these people end up communicating to external people something like "Stopping is a naive strategy, and continuing (maybe with some safeguards, etc.) is my preferred strategy for now."

This seems to miss the really important part, which is that they would actually want to stop if we could, but that stopping seems hard and nuanced to get right.

Curing PMS with Hair Loss Pills
peterbarnett · 13d

Is there a side-effect of unwanted hair growth? 

AI Task Length Horizons in Offensive Cybersecurity
peterbarnett · 13d

They're in the original blog post: https://sean-peters-au.github.io/2025/07/02/ai-task-length-horizons-in-offensive-cybersecurity.html
But it would be good to update this LW post.

The best simple argument for Pausing AI?
peterbarnett · 15d

Here's my shot at a simple argument for pausing AI. 

We might soon hit a point of no return, and the world is not at all ready.

A central point of no return is if we kick off a recursive automated AI R&D feedback loop (i.e., an intelligence explosion), where the AI systems get smarter and more capable, and humans are totally unable to keep up. I can imagine humans nominally still being in the loop but not actually understanding things, or being totally reliant on AIs explaining dumbed down versions of the new AI techniques being discovered. 

There are other points of no return that are less discrete, such as if states become economically or militarily reliant on AI systems. Maybe due to competitive dynamics with other states, or just because the AIs are so damn useful and it would be too inconvenient to remove them from all the societal systems they are now a part of. See "The date of AI Takeover is not the day the AI takes over" for related discussion. 

If we hit a point of no return and develop advanced AI (including superintelligent AI), this will come with a whole range of problems that the world is not ready for. I think any of these would be reasonable grounds for pausing until we can deal with them.[1]

  • Misalignment: We haven't solved alignment, and it seems like by default we won't. The majority of techniques for making AIs safer today will not scale to superintelligence. I think this makes Loss of Control a likely outcome (as in humans lose control over the entire future and almost all value is lost).
  • War and geopolitical destabilization: Advanced AI or the technologies it enables are politically destabilizing, such as removing states' second-strike nuclear capabilities. States may go to war or perform preemptive strikes to avoid this.
  • Catastrophic misuse: Malicious actors or rogue states may gain access to AI (e.g., by stealing model weights, training the AI themselves, or using an open weights model), and use it to cause catastrophic harm. Current AIs are not yet at this level, but future AIs will likely be.
  • Authoritarianism and bad lock-in: AI could lead to unprecedented concentration of power; it might enable coups to be performed with relatively little support from human actors, and then entrench this concentrated power.
  • Gradual disempowerment: AIs could be more productive than humans, and economic competitive pressures mean that humans slowly lose power over time, to the point where we no longer have any effective control. This could happen even without any power-seeking AI performing a power grab.

The world is not on track to solve these problems. On the current trajectory of AI development, we will likely run head-first into these problems wildly unprepared. 

  1. ^

    Somewhat adapted from our research agenda. 

What does 10x-ing effective compute get you?
peterbarnett · 19d

I liked this post and thought it gave a good impression of just how crazy AIs could get if we allow progress to continue. It also made me even more confident that we really cannot allow AI progress to continue unabated, at least not to the point where AIs are automating AI R&D and getting to this level of capability. 
 

I also think it is very unlikely that AIs 4 SDs above the human range would be controllable; I'd expect them to be able to fairly easily sabotage research they were given without humans noticing. When I think of intelligence gaps like that in humans, it feels pretty insurmountable.

ryan_greenblatt's Shortform
peterbarnett · 1mo

Have you contacted the big AI companies (OpenAI, Anthropic, GDM, Meta?) and asked them if they can remove this from their scrapes?

Fictional Thinking and Real Thinking
peterbarnett · 1mo

I claim that this example generalizes: insofar as Joe’s “fake thinking” vs “real thinking” points to a single coherent distinction, it points to thoughts which represent things in other worlds vs thoughts which represent things in our physical world.

This doesn’t feel quite right to me, or at least is missing something. When I think about Joe’s “fake thinking” vs “real thinking”, the main distinction is about whether you are “actually trying” or “actually care”.

When I was 20, I was well aware of the horrors of factory farming, and I would say things like "future generations will look back and consider this among the worst moral crimes in history". But I still ate factory-farmed meat, and I didn't take any actions that showed I cared. My thinking about factory farming was kind of "academic", an interesting, clever, and slightly contrarian view, but it didn't have any real weight behind it. This is despite me knowing that my thoughts referred to the real world.

I orient very differently to factory farming now. I don't eat meat, and sometimes when I think about the scale, I feel awful, like I've been punched in the gut or like I want to cry, knowing even then that this reaction isn't at all sufficient for the actual scale. This feels much more real.

I think that maybe you could use this “fictional” vs “real” framing to say that previously I was thinking about factory farming in a kind of fictional way, and that on some level I didn’t actually believe that my thoughts corresponded to a referent in the real/physical world. But this seems a bit off, given that I did know that these things were in the real world.

Richard Ngo's Shortform
peterbarnett · 1mo

For steps 2-4, I kinda expect current neural nets to be kludgy messes, and so not really have the nice subagent structure (even if you do step 1 well enough to have a thing to look for). 

I'm also fairly pessimistic about step 1, but would be very excited to know what preliminary work here looks like.

Caleb Biddulph's Shortform
peterbarnett · 2mo

Update: 4o seems happy to talk about sycophancy now

Caleb Biddulph's Shortform
peterbarnett · 2mo

I get this with 4o, but not o3. o3 talks about sycophancy in both its CoT and its answers. 

Claude 4 Sonnet and Opus also easily talk about sycophancy. 

Posts

  • AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions (2mo)
  • Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI (1y)
  • Trying to align humans with inclusive genetic fitness (2y)
  • Labs should be explicit about why they are building AGI (2y)
  • Thomas Kwa's MIRI research experience (2y)
  • Doing oversight from the very start of training seems hard (3y)
  • Confusions in My Model of AI Risk (3y)
  • Scott Aaronson is joining OpenAI to work on AI safety (3y)
  • A Story of AI Risk: InstructGPT-N (3y)
  • Why I'm Worried About AI (3y)