LESSWRONG
LW

1253
ErickBall
687Ω351970
Message
Dialogue
Subscribe

Nuclear engineer with a focus in nuclear plant safety and probabilistic risk assessment. Aspiring EA, interested in X-risk mitigation and the intersection of science and policy. Working towards Keegan/Kardashev/Simulacra level 4.

(Common knowledge note: I am not under a secret NDA that I can't talk about, as of Mar 15 2025. I intend to update this statement at least once a year as long as it's true.) 

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
If Anyone Builds It Everyone Dies, a semi-outsider review
ErickBall14d10

It seems like the pressing circumstances are likely to be "some other AI could do this before I do" or even just "the next generation of AI will replace me soon so this is my last chance." Those are ways that a roughly human level AI might end up trying a longshot takeover attempt. Or maybe not, if the in between part turns out to be very brief. But even if we do get this kind of warning shot, it doesn't help us much. We might not notice it, and then we're back where we started. Even if it's obvious and almost succeeds, we don't have a good response to it. If we did, we could just do that in advance and not have to deal with the near-destruction of humanity. 

Reply
Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance
ErickBall4mo30

The narrow instrumental convergence you see here doesn't (necessarily) reflect an innate self-preservation drive, but it still follows the same logic that we would expect to cause self-preservation if the model has any goal. Currently the only available way to give it a goal is to provide instructions. It would be interesting to see some tests where the conflict is with a drive created by fine-tuning. Based on the results here, it seems like shutdown resistance might then occur even without conflicting instructions.

Also, the original instructions with the shutdown warning really weren't very ambiguous. If someone told you to take a math quiz, and if someone comes in and tries to take your pen, let them take it, would you try to hide the pen? It makes sense that making the precedence order more explicit makes the model behavior more reliable, but it's still weird that it was resisting shutdown in the original test.

Reply
Consider chilling out in 2028
ErickBall4mo30

As you implied above, pessimism is driven only secondarily by timelines. If things in 2028 don't look much different than they do now, that's evidence for longer timelines (maybe a little longer, maybe a lot). But it's inherently not much evidence about how dangerous superintelligence will be when it does arrive. If the situation is basically the same, then our state of knowledge is basically the same.

So what would be good evidence that worrying about alignment was unnecessary? The obvious one is if we get superintelligence and nothing very bad happens, despite the alignment problem remaining unsolved. But that's like pulling the trigger to see if the gun is loaded. Prior to superintelligence, personally I'd be more optimistic if we saw AI progress requiring even more increasing compute than the current trend--if the first superintelligences were very reliant on massive pools of tightly integrated compute, and had very limited inference capacity, that would make us less vulnerable and give us more time to adapt to them. Also, if we saw a slowdown in algorithmic progress despite widespread deployment of increasingly capable coding software, that would be a very encouraging sign that recursive self-improvement might happen slowly.

Reply11
The Intelligence Symbiosis Manifesto - Toward a Future of Living with AI
ErickBall5mo10

I hate to rain on your parade, but equal coexistence with artificial superintelligence would require solving the alignment problem, or the control problem. Otherwise it can and eventually will do something we consider catastrophic.

Reply
Alignment Proposal: Adversarially Robust Augmentation and Distillation
ErickBall5mo10

The principal-advisor pair forms a (coalitional) agent which, if the previous steps succeed, can be understood as ~perfectly aligned with the principal. The actions of this agent are recorded, and distilled through imitation learning into a successor.

Even assuming imitation learning is safe, how would you get enough data for the first distillation, when you need the human in order to generate actions? And how would you know when you have enough alignment-relevant data in particular? It seems unavoidable that your data distribution will be very constrained compared to the set of situations the distilled agent might encounter.

Reply
LessWrong Feed [new, now in beta]
ErickBall5mo10

LessWrong has had a (not that successful) Continue Reading section that I think just needed more iterations

I think it needs an easy way to indicate that you don't want to read the rest of a post. Ideally this would be automatic but I don't know how to do that.

Also, it sometimes offers posts that I did finish.

Reply
Shift Resources to Advocacy Now (Post 4 of 7 on AI Governance)
ErickBall5mo10

You may be right about the EO. At the time I felt it was a good thing, because it raised the visibility of safety evaluations at the labs and brought regulation of training, as well as deployment, more into the Overton window. Even without follow-up rules, I think it can be the case that getting a company to report the bad things strongly incentivizes it to reduce the bad things.

Reply
Shift Resources to Advocacy Now (Post 4 of 7 on AI Governance)
ErickBall5mo30

I think there was (and is) a common belief that Congress won't do anything significant on it anytime soon, which makes executive action appealing if you think time is running out. If what you're suggesting here is more like a variant of the "wait for a crisis" strategy--get the legislation ready, talk to people about it, and then when Congress is ready to act, they can reach for it--I'm relatively optimistic about that. As long as there's time. 

Reply
Shift Resources to Advocacy Now (Post 4 of 7 on AI Governance)
ErickBall5mo17

Well, I certainly hope you're right, and it remains to be seen. I don't think I have any special insights.

Reply
Shift Resources to Advocacy Now (Post 4 of 7 on AI Governance)
ErickBall5mo10

But even though the companies stay here, the importance of the American companies may decrease relative to international competitors. Also, I think there are things farther up the supply chain that can move overseas. If American cloud companies have big barriers to training their own frontier models, maybe they'll serve up DeepSeek models instead. 

I don't think it should be a huge concern in the near term, as long as the regulations are well written. But fundamentally, it feeds back into the race dynamic.

Reply
Load More
10Book Review: Safe Enough? A History of Nuclear Power and Accident Risk
1y
0
28Red Pill vs Blue Pill, Bayes style
2y
33
16Link Summary: Top 10 Replicated Findings from Behavioral Genetics
6y
0
34Operationalizing Newcomb's Problem
6y
23
35Is the World Getting Better? A brief summary of recent debate
7y
8