Research coordinator of the Stop/Pause area at AI Safety Camp.
See explainer on why AGI could not be controlled enough to stay safe:
lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable
Thanks!
By ‘possible worlds’, do you mean ‘possible to reach from our current world state’?
And what do you mean by ‘alignment’? I know that can sound like an unnecessary question, but if the term is not specified, how can people soundly assess whether alignment is technically solvable?
Thanks. When you say “in the space of possible mathematical things”, do you mean “hypothetically possible in physics” or “possible in the physical world we live in”?
Here's how I specify terms in the claim:
Good to know. I also quoted your more detailed remark on AI Standards Lab at the top of this post.
I have made so many connections that have been instrumental to my research.
I didn't know this yet, and I'm glad to hear it! Thank you for the kind words, Nell.
Fair question. You can assume it is AoE.
Research leads are not going to be too picky about the exact hour you send in your application, so there is no need to worry about the precise deadline. Even if you send in your application the next day, that probably won't significantly affect your chances of getting picked up by your desired project(s).
Sooner is better, since many research leads will begin composing their teams after the 17th, but there is no hard cut-off point.
Thanks! These are thoughtful points. See some clarifications below:
AGI could be very catastrophic even when it stops existing a year later.
You're right. I'm not even covering all the other bad stuff that could happen in the short term and that we might still be able to prevent, like AGI triggering a global nuclear war.
What I'm referring to is unpreventable convergence on extinction.
If AGI makes earth uninhabitable in a trillion years, that could be a good outcome nonetheless.
Agreed, that could be a good outcome if it were attainable.
In practice, the convergence reasoning is about total human extinction happening within 500 years after 'AGI' has been introduced into the environment (with very little probability mass remaining beyond that).
In theory, of course, to converge toward a 100% chance you are reasoning over a timeline of potentially infinite span.
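A rough way to write that down (my own shorthand, not anything formal from the argument itself): let $P(E_{\le t})$ be the cumulative probability that total human extinction has occurred within $t$ years after 'AGI' is introduced into the environment. Then the claim amounts to

$$P(E_{\le 500}) \approx 1, \qquad \lim_{t \to \infty} P(E_{\le t}) = 1.$$

So the convergence to 100% is only exact over an unbounded timeline, while in practice almost all of the probability mass falls within the first few centuries.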
I don't know whether that covers "humans can survive on mars with a space-suit",
Yes, it does cover that. Whatever technological means we could think of to shield ourselves, or that 'AGI' could come up with as (temporary) barriers against the human-toxic landscape it creates, would still not be enough.
if humans evolve/change to handle situations that they currently do not survive under
Unfortunately, this is not workable. The mismatch between the (expanding) set of conditions needed to maintain and increase the configurations of AGI's artificial hardware and the set of conditions needed by our organic human wetware is too great.
Also, if you try entirely swapping out our underlying substrate for the artificial substrate, you've basically removed the human and are left with 'AGI'. Lossy scans of human brains ported onto hardware would no longer feel as 'humans' can feel, and would be further changed/selected to fit their artificial substrate. This is because what humans feel and express as emotions is grounded in the distributed and locally context-dependent functioning of organic molecules (e.g. hormones) in our bodies.
I'm also feeling less "optimistic" about an AI crash given:
I will revise my previous forecast back to 80%+ chance.
With this example, you might still assert that "possible worlds" are world states reachable through physics from past states of the world. That is, you could still assert that the possibility of alignment is path-dependent on historical world states.
But you seem to mean something broader by "possible worlds". Something like: "in theory, there is a physically possible arrangement of atoms/energy states that would result in an 'aligned' AGI, even if that arrangement of states might not be reachable from our current, or even a past, world state".
→ Am I interpreting you correctly?
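To make concrete what I'm asking (my own notation): let $W$ be the set of all physically possible world states, and $R(w_0) \subseteq W$ the subset reachable under physical dynamics from our current state $w_0$. Writing $A \subseteq W$ for the states that contain an 'aligned' AGI, the narrower reading asks whether $A \cap R(w_0) \neq \emptyset$, while the broader reading only asks whether $A \neq \emptyset$.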
Your saying this shows the ambiguity involved in trying to understand what different people mean. One researcher can make a technical claim about the possibility/tractability of "alignment" that is worded similarly to a technical claim made by others, yet their meaning of "alignment" could be quite different.
That makes it hard to have a well-argued discussion, because you don't know whether people are equivocating (i.e. switching between different meanings of the term).
That's a good summary list! I like the inclusion of "long-term outcomes" in P6. In contrast, P4 could just entail short-term problems that were specified by a designer or user who did not give much thought to long-term repercussions.
The way I deal with the wildly varying uses of the term "alignment" is to use a minimum definition that most of those six interpretations are consistent with: one where (almost) everyone would agree that an AGI not meeting that definition would clearly be unaligned.
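Roughly, my own way of formalizing that move: if $A_1, \dots, A_6$ are the six interpretations and $A_{\min}$ is a condition that (almost) all of them imply ($A_k \Rightarrow A_{\min}$), then $\neg A_{\min} \Rightarrow \neg A_k$ for (almost) every $k$. An AGI failing the minimum definition is therefore unaligned under (almost) every interpretation, which is all the claim needs.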