Research Coordinator of the Stop/Pause AI area at AI Safety Camp.
See explainer on why AGI could not be controlled enough to stay safe:
https://www.lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable
This answer will sound unsatisfying:
If a mathematician or analytical philosopher wrote a bunch of squiggles on a whiteboard, and said it was a proof, would you recognise it as a proof?
This is high stakes.
We were looking for careful thinkers who had the patience to spend time on understanding the shape of the argument, and how the premises correspond with how things work in reality. Linda and Anders turned out to be two of these people, and we have had three long calls so far (the first call has an edited transcript).
I wish we could short-cut that process. But if we cannot manage to convey the overall shape of the argument and the premises, then there is no point in moving on to how the reasoning is formalised.
I get that people are busy with their own projects, and want to give their own opinions about what they initially think the argument entails. But if the time they commit to understanding the argument is not at least a fifth of the time I spend conveying it specifically to them, then in my experience we usually lack the shared bandwidth needed to work through the argument.
Here is my best attempt at summarising the argument intuitively yet precisely; even so, it prompted some misinterpretations from well-meaning commenters. I appreciate the people who realised what is at stake, and were therefore willing to continue syncing up on the premises and reasoning, as Will did:
The core claim is not what I thought it was when I first read the above sources and I notice that my skepticism has decreased as I have come to better understand the nature of the argument.
would anything like SNC apply if tech labs were somehow using bioengineering to create creatures to perform the kinds of tasks that would be done by advanced AI?
In that case, substrate-needs convergence would not apply, or only apply to a limited extent.
There is still a concern about what those bio-engineered creatures, used in practice as slaves to automate our intellectual and physical work, would bring about over the long term.
If there is a successful attempt by them to ‘upload’ their cognition onto networked machinery, then we’re stuck with the substrate-needs convergence problem again.
Also, on the workforce: there are cases where workers were traumatized psychologically and compensated meagerly, like in Kenya. How could that be dealt with?
We need funding to support data workers, engineers, and other workers exploited or misled by AI corporations, so that they can unionise, strike, and whistleblow.
The AI data workers in Kenya started a union, and there is a direct way of supporting targeted action by them. Other workers' organisations are coordinating legal actions and lobbying too, on seriously limited budgets.
I'm just waiting for a funder to reach out and listen carefully to what their theories of change are.
The premise is based on alignment not being enough, so I operate on the premise of an aligned ASI, since the central claim is that "even if we align ASI it may still go wrong".
I can see how you and Forrest ended up talking past each other here. Honestly, I also felt Forrest's explanation was hard to track. It takes some unpacking.
My interpretation is that you two used different notions of alignment... Something like:
Forrest seems to agree that (1.) is possible to build initially into the machinery, but has reasons to think that (2.) is actually physically intractable.
This is because (1.) only requires localised consistency with respect to specified goals, whereas (2.) requires "completeness" in the machinery's components acting in care for human existence, wherever either may find themselves.
So here is the crux:
When you wrote "suppose a villager cares a whole lot about the people in his village...and routinely works to protect them" that came across as taking something like (2.) as a premise.
Specifically, "cares a whole lot about the people" is a claim that implies that the care is for the people in and of themselves, regardless of the context they each might (be imagined to) be interacting in. Also, "routinely works to protect them" to me implies a robustness of functioning in ways that are actually caring for the humans (ie. no predominating potential for negative side-effects).
That could be why Forrest replied with "How is this not assuming what you want to prove?"
Some reasons:
To wrap it up:
The kind of "alignment" that is workable for ASI with respect to humans is super fragile.
We cannot rely on ASI implementing a shut-down upon discovery.
Is this clarifying? Sorry about the wall of text. I want to make sure I'm being precise enough.
I agree that point 5 is the main crux:
The amount of control necessary for an ASI to preserve goal-directed subsystems against the constant push of evolutionary forces is strictly greater than the maximum degree of control available to any system of any type.
Answering it takes careful reasoning. Here's my take on it:
Actually, it looks like there is a thirteenth lawsuit, filed outside the US.
A class-action privacy lawsuit filed in Israel back in April 2023.
Wondering if this is still ongoing: https://www.einpresswire.com/article/630376275/first-class-action-lawsuit-against-openai-the-district-court-in-israel-approved-suing-openai-in-a-class-action-lawsuit
That's an important consideration. Good to dig into.
I think there are many instances of humans, flawed and limited though we are, managing to operate systems with a very low failure rate.
Agreed. Engineers are able to make very complicated systems function with very low failure rates.
Given the extreme risks we're facing, I'd want to check whether that claim also translates to 'AGI'.
to spend extra resources on backup systems and safety, such that small errors get actively cancelled out rather than compounding.
This gets right into the topic of the conversation with Anders Sandberg. I suggest giving that a read!
Errors can be corrected with high confidence (consistency) at the bit level. Backups and redundancy also work well in e.g. aeronautics, where the code base itself is not constantly changing.
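To make concrete what I mean by bit-level correction, here is a toy sketch of triple modular redundancy (majority voting). The function names and the 1% per-copy error rate are illustrative assumptions only, not a model of any real system:

```python
# Toy sketch of triple modular redundancy (majority voting) at the bit level.
# The 1% per-copy flip rate is an assumption for illustration only.
import random

def store_with_redundancy(bit: int, copies: int = 3) -> list[int]:
    """Store the same bit on several independent components."""
    return [bit] * copies

def flip_with_probability(bits: list[int], p_error: float) -> list[int]:
    """Each stored copy independently flips with probability p_error."""
    return [b ^ 1 if random.random() < p_error else b for b in bits]

def read_with_majority_vote(bits: list[int]) -> int:
    """Recover the original bit as long as a majority of copies agree."""
    return 1 if sum(bits) > len(bits) / 2 else 0

# The voted read fails only if 2 of 3 copies flip: roughly 3 * 0.01**2 = 0.0003,
# i.e. the residual error rate drops by a factor of ~30 compared to a single copy.
trials = 100_000
failures = sum(
    read_with_majority_vote(flip_with_probability(store_with_redundancy(1), 0.01)) != 1
    for _ in range(trials)
)
print(f"observed residual error rate: {failures / trials:.5f}")
```

This works because the error model is fixed and local to each bit. It says nothing yet about errors in machinery whose code and functioning keep changing, which is where the disagreement lies.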
Since intelligence is explicitly the thing which is necessary to deliberately create and maintain such protections, I would expect control to be easier for an ASI.
It is true that 'intelligence' affords more capacity to control environmental effects.
Notice too that the more 'intelligence' there is, the more information-processing components there are. And the more information-processing components are added, the exponentially more degrees of freedom of interaction those and other functional components can have with each other and with connected environmental contexts.
Here is a nitty-gritty walk-through in case useful for clarifying components' degrees of freedom.
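And here is a rough back-of-the-envelope illustration of why I say 'exponentially'. The grouping into pairs and larger interacting subsets is my own simplification for this comment, not part of the walk-through:

```python
# Rough combinatorial illustration: pairwise links between n components grow
# quadratically, while the count of possible interacting groups grows exponentially.
# Purely illustrative counting; not a model of any specific architecture.
from math import comb

for n in (10, 20, 40, 80):
    pairwise = comb(n, 2)        # possible two-way interactions
    groups = 2 ** n - n - 1      # possible interacting groups of size >= 2
    print(f"{n:>3} components: {pairwise:>5} pairwise links, "
          f"{groups:.2e} possible interacting groups")
```

The exact counting does not matter; the point is that adding components multiplies, rather than merely adds to, the space of possible interactions that would need to be tracked and controlled.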
I disagree that small errors necessarily compound until reaching a threshold of functional failure.
For this claim to be true, the following has to be true:
a. There is no concurrent process that selects for "functional errors" as convergent on "functional failure" (failure in the sense that the machinery fails to function safely enough for humans to exist in the environment, rather than that the machinery fails to continue to operate).
Unfortunately, in the case of 'AGI', there are two convergent processes we know about:
Or else – where there is indeed selective pressure convergent on "functional failure" – then the following must be true for the quoted claim to hold:
b. The various errors introduced into and selected for in the machinery over time could be detected and corrected for comprehensively and fast enough (by any built-in control method) to prevent later "functional failure" from occurring.
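As a toy way of seeing what (b.) demands: if even a small class of introduced (and selected-for) errors escapes detection and correction, the count keeps growing instead of settling at a bounded level. The rates below are made-up parameters, purely for illustration:

```python
# Toy model of condition (b.): errors stay bounded only if detection-and-correction
# keeps pace with how fast new errors are introduced and selected for.
# All rates are made-up parameters for illustration, not estimates about 'AGI'.

def simulate(introduction_rate: float, correction_fraction: float, steps: int) -> float:
    """Expected number of uncorrected errors after a given number of steps."""
    errors = 0.0
    for _ in range(steps):
        errors += introduction_rate             # new errors appearing this step
        errors -= correction_fraction * errors  # share of existing errors detected and fixed
    return errors

print(simulate(introduction_rate=1.0, correction_fraction=0.5, steps=1000))  # settles near 1.0
print(simulate(introduction_rate=1.0, correction_fraction=0.0, steps=1000))  # 1000.0 and climbing
```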
This took a while for me to get into (the jumps from “energy” to “metabolic process” to “economic exchange” were very fast).
I think I’m tracking it now.
It’s about metabolic differences, as in differences in how energy is acquired and processed from the environment (and also the use of a different “alphabet” of atoms available for assembling the machinery).
Forrest clarified further in response to someone’s question here:
https://mflb.com/ai_alignment_1/d_240301_114457_inexorable_truths_gen.html
Note:
Even if you are focussed on long-term risks, you can still whistleblow on egregious harms caused by these AI labs right now. Providing this evidence enables legal efforts to restrict these labs.
Whistleblowing is not going to solve the entire societal governance problem, but it will enable others to act on the information you provided.
It is much better than following along until we reach the edge of the cliff.
Exactly. Without the data, the model cannot be trained again from scratch, and you end up fine-tuning a black box (the "open weights").
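For illustration, a minimal sketch of what 'fine-tuning the black box' amounts to in practice, assuming the Hugging Face transformers library (the model name is a placeholder, not any specific release):

```python
# Minimal sketch of fine-tuning released "open weights" without the original
# training data or recipe. Assumes the Hugging Face transformers library;
# the model name below is a placeholder, not any specific release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-lab/released-weights"  # placeholder for whatever checkpoint was published
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # opaque weights; no data, no training code

# All we can do is nudge the published weights on our own small dataset;
# we cannot reproduce, audit, or redo how the checkpoint itself was trained.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
batch = tokenizer(["example fine-tuning text"], return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()
optimizer.step()
```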
Thanks for writing this.