I agree that if everyone in my decision-theoretic reference class stopped trying to pause AI (perhaps because of being hit by buses), the chance of a pause is near 0.
You are right and I am wrong. Oops. After writing my comment I scrolled up to the top of my post, saw the graph from Manifold (not Metaculus), thought "huh, I forgot the market was so confident," and edited in my parenthetical without thinking. This is even more embarrassing because no market question is actually about the probability conditional on no pause occurring, which is a potentially important factor. I definitely shouldn't have added that text. Thank you.
(I will point out, as a bit of an aside, that economically transformative AI seems like a different threshold than AGI. My sense is that if an AGI takes a million dollars an hour to run an instance, it's still an AGI, but it won't be economically transformative unless it's substantially superintelligent or becomes much cheaper.
Still, I take my lumps.)
Cool. Your definition of AGI seems reasonable. Sounds like we probably disagree about confidence and timelines. (My confidence, I believe, matches Metaculus. [Edit: It doesn't! I'm embarrassed to have claimed this.])
I agree that we seem not to be on the path of pausing. Is your argument "because pausing is extremely unlikely per se, most of the timelines where we make it to 2050 don't have a pause"? If one assumes that we won't pause, I agree that the majority of probability mass for X doesn't involve a pause, for all X, including making it to 2050.
I generally don't think it's a good idea to put a probability on things where you have a significant ability to decide the outcome (e.g. the probability of getting divorced), and instead encourage you to believe in pausing.
Alas, I'm not very familiar with Recursive Alignment. I see some similarities, such as the notion of trying to set up a stable equilibrium in value-space. But a quick peek does not make me think Recursive Alignment is on the right track. In particular, I strongly disagree with this opening bit:
What I propose here is to reconceptualize what we mean by AI alignment. Not as alignment with a specific goal, but as alignment with the process of aligning goals with each other. An AI will be better at this process the less it identifies with any side...
What appeals to you about it?
It does not make sense to me to say "it becomes a coffee maximizer as an instrumental goal." Like, insofar as fetching the coffee trades off against corrigibility, it will prioritize corrigibility, so it's only a "coffee maximizer" within the boundary of states that are equally corrigible. As an analogue, let's say you're hungry and decide to go to the store. Getting in your car becomes an instrumental goal to going to the store, but it would be wrong to describe you as a "getting in the car maximizer."
One perspective that might help is that of a whitelist. Corrigible agents don't need to learn the human's preferences to know what's bad. They start off with the assumption that actions are bad by default, and slowly get pushed by their principal into taking actions that have been cleared as okay.
A corrigible agent won't want to cure cancer, even if it knows the principal extremely well and is 100% sure they want cancer cured -- instead the corrigible agent wants to give the principal the ability to, through their own agency, cure cancer if they want to. By default "cure cancer" is bad, just as all actions with large changes to the world are bad.
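If it helps to make the default-deny framing concrete, here's a toy sketch (entirely my own illustration; the names are made up and this isn't meant as an actual implementation of corrigibility):

```python
# Toy sketch of a default-deny whitelist agent (hypothetical names).
# Every action starts out off-limits; the principal has to clear it.

class WhitelistAgent:
    def __init__(self):
        self.cleared_actions = set()  # empty by default: everything is "bad"

    def clear(self, action: str) -> None:
        """The principal, through their own agency, approves an action."""
        self.cleared_actions.add(action)

    def act(self, action: str) -> str:
        if action in self.cleared_actions:
            return f"doing: {action}"
        # Default: refuse and defer back to the principal, even if the
        # agent is confident the principal would approve.
        return f"refusing: {action} (not cleared; deferring to principal)"

agent = WhitelistAgent()
print(agent.act("cure cancer"))   # refused by default
agent.clear("fetch coffee")
print(agent.act("fetch coffee"))  # allowed only because it was cleared
```

The point is just that the agent's default is refusal and deference; I'm not claiming real corrigibility reduces to a lookup table.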
Does that make sense? (I apologize for the slow response, and am genuinely interested in resolving this point. I'll work harder to respond more quickly in the near future.)
I'm confused by this. I think you're not using "condition on" in the technical sense, but instead asking how P(AGI > 2050) is affected by P(pause). Is that right?
Assuming it is, we can write:
P(AGI>2050) = 100% - P(AGI<2050) = 100% - P(AGI<2050|pause)P(pause) - P(AGI<2050|no pause)P(no pause)
To turn this into a function of just P(pause), we need to assume values for both P(AGI<2050|no pause) and P(AGI<2050|pause).
You suggested P(AGI<2050|no pause) := 80%. Let's also say that P(AGI<2050|pause) := 30%.
P(AGI>2050) = 100% - (30%)P(pause) - (80%)(100% - P(pause))
= 100% - 30%P(pause) - 80% + 80%P(pause) = 20% + 50%P(pause)
In other words, with these parameters, a majority of the probability mass for making it to 2050 without AGI comes from a pause if the probability of pausing is above 40%.
(Note: 80% is too low for me, personally. I'm closer to 95% on P(AGI<2050|no pause). (Edit: oh wait, does "AGI" mean "transformative superintelligence"?) This would make the equation 5% + 65%P(pause), meaning the majority of mass comes from pausing if the probability of pausing is above 1/13. Whether 30% is the right number depends heavily on what "pause" means. Arguably it should be 0%, which would make the equation 5% + 95%P(pause), and the relevant threshold 1/19.)
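If it's useful, here's a minimal sketch of the arithmetic above (function and parameter names are just my own labels):

```python
# Sanity check of the calculation above.

def p_no_agi_by_2050(p_pause, p_agi_given_pause=0.30, p_agi_given_no_pause=0.80):
    """P(AGI>2050) = 1 - P(AGI<2050|pause)P(pause) - P(AGI<2050|no pause)(1 - P(pause))."""
    return 1 - p_agi_given_pause * p_pause - p_agi_given_no_pause * (1 - p_pause)

# With the 80%/30% parameters this reduces to 20% + 50%*P(pause):
for p in (0.0, 0.4, 1.0):
    print(p, round(p_no_agi_by_2050(p), 3))  # 0.2, 0.4, 0.7

# Thresholds where the P(pause)-dependent term exceeds the constant term
# (constant / coefficient), matching the figures above:
print(0.20 / 0.50)  # 0.4  -> 80%/30% case
print(0.05 / 0.65)  # ~0.077, i.e. 1/13  -> 95%/30% case
print(0.05 / 0.95)  # ~0.053, i.e. 1/19  -> 95%/0% case
```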
But actually I think I'm probably confused about what you're trying to express. Can you say more words?
This is a good point, and one that I honestly hadn't considered. To be clear, I wasn't suggesting that the blockade or whatever would necessarily have happened in August, but still. I'll edit the post to reflect my update: maybe Taiwan chaos won't slow things down as quickly as I was naively modeling.
I think the AI problem is going to bite within the next 25 years. Conditional on avoiding disaster for 25 more years, I think the probability of having solved the survive-the-current-moment problem is very high. My best guess is that does not mean the alignment problem will have been solved, but rather that we succeeded in waking up to the danger and slowing things down. But I think I'm pretty optimistic that if the world is awake to the danger and capabilities progress is successfully paused for decades, we'll figure something out. (That "something" might involve very careful and gradual advancement alongside human augmentation, rather than a full "solution." idk)
(I do not think we'll solve alignment in the next 25 years. I think we'll die.)
This is a good point, and I think it meshes with my point about the lack of consensus on how powerful AIs are.
"Sure, they're good at math and coding. But those are computer things, not real-world abilities."
My reading of the text might be wrong, but it seems like bacteria count as living beings with goals? And, more speculatively, do possible organisms that might exist somewhere in the universe also count toward the consensus? Is this right?
If so, a basic disagreement is that I don't think we should hand over the world to a "consensus" that is a rounding error away from 100% inhuman. That seems like a good way of turning the universe into ugly squiggles.
If the consensus mechanism has a notion of power, such that creatures that are disempowered have no bargaining power in the mind of the AI, then I have a different set of concerns. But I wasn't able to quickly determine how the proposed consensus mechanism actually works, which is a bad sign from my perspective.