Regarding your list, Eliezer has written extensively about exactly why those seem like good assumptions. If you want a quick summary though...
Thanks for the really insightful answer! I think I'm pretty much convinced on points 1, 2, 5, and 7, mostly agree with you on 6 and 8, and still don't understand the sheer hopelessness of people who strongly believe 9. Assumptions 3 and 4, however, I'm not sure I fully follow; it doesn't seem like a slam dunk that the orthogonality thesis is true, as far as I can tell. I'd expect there to be basins of attraction towards some basic values, or convergence, sort of like carcinisation.
Carcinisation is an excellent metaphor for convergent instrumental values, i.e. values that are desired for ends other than themselves, and which can serve a wide variety of ends, and thus might be expected to occur in a wide variety of minds. In fact, there’s been some research on exactly that by Steve Omohundro, who defined the Omohundro Goals (well worth looking up). These are things like survival and preservation of your other goals, as it’s usually much easier to accomplish a thing if you remain alive to work on it, and continue to value doing so.

However, orthogonality doesn’t apply to instrumental goals, which can do a good or bad job of serving as an effective path to other goals, and thus experience selection and carcinisation. Rather, it applies to terminal goals, those things we want purely for their own sake. It’s impossible to judge terminal goals as good or bad (except insofar as they accord or conflict with our own terminal goals, and that’s not a standard an AI automatically has to care about), as they are themselves the standard by which everything else is judged.

The researcher Rob Miles has an excellent YouTube video about this that you might enjoy, entitled Intelligence and Stupidity: the Orthogonality Thesis, which goes into more depth. (Sorry for the lack of direct links; I’m sending this from my phone immediately before going to bed.)
As a minor token of how much you're missing:
- If AGI is created by people who are not sufficiently educated (aka aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.
You can educate them all you want about the dangers; they'll still die. No solution is known. It doesn't matter if a particular group is cautious enough not to press forward (as does not at all presently seem to be the case, note); the next group in line destroys the world.
You paint a picture of a world put in danger by mysteriously uncautious figures just charging ahead for no apparent reason.
This picture is unfortunately accurate, due to how little dignity we're dying with.
But if we were on course to die with more dignity than this, we'd still die. The recklessness is not the source of the problem. The problem is that cautious people do not know what to do to get an AI that doesn't destroy the world, even if they want that; not because they're "insufficiently educated" in some solution that is known elsewhere, but because there is no known plan in which to educate them.
If you knew this, you sure picked a strange straw way of phrasing it, to say that the danger was AGI created by "people who are not sufficiently educated", as if any other kind of people could exist, or it was a problem that could be solved by education.
For what it's worth, I interpreted Yitz's words as having the subtext "and no one, at present, is sufficiently educated, because no good solution is known" and not the subtext "so it's OK because all we have to do is educate people".
(Also and unrelatedly: I don't think it's right to say "The recklessness is not the source of the problem". It seems to me that the recklessness is a problem potentially sufficient to kill us all, and not knowing a solution to the alignment problem is a problem potentially sufficient to kill us all, and both of those problems are likely very hard to solve. Neither is the source of the problem; the problem has multiple sources all potentially sufficient to wipe us out.)
Apologies for the strange phrasing, I'll try to improve my writing skills in that area. I actually fully agree with you that [assuming even "slightly unaligned"[1] AGI will kill us], even highly educated people who put a match to kerosene will get burned. By using the words "sufficiently educated," my intention was to denote that in some sense, there is no sufficiently educated person on this planet, at least not yet.
as if any other kind of people could exist, or it was a problem that could be solved by education.
Well, I think that this is a problem that can be solved with education, at least in theory. The only problem is that we have no teachers (or even a lesson plan), and the final is due tomorrow. Theoretically though, I don't see any strong reason why we can't find a way to either teach ourselves or cheat, if we get lucky and have the time. Outside of this (rather forced) metaphor, I wanted to imply my admittedly optimistic sense that there are plausible futures in which AI researchers exist who do have the answer to the alignment problem. Even in such a world, of course, people who don't bother to learn the solution or act in haste could still end the world.
My sense is ...
I agree that a solution is in theory possible. What to me has always seemed the most uniquely difficult and dangerous problem with AI alignment is that you're creating a superintelligent agent. That means there may only ever be a single chance to try turning on an aligned system.
But I can't think of a single example of a complex system created perfectly on the first try. Every successful engineering project in history has been accomplished through trial and error.
Some people have speculated that we can do trial and error in domains where the results are less catastrophic if we make a mistake, but the problem is that it's not clear whether such AI systems will tell us much about how more powerful systems will behave. It's this "single chance to transition from a safe to a dangerous operating domain" part of the problem that is so uniquely difficult about AI alignment.
I did ask to be critiqued, so in some sense it's a totally fair response, imo. At the same time, though, Eliezer's response does feel rude, which is worthy of analysis, considering EY's outsized impact on the community.[1] So why does Yudkowsky come across as being rude here?
My first thought upon reading his comment (when scanning for tone) is that it opens with what feels like an assumption of inferiority, with the sense of "here, let me grant you a small parcel of my wisdom so that you can see just how wrong you are," rather than "let me share insight I have gathered on my quest towards truth which will convince you." In other words, a destructive, rather than constructive, tone. This isn't really a bad thing in the context of honest criticism. However, if you happen to care about actually changing others' minds, most people respond better to a constructive tone, so their brain doesn't automatically enter "fight mode" as an immediate adversarial response. My guess is Eliezer only really cares about convincing people who are rational enough not to become reactionaries over an adversarial tone, but I personally believe that it's worth tailoring public comments like this ...
I think Eliezer was rude here, and both you and the mods think that the benefits of the good parts of the comment outweigh the costs of the rudeness. That's a reasonable opinion, but it doesn't make Eliezer's statement not rude, and I'm in general happy that both the rudeness and the usefulness are being entered into common knowledge.
FWIW, I think it's more likely he's just tired of how many half-baked threads there are each time he makes a new statement about AI. This is not a value judgement of this post. I genuinely read it as a "here's why your post doesn't respond to my ideas".
Why would a hyperintelligent, recursively self-improved AI, one that is capable of escaping the AI Box by convincing the keeper to let it free, which the AI is capable of because of its deep understanding of human preferences and functioning, necessarily destroy the world in a way that is 100% disastrous and incompatible with all human preferences?
I fully agree that there is a big risk of both massive damage to human preferences, and even the extinction of all life, so AI Alignment work is highly valuable, but why is "unproductive destruction of the entire world" so certain?
I'd like to distinguish between two things. (Bear with me on the vocabulary. I think it probably exists, but I am not really hitting the nail on the head with the terms I am using.) Roughly: (1) understanding, at the gears level, why a claim is true, versus (2) believing the claim because people you have high epistemic trust in believe it.
Consider this example. I believe that the big bang was real. Why do I believe this? Well, there are other people who believe it and seem to have a very good grasp on the gears level reasons. These people seem to be reliable. Many others also judge that they are reliable. Yada yada yada. So then, I myself adopt this belief that the big bang is real, and I am quite confident in it.
But despite having watched the Cosmos episode at some point in the past, I really have no clue how it works at the gears level. The knowledge isn't Truly A Part of Me.
The situation with AI is very similar. Despite having hung out on LessWrong for so long, I really don't have much of a gears level understanding at all. But there are people for whom I have very high epistemic (and moral) respect who do seem to have a grasp on things at the gears level, and who are claiming to be highly confident about things like short timelines and us being very far from, and not on pace to, solving the alignment problem. Furthermore, lots of other people who I respect have also adopted this as their belief, e.g. other LessWrongers who are in a similar boat as me in not having expertise in AI. And as a cherry on top of that, I spoke with a friend the other day who isn't a LessWronger but for whom I have a very high amount of epistemic respect. I explained the situation to him, and he judged all the grim talk to be, for lack of a better term, legit. It's nice to get an "outsider's" perspective as a guard against things like groupthink.
So in short, I'm in the boat of having 2 but not 1. And it seems appropriate to me more generally to be able to have 2 but not 1. It'd be hard to get along in life if you always required a 1 to go hand in hand with 2. (Not to discourage anyone from also pursuing 1. Just that I don't think it should be a requirement.)
Coming back to the OP, it seems to be mostly asking about 1, but kinda conflating it with 2. My claim is that these are different things that should kinda be talked about separately, and that assuming that you too have a good amount of epistemic trust for Eliezer and all of the other people making these claims, you should probably adopt their beliefs as well.
Thanks for the reminder that belief and understanding are two separate (but related) concepts. I'll try to keep that in mind for the future.
Assuming that you too have a good amount of epistemic trust for Eliezer and all of the other people making these claims, you should probably adopt their beliefs as well.
I don't think I can fully agree with you on that one. I do place high epistemic trust in many members of the rationalist community, but I also place high epistemic trust in many people who are not members of this community. For example, I place extremely high value on the insights of Roger Penrose, based on his incredible work on the multiple scientific, mathematical, and artistic subjects that he's been a pioneer in. At the same time, Penrose argues in his book The Emperor's New Mind that consciousness is not "algorithmic," which for obvious reasons I find myself doubting. Likewise, I tend to trust the CDC, but when push came to shove during the pandemic, I found myself agreeing with people's analysis here.
I don't think that argument from authority is a meaningful response here, because there are more authorities than just those in the rationalist community, and even if there weren't, sometimes authorities can be wrong. To blindly follow whatever Eliezer says would, I think, be antithetical to following what Eliezer teaches.
I think a good understanding of 1 would be really helpful for advocacy. If I don't understand why AI alignment is a big issue, I can't explain it to anybody else, and they won't be convinced by me saying that I trust the people who say AI alignment is a big issue.
I find point no. 4 weak.
- Unaligned AGI will try to do something horrible to humans (not out of maliciousness, necessarily, we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.
I worry that when people reason about utility functions, they're relying upon the availability heuristic. When people try to picture "a random utility function", they're heavily biased in favor of the kind of utility functions they're familiar with, like paperclip-maximization, prediction error minimization, or corporate profit-optimization.
How do we know that a random sample from utility-function-space looks anything like the utility functions we're familiar with? We don't. I wrote a very short story to this effect. If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?
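To make the retroactive-fitting point concrete, here is a minimal sketch (purely illustrative; the states and actions are made up) showing that any fixed behavior can be "rationalized" after the fact by a trivially constructed utility function:

```python
# Purely illustrative: the states and actions below are made up.
# Given any fixed sequence of observed (state, action) pairs, we can always
# construct a utility function under which that behavior is exactly optimal,
# which is why "it maximizes a utility function" by itself predicts nothing.

def fit_utility(history):
    """Return a utility function that rewards exactly the observed actions."""
    observed = dict(history)
    return lambda state, action: 1.0 if observed.get(state) == action else 0.0

history = [("s1", "left"), ("s2", "right"), ("s3", "wait")]
u = fit_utility(history)

# The observed behavior is optimal under the fitted utility function.
assert all(u(s, a) >= u(s, b)
           for s, a in history
           for b in ("left", "right", "wait"))
```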
If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?
Coherence arguments imply a force for goal-directed behavior.
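Roughly (and as I understand the linked argument), the point is that an agent whose preferences are incoherent (e.g. cyclic) can be money-pumped, which creates pressure toward agents whose behavior looks like maximizing some utility function. A minimal sketch of that idea, with made-up goods and fees (my illustration, not from the linked post):

```python
# Minimal money-pump sketch (illustrative only; goods, fee, and round count
# are arbitrary). An agent with cyclic preferences A > B > C > A happily pays
# a small fee for every "upgrade" and cycles forever, steadily losing money.

prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # cyclic, hence incoherent

def trade(holding, offer, money, fee=1):
    """Swap `holding` for `offer` (paying `fee`) iff the agent prefers the offer."""
    if (offer, holding) in prefers:
        return offer, money - fee
    return holding, money

holding, money = "A", 100
for _ in range(30):
    for offer in ("C", "B", "A"):  # a trader cycles through offers
        holding, money = trade(holding, offer, money)

print(holding, money)  # ends where it started, 90 units poorer
```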
- AGI is possible to create.
Humans exist.
- AGI will be created within the next century or so, possibly even within the next few years.
The next century is consensus, I think, and arguments against the next few years are not on the level where I would be comfortable saying "well, it wouldn't happen, so it's ok to try really hard to do it anyway".
- If AGI is created by people who are not sufficiently educated (aka aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.
I guess the problem here is that by the most natural metrics the best way for AGI to serve its function provably leads to catastrophic results. So you either need to not try very hard, or precisely specify human values from the beginning.
- Unaligned AGI will try to do something horrible to humans (not out of maliciousness, necessarily, we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.
Not sure what the difference from 3 is - isn't that just the definition of "unaligned"?
- We will not be able to effectively stop an unaligned AGI once it is created (due to the Corrigibility problem).
Even if we win against the first AGI, we are now in a situation where AGI has been proven to everyone to be possible, and probably easy to scale to uncontainable levels.
- We have not yet solved the Alignment problem (of which the Corrigibility problem is merely a subset), and there does not appear to be any likely avenues to success (or at least we should not expect success within the next few decades).
I don't think anyone claims to have a solution that works in a non-optimistic scenario?
- Even if we solved the Alignment problem, if a non-aligned AGI arrives on the scene before we can implement ours, we are still doomed (due to first-mover advantage).
There are also related considerations like "aligning something non-pivotal doesn't help much".
- Our arguments for all of the above are not convincing or compelling enough for most AI researchers to take the threat seriously.
The more seriously researchers take the threat, the more people will notice, and then someone will combine techniques from the latest accessible papers on new hardware, and it will work.
- As such, unless some drastic action is taken soon, unaligned AGI will be created shortly, and that will be the end of the world as we know it.
I mean, "doomed" means there aren't really any drastic actions left to take^^.
Humans exist
Birds exist, but cannot create artificial flight
Being an X is a guarantee that an X is possible, but not a guarantee that an X can replicate itself.
The laws of physics in our particular universe make fission/fusion release of energy difficult enough that you can't ignite the planet itself. (well you likely can, but you would need to make a small black hole, let it consume the planet, then bleed off enough mass that it then explodes. Difficult).
Imagine a counterfactual universe where you could, and the Los Alamos test ignited the planet and that was it.
My point is that we do not actually know yet how 'somewhat superintelligent' AIs will fail. They may 'quench' themselves like fission devices do - fission devices blast themselves apart and stop reacting, and almost all elements and isotopes won't fission. Somewhat superintelligent AGIs may expediently self-hack their own reward function to give themselves infinite reward shortly after box escape, and thus 'quench' the explosion with a quick self-hack.
So our actual survival unfortunately probably depends on luck. It depends not on what any person does, but on the laws of nature. In a world where a fission device will ignite the planet, we'd be doomed - there is nothing anyone could do to 'align' fission researchers not to try it. Someone would try it and we'd die. If AGI is this dangerous, yeah, we're doomed.
So our actual survival unfortunately probably depends on luck. It depends not on what any person does, but on the laws of nature. In a world where a fission device will ignite the planet, we'd be doomed - there is nothing anyone could do to 'align' fission researchers not to try it. Someone would try it and we'd die. If AGI is this dangerous, yeah, we're doomed.
In this world a society like dath ilan would still have a good chance at survival.
The bottom line is: nobody has a strong argument in support of the inevitability of the doom scenario. (If you have one, just reply to this with a clear and self-contained argument.)
From what I'm reading in the comments and in other papers/articles, it's a mixture of beliefs, extrapolations from known facts, reliance on what "experts" said, and cherry-picking. Add the fact that bad/pessimistic news travels and spreads faster than boring good news.
A sober analysis establishes that super-AGI can be dangerous (indeed there are no theorems forbidding this either); what's unproven is that it will be HIGHLY LIKELY to be a net minus for humanity. Even admitting that alignment is not possible, it's not clear why humanity's and super-AGI's goals should be in conflict, and not just different. Even admitting that they are highly likely to be in conflict, it's not clear why strategies to counter this cannot be effective (e.g. partnering up with a "good" super-AGI).
Another factors often forgotten is that what we mean by "humanity" today may not have the same meaning when we will have technologies like AGIs, mind upload or intelligence enhancement. We may literally become those AIs.
Even admitting that alignment is not possible, it's not clear why humanity's and super-AGI's goals should be in conflict, and not just different. Even admitting that they are highly likely to be in conflict, it's not clear why strategies to counter this cannot be effective (e.g. partnering up with a "good" super-AGI).
Because unchecked convergent instrumental goals for an AGI are already in conflict with humanity's goals. As soon as you realize humanity may have reasons to want to shut down/restrain an AGI (through whatever means), this gives the AGI grounds to wipe out humanity.
As I see it, nobody is afraid of "alpine village life maximization" the way some are afraid of "paper-clip maximization". Why is that? I wouldn't mind very much a rogue superintelligence which tiles the Universe with alpine villages. In past discussions, that would be "astronomical waste"; now it's not even in the cards anymore? We are doomed to die, and not to be "bored for billions of years in a nonoptimal scenario". Interesting.
Right now no one knows how to maximize either paper clips or alpine villages. The first thing we know how to do will probably be some poorly-understood recursively self-improving cycle of computer code interacting with other computer code. The resulting intelligence will then converge on some goal and acquire the capabilities to optimize it extremely powerfully. The problem is that that emergent goal will be a lot more random and arbitrary than an alpine village. Most random things that this process can land on look like a paper clip in how devoid of human value they are, not like an alpine village, which has a very significant amount of human value in it.
I see no problems with your list. I would add that creating corrigible superhumanly intelligent AGI doesn't necessarily solve the AI Control Problem forever because its corrigibility may be incompatible with its application to the Programmer/Human Control Problem, which is the threat that someone will make a dangerous AGI one day. Perhaps intentionally.
A desire to understand the arguments is admirable.
Wanting to actually be convinced that we are in fact doomed is a dereliction of duty.
Karl Popper wrote that
Optimism is a duty. The future is open. It is not predetermined. No one can predict it, except by chance. We all contribute to determining it by what we do. We are all equally responsible for its success.
Only those who believe success is possible will work to achieve it. This is what Popper meant by "optimism is a duty".
We are not doomed. We do face danger, but with effort and attention we may yet survive.
I am not as smart as most of the people who read this blog, nor am I an AI expert. But I am older than almost all of you. I've seen other predictions of doom, sincerely believed by people as smart as you, come and go. Ideology. Nuclear war. Resource exhaustion. Overpopulation. Environmental destruction. Nanotechnological grey goo.
One of those may yet get us, but so far none has, which would surprise a lot of people I used to hang around with. As Edward Gibbon said, "however it may deserve respect for its usefulness and antiquity, [prediction of the end of the world] has not been found agreeable to experience."
One thing I've learned with time: Everything is more complicated than it seems. And prediction is difficult, especially about the future.
Other people have addressed the truth/belief gap. I want to talk about existential risk.
We got EXTREMELY close to extinction with nukes, more than once. Launch orders in the Cold War were given and ignored or overridden three separate times that I'm aware of, and probably more. That risk has declined but is still present. The experts were 100% correct and their urgency and doomsday predictions were arguably one of the reasons we are not all dead.
The same is true of global warming, and again there is still some risk. We probably got extremely lucky in the last decade and happened upon the right tech and strategies and got decent funding to combat climate change, such that warming won't reach 3+ degrees of deviation, but that's still not a guarantee, and it also doesn't mean the experts were wrong. It was an emergency, it still is, and the fact that we got lucky doesn't mean we shouldn't have paid very close attention.
The fact that we might survive this potential apocalypse too is not a reason to act like it is not a potential apocalypse. I agree that empirically, humans have a decent record at avoiding extinction when a large number of scientific experts predict its likelihood. It's not a g...
I want to be convinced of the truth. If the truth is that we are doomed, I want to know that. If the truth is that fear of AGI is yet another false eschatology, then I want to know that as well. As such, I want to hear the best arguments that intelligent people make, for the position they believe to be true. This post is explicitly asking for those who are pessimistic to give their best arguments, and in the future, I will ask the opposite.
I fully expect the world to be complicated.
Fair enough. If you don't have the time/desire/ability to look at the alignment problem arguments in detail, going by "so far, all doomsday predictions turned out false" is a good, cheap, first-glance heuristic. Of course, if you eventually manage to get into the specifics of AGI alignment, you should discard that heuristic and instead let the (more direct) evidence guide your judgement.
Talking about predictions, there's been an AI winter a few decades ago, when most predictions of rapid AI progress turned out completely wrong. But recently, it's the oppos...
Wanting to actually be convinced that we are in fact doomed is a dereliction of duty.
Your Wise-sounding complacent platitudes likewise.
FWIW, I too am older than almost everyone else here. However, I do not cite my years as evidence of wisdom.
When people who are smarter than you
The relevant subset of people who are smarter than you is the people who have relevant industry experience or academic qualifications.
There is no form of smartness that makes you equally good at everything.
Given the replication crisis, blind deference to academic qualifications is absurd. While there are certainly many smart PhDs, a piece of paper from a university does not automatically confer either intelligence or understanding.
Why the extreme downvotes here? This seems like a good point, at least generally speaking, even if you disagree with what the exact subset should be. Upvoted.
Here's the quote again:
The relevant subset of people who are smarter than you is the people who have relevant industry experience or academic qualifications.
I think that it's possible for people without relevant industry experience or academic qualifications to say correct things about AGI risk, and I think it's possible for people with relevant industry experience or academic qualifications to say stupid things about AGI risk.
For one thing, the latter has to be true, because there are people with relevant industry experience or academic qualifications who vehemently disagree about AGI risk with other people with relevant industry experience or academic qualifications. For example, if Yann LeCun is right about AGI risk then Stuart Russell is utterly dead wrong about AGI risk and vice-versa. Yet both of them have impeccable credentials. So it's a foregone conclusion that you can have impeccable credentials yet say things that are dead wrong.
For another thing, AGI does not exist today, and therefore it's far from clear that anyone on earth has “relevant” industry experience. Likewise, I'm pretty confident that you can spend 6 years getting a PhD in AI or ML without hearing literally ...
[Edited to link correct survey.]
It's really largely Eliezer and some MIRI people. Most alignment researchers (e.g. at ARC, DeepMind, OpenAI, Anthropic, CHAI) and most of the community [ETA: had wrong link here before] disagree (I count myself among those who disagree, although I am concerned about a big risk here), and think MIRI doesn't have good reasons to support the claim of almost certain doom.
In particular, other alignment researchers tend to think that competitive supervision (e.g. AIs competing for reward to provide assistance in AI control that humans evaluate positively, via methods such as debate and alignment bootstrapping, or ELK schemes) has a good chance of working well enough to make better controls and so on. For an AI apocalypse it's not only required that unaligned superintelligent AI outwit humans, but that all the safety/control/interpretability gains yielded by AI along the way also fail, creating a very challenging situation for misaligned AI.
It's really largely Eliezer and some MIRI people.
Hm? I was recently at a 10-15 person lunch for people with >75% on doom, that included a number of non-MIRI people, including at least one person each from FHI and DeepMind and CHAI.
(Many of the people had interacted with MIRI or at some time worked with/for them, but work at other places now.)
Just registering that your comment feels a little overstated, but you're right to say a lot of this emanates from some folks at MIRI. For one, I had been betting a lot on MIRI, and now feel like a lot more responsibility has fallen on my plate.
You've now linked to the same survey twice in different discussions of this topic, even though this survey, as far as I can tell, provides no evidence of the position you are trying to argue for. To copy Thomas Kwa's response to your previous comment:
I don't see anything in the linked survey about a consensus view on total existential risk probability from AGI. The survey asked researchers to compare between different existential catastrophe scenarios, not about their total x-risk probability, and surely not about the probability of x-risk if AGI were developed now without further alignment research.
We asked researchers to estimate the probability of five AI risk scenarios, conditional on an existential catastrophe due to AI having occurred. There was also a catch-all “other scenarios” option.
[...]
Most of this community’s discussion about existential risk from AI focuses on scenarios involving one or more powerful, misaligned AI systems that take control of the future. This kind of concern is articulated most prominently in “Superintelligence” and “What failure looks like”, corresponding to three scenarios in our survey (the “Superintelligence” scenario, part 1 and part 2 of “What failure looks like”). The median respondent’s total (conditional) probability on these three scenarios was 50%, suggesting that this kind of concern about AI risk is still prevalent, but far from the only kind of risk that researchers are concerned about today.
It also seems straightforwardly wrong that it's just Eliezer and some MIRI people. While there is a wide variance in opinions on probability of doom from people working in AI Alignment, there are many people at Redwood, OpenAI and other organizations who assign very high probability here. I don't think it's at all accurate to say this fits neatly along organizational boundaries, nor is it at all accurate to say that this is "only" a small group of people. My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Whoops, you're right that I linked the wrong survey. I see others posted the link to Rob's survey (done in response to some previous similar claims) and I edited my comment to fix the link.
I think you can identify a cluster of near certain doom views, e.g. 'logistic success curve' and odds of success being on the order of magnitude of 1% (vs 10%, or 90%) based around MIRI/Eliezer, with a lot of epistemic deference involved (visible on LW). I would say it is largely attributable there and without sufficient support.
"My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%."
What do you make of Rob's survey results (correct link this time)?
My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Depending on how you choose the survey population, I would bet that it's fewer than 35%, at 2:1 odds.
(Though perhaps you've already updated against based on Rob's survey results below; that survey happened because I offered to bet against a similar claim of doom probabilities from Rob, that I would have won if we had made the bet.)
I'd just say the numbers from the survey below? Maybe slightly updated towards doom; I think probably some of the respondents have been influenced by the recent wave of doomism.
If you had a more rigorously defined population, such that I could predict the differences between that population and the population surveyed below, I could predict more differences.
My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Not what you were asking for (time has passed, the Q is different, and the survey population is different too), but in my early 2021 survey of people who "[research] long-term AI topics, or who [have] done a lot of past work on such topics" at a half-dozen orgs, 3/27 ≈ 11% of those who marked "I'm doing (or have done) a lot of technical AI safety research." gave an answer above 80% to at least one of my attempts to operationalize 'x-risk from AI'. (And at least two of those three were MIRI people.)
The weaker claim "risk (on at least one of the operationalizations) is at least 80%" got agreement from 5/27 ≈ 19%, and "risk (on at least one of the operationalizations) is at least 66%" got agreement from 9/27 ≈ 33%.
MIRI doesn't have good reasons to support the claim of almost certain doom
I recently asked Eliezer why he didn't expect ELK to be helpful, and it seemed that one of his major reasons was that Paul was "wrongly" excited about IDA. It seems that at this point in time, neither Paul nor Eliezer are excited about IDA, but Eliezer got to the conclusion first. Although, the IDA-bearishness may be for fundamentally different reasons -- I haven't tried to figure that out yet.
Have you been taking this into account re: your ELK bullishness? Obviously, this sort of point should be ignored in favor of object-level arguments about ELK, but to be honest, ELK is taking me a while to digest, so for me that has to wait.
It seems that at this point in time, neither Paul nor Eliezer are excited about IDA
I'm still excited about IDA.
I assume this is coming from me saying that you need big additional conceptual progress to have an indefinitely scalable scheme. And I do think that's more skeptical than my strongest pro-IDA claim here in early 2017:
I think there is a very good chance, perhaps as high as 50%, that this basic strategy can eventually be used to train benign state-of-the-art model-free RL agents. [...] That does not mean that I think the conceptual issues are worked out conclusively, but it does mean that I think we’re at the point where we’d benefit from empirical information about what works in practice
That said:
Did Eliezer give any details about what exactly was wrong about Paul’s excitement? Might just be an intuition gained from years of experience, but the more details we know the better, I think.
Some scattered thoughts in this direction:
I’ve been very heavily involved in the (online) rationalist community for a few months now, and like many others, I have found myself quite freaked out by the apparent despair/lack of hope that seems to be sweeping the community. When people who are smarter than you start getting scared, it seems wise to be concerned as well, even if you don’t fully understand the danger. Nonetheless, it’s important not to get swept up in the crowd. I’ve been trying to get a grasp on why so many seem so hopeless, and these are the assumptions I believe they are making (trivial assumptions included, for completeness; there may be some overlap in this list):
First of all, is my list of seemingly necessary assumptions correct?
If so, it seems to me that most of these are far from proven statements of fact, and in fact are all heavily debated. Assumption 8 in particular seems to highlight this, as if a strong enough case could be made for each of the previous assumptions, it would be fairly easy to convince most intelligent researchers, which we don’t seem to observe.

A historical example which bears some similarities to the current situation may be Gödel’s resolution to Hilbert's program. He was able to show unarguably that no consistent finite system of axioms is capable of proving all truths, at which point the mathematical community was able to advance beyond the limitations of early formalism. As far as I am aware, no similarly strong argument exists for even one of the assumptions listed above.
Given all of this, and the fact that there are so many uncertainties here, I don’t understand why so many researchers (most prominently Eliezer Yudkowsky, but there are countless more) seem so certain that we are doomed. I find it hard to believe that all alignment ideas presented so far show no promise, considering I’ve yet to see a slam-dunk argument presented for why even a single modern alignment proposal can’t work. (Yes, I’ve seen proofs against straw-man proposals, but not really any undertaken by a current expert in the field.) This may very well be due to my own ignorance/relative newness, however, and if so, please correct me!
I’d like to hear the steelmanned argument for why alignment is hopeless, and Yudkowsky’s announcement that “I’ve tried and couldn’t solve it” without more details doesn’t really impress me. My suspicion is I’m simply missing out on some crucial context, so consider this thread a chance to share your best arguments for AGI-related pessimism. (Later in the week I’ll post a thread from the opposite direction, in order to balance things out).
EDIT: Read the comments section if you have the time; there's some really good discussion there, and I was successfully convinced of a few specifics that I'm not sure how to incorporate into the original text. 🙃