I strong-upvoted this post.
Here's a specific, zoomed-in version of this game proposed by Nate Soares:
like, we could imagine playing a game where i propose a way that it [the AI] diverges [from POUDA-avoidance] in deployment, and you counter by asserting that there's a situation in the training data where it had to have gotten whacked if it was that stupid, and i counter either by a more-sophisticated deployment-divergence or by naming either a shallower or a factually non-[Alice]like thing that it could have learned instead such that the divergence s
Tom Davidson found a math error btw, it shouldn't be 360,000 agents doing a year's worth of thinking each in only 3 days. It should be much less than that, otherwise you are getting compute for free!
One thing conspicuously absent, IMO, is discussion of misalignment risk. I'd argue that GPT-2030 will be situationally aware, strategically aware, and (at least when plugged into fancy future versions of AutoGPT etc.) agentic/goal-directed. If you think it wouldn't be a powerful adversary of humanity, why not? Because it'll be 'just following instructions' and people will put benign instructions in the prompt? Because HFDT will ensure that it'll robustly avoid POUDA? Or will it in fact be a powerful adversary of humanity, but one that is un...
It's very possible this means we're overestimating the compute performed by the human brain a bit.
Specifically, by 6-8 OOMs. I don't think that's "a bit." ;)
Oh I totally agree with everything you say here, especially your first sentence. My timelines median for intelligence explosion (conditional on no significant government-enforced slowdown) is 2027.
So maybe I was misleading when I said I was unimpressed.
Excellent! Yeah I think GPT-4 is close to automating remote workers. 5 or 6, with suitable extensions (e.g. multimodal, langchain, etc.) will succeed I think. Of course, there'll be a lag between "technically existing AI systems can be made to ~fully automate job X" and "most people with job X are now unemployed" because things take time to percolate through the economy. But I think by the time of GPT-6 it'll be clear that this percolation is beginning to happen & the sorts of things that employ remote workers in 2023 (especially the strategically rele...
Thanks! AI managers, CEOs, self-replicators, and your-job-doers (what is your job anyway? I never asked!) seem like things that could happen before it's too late (albeit only very shortly before) so they are potential sources of bets between us. (The other stuff requires lots of progress in robotics which I don't expect to happen until after the singularity, though I could be wrong)
Yes, I understand that you don't think AGI will be achieved by brain simulation. I like that you have a giant confidence interval to account for cases where AGI is way more effi...
I think you've identified a good crux between us: I think GPT-4 is far from automating remote workers and you think it's close. If GPT-5/6 automate most remote work, that will be a point in favor of your view, and if it takes until GPT-8/9/10+, that will be a point in favor of mine. And if GPT gradually provides increasingly powerful tools that wildly transform jobs before they are eventually automated away by GPT-7, then we can call it a tie. :)
I also agree that the magic of GPT should update one into believing in shorter AGI timelines with lower ...
Good point. I'll message Tristan, see if he can incorporate that into the model.
Thanks for this well-researched and thorough argument! I think I have a bunch of disagreements, but my main one is that it really doesn't seem like AGI will require 8-10 OOMs more inference compute than GPT-4. I am not at all convinced by your argument that it would require that much compute to accurately simulate the human brain. Maybe it would, but we aren't trying to accurately simulate a human brain, we are trying to learn circuitry that is just as capable.
Also: Could you, for posterity, list some capabilities that you are highly confident no AI system will have by 2030? Ideally capabilities that come prior to a point-of-no-return so it's not too late to act by the time we see those capabilities.
Oh, to clarify, we're not predicting AGI will be achieved by brain simulation. We're using the human brain as a starting point for guessing how much compute AGI will need, and then applying a giant confidence interval (to account for cases where AGI is way more efficient, as well as way less efficient). It's the most uncertain part of our analysis and we're open to updating.
For posterity, by 2030, I predict we will not have:
I do like Hanson's story you link. :) Yes, the panspermia possibility does make it non-crazy that there could be aliens close to us despite an empty sky. Unlikely, but non-crazy. Then there's still the question of why they are so bad at hiding & why their technology is so shitty, and why they are hiding in the first place. It's not completely impossible but it seems like a lot of implausible assumptions stacked on top of each other. So, I think it's still true that "the best modelling suggests aliens are at least hundreds of millions of light-years away."
We are more likely to be born in a world with panspermia, as it has a higher concentration of habitable planets.
Nice story! Mostly I think that the best AGIs will always be in the big labs rather than open source, and that current open-source models aren't smart enough to get this sort of self-improving ecosystem off the ground. But it's not completely implausible.
This being actual aliens is highly unlikely for the usual reasons. The best modeling suggests aliens are at least hundreds of millions of light-years away, since otherwise there would be sufficiently many of them in the sky that some of them would choose not to hide. Moreover if any did visit Earth with the intention of hiding, they would probably have more advanced technology than this, and would be better at hiding.
The best modeling suggests aliens are at least hundreds of millions of light-years away...
As Robin Hanson himself notes: "That's assuming independent origins. Things that have a common origin would find themselves closer in space and time." See also: https://www.overcomingbias.com/p/ufos-what-the-hellhtml
Somehow there are 4 copies of this post
I guess I just think it's pretty unreasonable to have p(doom) of 10% or less at this point, if you are familiar with the field, timelines, etc.
I totally agree the topic is important and neglected. I only said "arguably" deferrable, I have less than 50% credence that it is deferrable. As for why I'm not working on it myself, well, aaaah I'm busy idk what to do aaaaaaah! There's a lot going on that seems important. I think I've gotten wrapped up in more OAI-specific things since coming to OpenAI, and maybe that's bad & I should be stepping back and trying to go where I'm most needed even if that means leaving OpenAI. But yeah. I'm open to being convinced!
Nice post. Some minor thoughts:
Are there historical precedents for this sort of thing? Arguably so: wildfires of strategic cognition sweeping through a nonprofit or corporation or university as office politics ramps up and factions start forming with strategic goals, competing with each other. Wildfires of strategic cognition sweeping through the brain of a college student who was nonagentic/aimless before but now has bought into some ambitious ideology like EA or communism. Wildfires of strategic cognition sweeping through a network of PCs as a viru...
Something like 2% of people die every year right? So even if we ignore the value of future people and all sorts of other concerns and just focus on whether currently living people get to live or die, it would be worth delaying a year if we could thereby decrease p(doom) by 2 percentage points. My p(doom) is currently 70% so it is very easy to achieve that. Even at 10% p(doom), which I consider to be unreasonably low, it would probably be worth delaying a few years.
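The arithmetic behind that tradeoff can be sketched roughly as follows. This is a toy model under loudly stated assumptions, not a forecast: the function name `expected_loss` is made up for illustration, the 2% annual death rate is the rough figure from the comment, and the model assumes that everyone who dies during the delay is lost, that doom kills everyone still alive, and that a good outcome saves everyone still alive.

```python
def expected_loss(p_doom: float, delay_years: float,
                  annual_death_rate: float = 0.02) -> float:
    """Expected fraction of currently living people who die (toy model).

    Assumptions (illustrative only):
    - annual_death_rate of people die per year of delay (linear approximation),
    - doom kills everyone still alive when AGI arrives,
    - a non-doom outcome saves everyone still alive.
    """
    # Fraction of people who die of ordinary causes while waiting.
    died_waiting = min(1.0, annual_death_rate * delay_years)
    # Survivors of the delay then face the (possibly reduced) doom probability.
    return died_waiting + (1.0 - died_waiting) * p_doom


# No delay at p(doom) = 70% versus a one-year delay that buys 2 percentage points.
no_delay = expected_loss(0.70, delay_years=0)
one_year = expected_loss(0.68, delay_years=1)
print(f"no delay: {no_delay:.4f}, one-year delay: {one_year:.4f}")
```

In this toy model the one-year delay comes out ahead whenever it buys 2 percentage points, which matches the claim above; how many points a real delay would buy is exactly the part the model does not settle.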
Re: 2: Yeah I basically agree. I'm just not as confident as you are I guess. Like, maybe the ...
Proposed Forecasting Technique: Annotate Scenario with Updates (Related to Joe's Post)
I am unimpressed. I've had conversations with people before that went very similarly to this. If this had been a transcript of your conversation with a human, I would have said that human was not engaging with the subject on the gears / object level and didn't really understand it, but rather had a shallow understanding of the topic, used the anti-weirdness heuristic combined with some misunderstandings to conclude the whole thing was bogus, and then filled in the blanks to produce the rest of the text. Or, to put it differently, BingChat's writing here re...
I don't know, I feel like the day that an AI can do significantly better than this, will be close to the final day of human supremacy. In my experience, we're still in a stage where the AIs can't really form or analyze complex structured thoughts on their own - where I mean thoughts with, say, the complexity of a good essay. To generate complex structured thoughts, you have to help them a bit, and when they analyze something complex and structured, they can make out parts of it, but they don't form a comprehensive overall model of meaning that they can the...
Science as a kind of Ouija board:
With the board, you do this set of rituals and it produces a string of characters as output, and then you are supposed to read those characters and believe what they say.
So too with science. Weird rituals, check. String of characters as output, check. Supposed to believe what they say, check.
With the board, the point of the rituals is to make it so that you aren't writing the output, something else is -- namely, spirits. You are supposed to be light and open-minded and 'let the spirit move you' rather than deliberately try ...
It's no longer my top priority, but I have a bunch of notes and arguments relating to AGI takeover scenarios that I'd love to get out at some point. Here are some of them:
Beating the game in May 1937 - Hoi4 World Record Speedrun Explained - YouTube
In this playthrough, the USSR has a brief civil war and Trotsky replaces Stalin. They then get an internationalist socialist type diplomat who is super popular with US, UK, and France, who negotiates passage of troops through their territory -- specifically, they send many many brigades of extremely low-tier troop...
(But that still leaves room for an update towards "the AI doesn't necessarily kill us, it might merely warp us, or otherwise wreck civilization by bounding us and then giving us power-before-wisdom within those bounds or suchlike, as might be the sort of whims that rando drives shake out into", which I'll chew on.)
FWIW this is my view. (Assuming no ECL/MSR or acausal trade or other such stuff. If we add those things in, the situation gets somewhat better in expectation I think, because there'll be trades with faraway places that DO care about our CEV.)
Why is 1 important? It seems like something we can defer discussion of until after (if ever) alignment is solved, no?
2 is arguably in that category also, though idk.
Why is 1 important? It seems like something we can defer discussion of until after (if ever) alignment is solved, no?
If aging was solved or looked like it will be solved within the next few decades, it would make efforts to stop or slow down AI development less problematic, both practically and ethically. I think some AI accelerationists might be motivated directly by the prospect of dying/deterioration from old age, and/or view lack of interest/progress on that front as a sign of human inadequacy/stagnation (contributing to their antipathy towards humans)....
I suggest you put this in a sequence with your other posts in this series (posts making fairly basic points that nonetheless need to be said)
I normally am all for charitability and humility and so forth, but I will put my foot down and say that it's irrational (or uninformed) to disagree with this statement:
“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
(I say uninformed because I want to leave an escape clause for people who aren't aware of various facts or haven't been exposed to various arguments yet. But for people who have followed AI progress recently and/or who have heard the standard argument...
? The people viewing AI as not an X-risk are the people confidently dismissing something.
I think the evidence is really there. Again, the claim isn't that we are definitely doomed, it's that AGI poses an existential risk to humanity. I think it's pretty unreasonable to disagree with that statement.
What about "Deniers?" as in, climate change deniers.
Too harsh maybe? IDK, I feel like a neutral observer presented with a conflict framed as "Doomers vs. Deniers" would not say that "deniers" was the harsher term.
Thanks to you likewise!
On doom through normal means: "Persuasion, hacking, and warfare" aren't by themselves doom, but they can be used to accumulate lots of power, and then that power can be used to cause doom. Imagine a world in which humans are completely economically, militarily, and politically obsolete, thanks to armies of robots directed by superintelligent AIs. Such a world could and would do very nasty things to humans (e.g. let them all starve to death) unless the superintelligent AIs managing everything specifically cared about keeping humans ali...
Thanks for this comment. I'd be generally interested to hear more about how one could get to 20% doom (or less).
The list you give above is cool but doesn't do it for me; going down the list I'd guess something like:
1. 20% likely (honesty seems like the best bet to me) because we have so little time left, but even if it happens we aren't out of the woods yet because there are various plausible ways we could screw things up. So maybe overall this is where 1/3rd of my hope comes from.
2. 5% likely? Would want to think about this more. I could imagine myself be...
Thanks, & thanks for putting in your own perspective here. I sympathize with that too; fwiw Vladimir_Nesov's answer would have satisfied me, because I am sufficiently familiar with what the terms mean. But for someone new to those terms, they are just unexplained jargon, with links to lots of lengthy but difficult to understand writing. (I agree with Richard's comment nearby). Like, I don't think Vladimir did anything wrong by giving a jargon-heavy, links-heavy answer instead of saying something like "It may be hard to construct a utility function that...
Here's where I think the conversation went off the rails. :( I think what happened is M.Y.Zuo's bullshit/woo detector went off, and they started asking pointed questions about the credentials of Critch and his ideas. Vlad and LW more generally react allergically to arguments from authority/status, so downvoted M.Y.Zuo for making this about Critch's authority instead of about the quality of his arguments.
Personally I feel like this was all a tragic misunderstanding but I generally side with M.Y.Zuo here -- I like Critch a lot as a person & I think he's ...
I appreciate the attempt at diagnosing what went wrong here. I agree this is ~where it went off the rails, and I think you are (maybe?) correctly describing what was going on from M.Y.Zuo's perspective. But this doesn't feel like it captured what I found frustrating.
What feels wrong to me about this is that, for the question of:
How would one arrive at a value system that supports the latter but rejects the former?
it just doesn't make sense to me to be that worried about either authority or rigor. I think the nonrigorous concept, general...
I was one of the people who upvoted but disagreed -- I think it's a good point you raise, M. Y. Zuo, that So8res' qualifications blunt the blow and give people an out, a handy rationalization to justify continuing working on capabilities. However, there's still a non-zero (and I'd argue substantial) effect remaining.
Makes sense. I had basically decided by 2021 that those good futures (1) and (2) were very unlikely, so yeah.
Whereas my timelines views are extremely well thought-through (relative to most people that is) I feel much more uncertain and unstable about p(doom). That said, here's why I updated:
Hinton and Bengio have come out as worried about AGI x-risk; the FLI letter and Yudkowsky's tour of podcasts, while incompetently executed, have been better received by the general public and elites than I expected; the big labs (especially OpenAI) have reiterated that superintelligent AGI is a thing, that it might come soon, that it might kill everyone, and that regulation is...
I'd be curious to hear more about this "contributes significantly in expectation" bit. Like, suppose I have some plan that (if it doesn't work) burns timelines by X, but (if it does work) gets us 10% of the way towards aligned AGI (e.g. ~10 plans like this succeeding would suffice to achieve aligned AGI) and moreover there's a 20% chance that this plan actually buys time by providing legible evidence of danger to regulators who then are more likely to regulate and more likely to make the regulation actually useful instead of harmful. So we have these three...
I'm trying to make a basic point here, that pushing the boundaries of the capabilities frontier, by your own hands and for that direct purpose, seems bad to me. I emphatically request that people stop doing that, if they're doing that.
I am not requesting that people never take any action that has some probability of advancing the capabilities frontier. I think that plenty of alignment research is potentially entangled with capabilities research (and/or might get more entangled as it progresses), and I think that some people are making the tradeoffs in ways...
Great post, will buy the book and take a look!
I feel like I vaguely recall reading somewhere that some sort of California canvassing experiment to promote gay rights either didn't replicate or turned out to be outright fraud. Wish I could remember the details. It wasn't the experiment you are talking about though, hopefully?
I agree that logodds space is the right way to think about how close probabilities are. However, my epistemic situation right now is basically this:
"It sure seems like Doom is more likely than Safety, for a bunch of reasons. However, I feel sufficiently uncertain about stuff, and humble, that I don't want to say e.g. 99% chance of doom, or even 90%. I can in fact imagine things being OK, in a couple different ways, even if those ways seem unlikely to me. ... OK, now if I imagine someone having the flipped perspective, and thinking that things being OK is m...
Thanks for this post! I definitely disagree with you about point I (I think AI doom is 70% likely and I think people who think it is less than, say, 20% are being very unreasonable) but I appreciate the feedback and constructive criticism, especially section III.
If you ever want to chat sometime (e.g. in a comment thread, or in a video call) I'd be happy to. If you are especially interested I can reply here to your object-level arguments in section I. I guess a lightning version would be "My arguments for doom don't depend on nanotech or anything possibly-...
At the time of writing, the Metaculus community predicts that in July 2024 there will be a 25% chance of a system of Loebner-silver-prize capability (along with the other resolution criteria). It is hard for me to imagine how this could happen.
Focus on imagining how we could get complete AI software R&D automation by then, that's both more important than Loebner-silver-prize capability and implies it (well, it implies that after a brief period sometimes called "intelligence explosion" the capability will be reached).
Not as far as I know, but people should definitely do that!
I think you are overestimating how aligned these models are right now, and very much overestimating how aligned they will be in the future absent massive regulations forcing people to pay massive alignment taxes. They won't be aligned to any users, or any corporations either. Current methods like RLHF will not work on situationally aware, agentic AGIs.
I agree that IF all we had to do to get alignment was the sort of stuff we are currently doing, the world would be as you describe. But instead there will be a significant safety tax.
Ahhh, I see. I think that's a bit misleading, I'd say "You have to care about what happens far away," e.g. you have to want there to be paperclips far away also. (The current phrasing makes it seem like a paperclipper wouldn't want to do ECL)
Also, technically, you don't actually have to care about what happens far away either, if anthropic capture is involved.
Wait, why is ECL lumped under Correlation + Kindness instead of just Correlation? I think this thread is supposed to answer that question but I don't get it.
It's not true that you only have an ECL reason to cooperate if you care about the survival of other agents. Paperclippers, for example, have ECL reason to cooperate.
What's your response to my "If I did..." point? If we include all the data points, the correlation between intelligence and agency is clearly positive, because rocks have 0 intelligence and 0 agency.
If you agree that agency as I've defined it in that sequence is closely and positively related to intelligence, then maybe we don't have anything else to disagree about. I would then ask of you and Boaz what other notion of agency you have in mind, and encourage you to specify it to avoid confusion, and then maybe that's all I'd say since maybe we'd be in agree...
First of all, I think the "cooperate together" thing is a difficult problem and is not solved by ensuring value diversity (though, note also that ensuring value diversity is a difficult task that would require heavy regulation of the AI industry!)
More importantly though, your analysis here seems to assume that the "Safety Tax" or "Alignment Tax" is zero. That is, it assumes that making an AI aligned to a particular human or group of humans (so that they can be said to "have" the AI, in the manner you described) is easy, a trivial additional step beyond mak...
Thanks! Duly noted, thanks for the feedback. I agree that political art is typically awful. FWIW, The Treacherous Turn was approximately 100% optimized to be fun. We all knew from the beginning that it wouldn't be useful if it wasn't fun. We did put some thought into making it realistic, but the realism IMO adds to the fun rather than subtracts; I would still have included it even if I didn't care about impact at all.
Nice post! You seem like you know what you are doing. I'd be curious to hear more about what you think about these priority areas, and why interpretability didn't make the list:
- ensuring ontological robustness of goals and concepts (without the use of formally specified goals)
- creating formally specified outer aligned goals
- preventing inner misalignment
- understanding mesaoptimizers better by understanding agency
Thanks and good luck!
How long do you think such a moratorium would last?
Thanks for the feedback, I'll try to keep this in mind in the future. I imagine you'd prefer me to keep the links, but make the text use common-sense language instead of acronyms so that people don't need to click on the links to understand what I'm saying?