I keep finding myself linking to this 2017 Yudkowsky facebook post so I'm putting it here so it's easy to find:
...Eliezer (6y, via fb):
So what actually happens as near as I can figure (predicting future = hard) is that somebody is trying to teach their research AI to, god knows what, maybe just obey human orders in a safe way, and it seems to be doing that, and a mix of things goes wrong like:
The preferences not being really readable because it's a system of neural nets acting on a world-representation built up by other neural nets, parts of the system
...Drexler can be forgiven for not talking about foundation models in his report. His report was published at the start of 2019, just months after the idea of "fine-tuning" was popularized in the context of language models, and two months before GPT-2 came out. And many readers can no doubt point out many non-trivial predictions that Drexler got right, such as the idea that we will have millions of AIs, rather than just one huge system that acts as a unified entity. And we're still using deep learning as Drexler foresaw, rather than building general intellige...
No like, what exactly do you mean by 25:1 to 200:1 odds? Who pays whom what, when? Sorry if I'm being dumb here. Normally when I make bets like this, it looks something like what I proposed. The reason being that if I win the bet, money will be almost useless to me, so it only makes sense (barely) for me to do it if I get paid up front, and then pay back with interest later.
As for the definition of singularity, look, you'll know it if it happens; that's why I'm happy to just let you be the judge on Jan 1 2030. This is a bit favorable to you but that's OK by me.
Send me $1000 now, I'll send you $1,020+interest in January 2030, where interest is calculated to match whatever I would have gotten by keeping my $1,020 in the S&P 500 the whole time?
(Unless you voluntarily forfeit by 2030, having judged that I was right.)
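For concreteness, the repayment under the proposed terms can be sketched like this. It's a toy calculation only: the annualized return and time horizon are placeholder numbers, not anything from the actual bet, and "interest" is assumed to mean the index's total return compounded over the period.

```python
# Toy sketch of the proposed bet's payoff. Illustrative numbers only;
# the real "interest" would be whatever the S&P 500 actually returns.

def repayment(principal: float, annual_return: float, years: float) -> float:
    """Principal grown at a (hypothetical) annualized index return."""
    return principal * (1 + annual_return) ** years

# E.g. $1,020 owed, a hypothetical 7%/yr return, ~6.5 years to Jan 2030:
owed = repayment(1020, 0.07, 6.5)
print(round(owed, 2))
```

The point of the structure is visible in the formula: the payer gets liquid money now, and the counterparty is made whole (relative to indexing) only if the world is still normal enough in 2030 for the repayment to matter.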
Thanks for the reply. I'm a bit over my head here but isn't this a problem for the practicality of this approach? We only get mutual cooperation because all of the agents have the very unusual property that they'll cooperate if they find a proof that there is no such argument. Seems like a selfless and self-destructive property to have in most contexts; why would an agent self-modify into creating and maintaining this property?
So... it's part of the setup that all of these agents will:
--Cooperate if they can prove that there is some argument compelling to everyone that everyone cooperates (because then they prove that everyone cooperates, and that includes them, and their proof system isn't mistaken?)
--Cooperate if they can prove that there is no such argument.
--Else defect.
Am I getting that right?
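The three-case rule as I understand it can be written down as a toy decision procedure. This is only the case structure: real versions of these agents do bounded proof search in a formal logic, which I've stubbed out here as two boolean inputs, so nothing about the actual proof-theoretic machinery is captured.

```python
# Toy encoding of the decision rule described above. The two proof
# searches are stubbed as booleans; a sound prover never sets both.

def decide(proved_common_argument: bool, proved_no_such_argument: bool) -> str:
    """Cooperate on a proof either way, else defect.

    proved_common_argument: proved there IS an argument compelling to
        everyone that everyone cooperates.
    proved_no_such_argument: proved there is NO such argument.
    """
    if proved_common_argument:
        return "cooperate"
    if proved_no_such_argument:
        return "cooperate"
    return "defect"

print(decide(True, False))   # cooperate
print(decide(False, True))   # cooperate
print(decide(False, False))  # defect
```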
I'm glad I asked, that was helpful! I agree that instrumental convergence is a huge crux; if I were convinced that e.g. it wasn't going to happen until 15 years from now, and/or that the kinds of systems that might instrumentally converge were always going to be less economically/militarily/etc. competitive than other kinds of systems, that would indeed be a huge revolution in my thought and would completely change the way I think about AI and AI risks, and I'd become much more optimistic.
I'll go read the post you linked.
Well yeah, it depends on details and assumptions I didn't make explicit -- I wrote only four sentences!
If you have counterarguments to any of my claims I'd be interested to hear them, just in case they are new to me.
My biggest counterargument to the case that AI progress should be slowed down comes from an observation made by porby about the fundamental lack of a property we theorize about in AI systems, one that is a foundational assumption of AI risk:
Instrumental convergence, and its corollaries like powerseeking.
The important point is that current and most plausible future AI systems don't have incentives to learn instrumental goals, and the type of AI that has enough space and has very few constraints, like RL with sufficiently unconstrained action spaces to learn i...
Even if you buy the dial theory, it still doesn't make sense to shout Yay Progress on the topic of AGI. Singularity is happening this decade, maybe next, whether we shout Yay or Boo. Shouting Boo just delays it a little and makes it more likely to be good instead of bad. (Currently it is quite likely to be bad.)
Consider that not everyone shares your view that the Singularity is happening soon, or that it will be better if delayed.
How about this:
--Re the first grey area: We rule in your favor here.
--Re the second grey area: You decide, in 2027, based on your own best judgment, whether or not it would have happened absent regulation. I can disagree with your judgment, but I still have to agree that you won the bet (if you rule in your favor).
Isn't the college student example an example of 1 and 2? I'm thinking of e.g. students who become convinced of classical utilitarianism and then join some Effective Altruist club etc.
They say "And then the entire world gets transformed as superintelligent AIs + robots automate the economy." Does Tyler Cowen buy all of that? Is that not the part he disagrees with?
And then yeah for the AI kills you part there are models as well, albeit not economic growth models because economic growth is a different subject. But there are simple game theory models, for example -- expected utility maximizer with mature technology + misaligned utility function = and then it kills you. And then there are things like Carlsmith's six-step argument and Chalmers' and so forth. What sort of thing does Tyler want, that's different in kind from what we already have?
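The "expected utility maximizer with a misaligned utility function" model gestured at above can be made concrete as a toy. All numbers below are made up; the only point it illustrates is that the argmax ignores human survival whenever the utility function doesn't happen to reward it.

```python
# Toy "misaligned expected utility maximizer" model. Made-up numbers;
# the argmax disregards humans because the utility function does.

outcomes = {
    # action: (paperclips_produced, humans_survive)
    "cooperate_with_humans": (10, True),
    "seize_resources":       (1000, False),
}

def utility(paperclips: int, humans_survive: bool) -> float:
    # Misaligned: cares only about paperclips, not about humans.
    return paperclips

best_action = max(outcomes, key=lambda a: utility(*outcomes[a]))
print(best_action)  # seize_resources
```

This is of course a cartoon, but it's the same shape as the one-line argument in the comment: mature technology just widens the action space until an action like "seize_resources" dominates.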
Has Tyler Cowen heard of Ajeya Cotra's Bio Anchors model, Tom Davidson's takeoffspeeds.com model, Roodman's model of the singularity, or for that matter Robin Hanson's earlier automation models? All of them seem to be the sort of thing he wants; I'm surprised he hasn't heard of them. Or maybe he has and thinks they don't count for some reason? I would be curious to know why.
Given your lack of disposable money I think this would be a bad deal for you, and as for me, it is sorta borderline (my credence that the bet will resolve in your favor is something like 40%?) but sure, let's do it. As for what charity to donate to, how about Animal Welfare Fund | Effective Altruism Funds. Thanks for working out all these details!
Here are some grey area cases we should work out:
--What if there is a human programmer managing the whole setup, but they are basically a formality? Like, the company does technically have programmers on staff but...
I've made several bets like this in the past, but it's a bit frustrating since I don't stand to gain anything by winning -- by the time I win the bet, we are well into the singularity & there isn't much for me to do with the money anymore. What are the terms you have in mind? We could do the thing where you give me money now, and I give it back with interest later.
Thanks for that feedback as well -- I think I didn't realize how much my comment comes across as 'debate' framing, which now on second read seems obvious. I genuinely didn't intend my comment to be a criticism of the post at all; I genuinely was thinking something like "This is a great post. But other than that, what should I say? I should have something useful to add. Ooh, here's something: Why no talk of misalignment? Seems like a big omission. I wonder what he thinks about that stuff." But on reread it comes across as more of a "nyah nyah why didn't you talk about my hobbyhorse" unfortunately.
Thanks for the feedback, I'll try to keep this in mind in the future. I imagine you'd prefer me to keep the links, but make the text use common-sense language instead of acronyms so that people don't need to click on the links to understand what I'm saying?
I strong-upvoted this post.
Here's a specific, zoomed-in version of this game proposed by Nate Soares:
...like, we could imagine playing a game where i propose a way that it [the AI] diverges [from POUDA-avoidance] in deployment, and you counter by asserting that there's a situation in the training data where it had to have gotten whacked if it was that stupid, and i counter either by a more-sophisticated deployment-divergence or by naming either a shallower or a factually non-[Alice]like thing that it could have learned instead such that the divergence s...
Tom Davidson found a math error btw, it shouldn't be 360,000 agents doing a year's worth of thinking each in only 3 days. It should be much less than that, otherwise you are getting compute for free!
Well said.
One thing conspicuously absent, IMO, is discussion of misalignment risk. I'd argue that GPT-2030 will be situationally aware, strategically aware, and (at least when plugged into fancy future versions of AutoGPT etc.) agentic/goal-directed. If you think it wouldn't be a powerful adversary of humanity, why not? Because it'll be 'just following instructions' and people will put benign instructions in the prompt? Because HFDT will ensure that it'll robustly avoid POUDA? Or will it in fact be a powerful adversary of humanity, but one that is un...
As I read this post, I found myself puzzled by the omission of the potential of AI-research-acceleration by SotA AI models, as Daniel mentions in his comment. I think it's worth pointing out that this has been explicitly discussed by leading individuals in the big AI labs. For instance, Sam Altman has said that scaling is no longer the primary path forward in their work, and that algorithmic advances are instead.
Think about your intuitions of what a smart and motivated human is capable of. The computations that that human brain is running represent an algo...
I don't like the number of links that you put into your first paragraph. The point of developing a vocabulary for a field is to make communication more efficient so that the field can advance. Do you need an acronym and associated article for 'pretty obviously unintended/destructive actions,' or in practice is that just insularizing the discussion?
I hear people complaining about how AI safety only has ~300 people working about it, and how nobody is developing object level understandings and everyone's thinking from authority, but the more sentences you wri...
It's very possible this means we're overestimating the compute performed by the human brain a bit.
Specifically, by 6-8 OOMs. I don't think that's "a bit." ;)
Oh I totally agree with everything you say here, especially your first sentence. My timelines median for intelligence explosion (conditional on no significant government-enforced slowdown) is 2027.
So maybe I was misleading when I said I was unimpressed.
Excellent! Yeah I think GPT-4 is close to automating remote workers. 5 or 6, with suitable extensions (e.g. multimodal, langchain, etc.) will succeed I think. Of course, there'll be a lag between "technically existing AI systems can be made to ~fully automate job X" and "most people with job X are now unemployed" because things take time to percolate through the economy. But I think by the time of GPT-6 it'll be clear that this percolation is beginning to happen & the sorts of things that employ remote workers in 2023 (especially the strategically rele...
Thanks! AI managers, CEOs, self-replicators, and your-job-doers (what is your job anyway? I never asked!) seem like things that could happen before it's too late (albeit only very shortly before) so they are potential sources of bets between us. (The other stuff requires lots of progress in robotics which I don't expect to happen until after the singularity, though I could be wrong)
Yes, I understand that you don't think AGI will be achieved by brain simulation. I like that you have a giant confidence interval to account for cases where AGI is way more effi...
Great points.
I think you've identified a good crux between us: I think GPT-4 is far from automating remote workers and you think it's close. If GPT-5/6 automate most remote work, that will be a point in favor of your view, and if it takes until GPT-8/9/10+, that will be a point in favor of mine. And if GPT gradually provides increasingly powerful tools that wildly transform jobs before they are eventually automated away by GPT-7, then we can call it a tie. :)
I also agree that the magic of GPT should update one into believing in shorter AGI timelines with lower ...
Thanks for this well-researched and thorough argument! I think I have a bunch of disagreements, but my main one is that it really doesn't seem like AGI will require 8-10 OOMs more inference compute than GPT-4. I am not at all convinced by your argument that it would require that much compute to accurately simulate the human brain. Maybe it would, but we aren't trying to accurately simulate a human brain, we are trying to learn circuitry that is just as capable.
Also: Could you, for posterity, list some capabilities that you are highly confident no AI system will have by 2030? Ideally capabilities that come prior to a point-of-no-return so it's not too late to act by the time we see those capabilities.
Oh, to clarify, we're not predicting AGI will be achieved by brain simulation. We're using the human brain as a starting point for guessing how much compute AGI will need, and then applying a giant confidence interval (to account for cases where AGI is way more efficient, as well as way less efficient). It's the most uncertain part of our analysis and we're open to updating.
For posterity, by 2030, I predict we will not have:
I do like Hanson's story you link. :) Yes, panspermia possibility does make it non-crazy that there could be aliens close to us despite an empty sky. Unlikely, but non-crazy. Then there's still the question of why they are so bad at hiding & why their technology is so shitty, and why they are hiding in the first place. It's not completely impossible but it seems like a lot of implausible assumptions stacked on top of each other. So, I think it's still true that "the best modelling suggests aliens are at least hundreds of millions of light-years away."
We are more likely to be born in a world with panspermia, as it has a higher concentration of habitable planets.
Nice story! Mostly I think that the best AGIs will always be in the big labs rather than open source, and that current open-source models aren't smart enough to get this sort of self-improving ecosystem off the ground. But it's not completely implausible.
This being actual aliens is highly unlikely for the usual reasons. The best modeling suggests aliens are at least hundreds of millions of light-years away, since otherwise there would be sufficiently many of them in the sky that some of them would choose not to hide. Moreover if any did visit Earth with the intention of hiding, they would probably have more advanced technology than this, and would be better at hiding.
The best modeling suggests aliens are at least hundreds of millions of light-years away...
As Robin Hanson himself notes: "That's assuming independent origins. Things that have a common origin would find themselves closer in space and time." See also: https://www.overcomingbias.com/p/ufos-what-the-hellhtml
I guess I just think it's pretty unreasonable to have p(doom) of 10% or less at this point, if you are familiar with the field, timelines, etc.
I totally agree the topic is important and neglected. I only said "arguably" deferrable, I have less than 50% credence that it is deferrable. As for why I'm not working on it myself, well, aaaah I'm busy idk what to do aaaaaaah! There's a lot going on that seems important. I think I've gotten wrapped up in more OAI-specific things since coming to OpenAI, and maybe that's bad & I should be stepping back and trying to go where I'm most needed even if that means leaving OpenAI. But yeah. I'm open to being convinced!
Nice post. Some minor thoughts:
Are there historical precedents for this sort of thing? Arguably so: wildfires of strategic cognition sweeping through a nonprofit or corporation or university as office politics ramps up and factions start forming with strategic goals, competing with each other. Wildfires of strategic cognition sweeping through the brain of a college student who was nonagentic/aimless before but now has bought into some ambitious ideology like EA or communism. Wildfires of strategic cognition sweeping through a network of PCs as a viru...
Something like 2% of people die every year right? So even if we ignore the value of future people and all sorts of other concerns and just focus on whether currently living people get to live or die, it would be worth delaying a year if we could thereby decrease p(doom) by 2 percentage points. My p(doom) is currently 70% so it is very easy to achieve that. Even at 10% p(doom), which I consider to be unreasonably low, it would probably be worth delaying a few years.
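The back-of-envelope above can be written out explicitly. It uses the comment's ~2%/year figure and deliberately counts only currently living people, ignoring future people and everything else, so treat all numbers as rough.

```python
# Back-of-envelope from the comment: a one-year delay "costs" roughly
# the fraction of currently living people who die that year, so it pays
# off if it cuts p(doom) by at least that many percentage points.
# (~2%/year is the comment's figure; all numbers are rough.)

annual_death_rate = 0.02  # fraction of currently living people dying per year

def delay_worth_it(p_doom_reduction_per_year: float) -> bool:
    """Counting only currently living people, is a one-year delay net positive?"""
    return p_doom_reduction_per_year > annual_death_rate

print(delay_worth_it(0.05))  # True: a 5-point drop in p(doom) beats ~2% deaths
print(delay_worth_it(0.01))  # False: a 1-point drop doesn't
```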
Re: 2: Yeah I basically agree. I'm just not as confident as you are I guess. Like, maybe the ...
Proposed Forecasting Technique: Annotate Scenario with Updates (Related to Joe's Post)
I am unimpressed. I've had conversations with people before that went very similarly to this. If this had been a transcript of your conversation with a human, I would have said that human was not engaging with the subject on the gears / object level and didn't really understand it, but rather had a shallow understanding of the topic, used the anti-weirdness heuristic combined with some misunderstandings to conclude the whole thing was bogus, and then filled in the blanks to produce the rest of the text. Or, to put it differently, BingChat's writing here re...
I don't know, I feel like the day that an AI can do significantly better than this, will be close to the final day of human supremacy. In my experience, we're still in a stage where the AIs can't really form or analyze complex structured thoughts on their own - where I mean thoughts with, say, the complexity of a good essay. To generate complex structured thoughts, you have to help them a bit, and when they analyze something complex and structured, they can make out parts of it, but they don't form a comprehensive overall model of meaning that they can the...
Science as a kind of Ouija board:
With the board, you do this set of rituals and it produces a string of characters as output, and then you are supposed to read those characters and believe what they say.
So too with science. Weird rituals, check. String of characters as output, check. Supposed to believe what they say, check.
With the board, the point of the rituals is to make it so that you aren't writing the output, something else is -- namely, spirits. You are supposed to be light and open-minded and 'let the spirit move you' rather than deliberately try ...
It's no longer my top priority, but I have a bunch of notes and arguments relating to AGI takeover scenarios that I'd love to get out at some point. Here are some of them:
Beating the game in May 1937 - Hoi4 World Record Speedrun Explained - YouTube
In this playthrough, the USSR has a brief civil war and Trotsky replaces Stalin. They then get an internationalist socialist type diplomat who is super popular with US, UK, and France, who negotiates passage of troops through their territory -- specifically, they send many many brigades of extremely low-tier troop...
(But that still leaves room for an update towards "the AI doesn't necessarily kill us, it might merely warp us, or otherwise wreck civilization by bounding us and then giving us power-before-wisdom within those bounds or suchlike, as might be the sort of whims that rando drives shake out into", which I'll chew on.)
FWIW this is my view. (Assuming no ECL/MSR or acausal trade or other such stuff. If we add those things in, the situation gets somewhat better in expectation I think, because there'll be trades with faraway places that DO care about our CEV.)
Why is 1 important? It seems like something we can defer discussion of until after (if ever) alignment is solved, no?
2 is arguably in that category also, though idk.
Why is 1 important? It seems like something we can defer discussion of until after (if ever) alignment is solved, no?
If aging was solved or looked like it will be solved within next few decades, it would make efforts to stop or slow down AI development less problematic, both practically and ethically. I think some AI accelerationists might be motivated directly by the prospect of dying/deterioration from old age, and/or view lack of interest/progress on that front as a sign of human inadequacy/stagnation (contributing to their antipathy towards humans)....
I suggest you put this in a sequence with your other posts in this series (posts making fairly basic points that nonetheless need to be said).
I normally am all for charitability and humility and so forth, but I will put my foot down and say that it's irrational (or uninformed) to disagree with this statement:
“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
(I say uninformed because I want to leave an escape clause for people who aren't aware of various facts or haven't been exposed to various arguments yet. But for people who have followed AI progress recently and/or who have heard the standard argument...
? The people viewing AI as not an X-risk are the people confidently dismissing something.
I think the evidence is really there. Again, the claim isn't that we are definitely doomed, it's that AGI poses an existential risk to humanity. I think it's pretty unreasonable to disagree with that statement.
What about "Deniers?" as in, climate change deniers.
Too harsh maybe? IDK, I feel like a neutral observer presented with a conflict framed as "Doomers vs. Deniers" would not say that "deniers" was the harsher term.
Thanks to you likewise!
On doom through normal means: "Persuasion, hacking, and warfare" aren't by themselves doom, but they can be used to accumulate lots of power, and then that power can be used to cause doom. Imagine a world in which humans are completely economically, militarily, and politically obsolete, thanks to armies of robots directed by superintelligent AIs. Such a world could and would do very nasty things to humans (e.g. let them all starve to death) unless the superintelligent AIs managing everything specifically cared about keeping humans ali...
Thanks for this comment. I'd be generally interested to hear more about how one could get to 20% doom (or less).
The list you give above is cool but doesn't do it for me; going down the list I'd guess something like:
1. 20% likely (honesty seems like the best bet to me) because we have so little time left, but even if it happens we aren't out of the woods yet because there are various plausible ways we could screw things up. So maybe overall this is where 1/3rd of my hope comes from.
2. 5% likely? Would want to think about this more. I could imagine myself be...
There is a spectrum between AGI that is "single monolithic agent" and AGI that is not. I claim that the current state of AI as embodied by e.g. GPT-4 is already closer to the single monolithic agent end of the spectrum than someone reading CAIS in 2019 and believing it to be an accurate forecast would have expected, and that in the future things will probably be even more in that direction.
Remember, it's not like Yudkowsky was going around saying that AGI wouldn't be able to copy itself. Of course it would. It was always understood that "the AI takes over ...