All of Daniel Kokotajlo's Comments + Replies

There is a spectrum between AGI that is "single monolithic agent" and AGI that is not. I claim that the current state of AI as embodied by e.g. GPT-4 is already closer to the single monolithic agent end of the spectrum than someone reading CAIS in 2019 and believing it to be an accurate forecast would have expected, and that in the future things will probably be even more in that direction.

Remember, it's not like Yudkowsky was going around saying that AGI wouldn't be able to copy itself. Of course it would. It was always understood that "the AI takes over ... (read more)

Matthew Barnett (5h):
I think many of the points you made are correct. For example I agree that the fact that all the instances of ChatGPT are copies of each other is a significant point against Drexler's model. In fact this is partly what my post was about. I disagree that you have demonstrated the claim in question: that we're trending in the direction of having a single huge system that acts as a unified entity. It's theoretically possible that we will reach that destination, but GPT-4 doesn't look anything like that right now. It's not an agent that plots and coordinates with other instances of itself to achieve long-term goals. It's just a bounded service, which is exactly what Drexler was talking about. Yes, GPT-4 is a highly general service that isn't very modular. I agree that's a point against Drexler, but that's also not what I was disputing.

I keep finding myself linking to this 2017 Yudkowsky facebook post so I'm putting it here so it's easy to find:

 

Eliezer (6y, via fb):

So what actually happens as near as I can figure (predicting future = hard) is that somebody is trying to teach their research AI to, god knows what, maybe just obey human orders in a safe way, and it seems to be doing that, and a mix of things goes wrong like:

The preferences not being really readable because it's a system of neural nets acting on a world-representation built up by other neural nets, parts of the system

... (read more)

Drexler can be forgiven for not talking about foundation models in his report. His report was published at the start of 2019, just months after the idea of "fine-tuning" was popularized in the context of language models, and two months before GPT-2 came out. And many readers can no doubt point out many non-trivial predictions that Drexler got right, such as the idea that we will have millions of AIs, rather than just one huge system that acts as a unified entity. And we're still using deep learning as Drexler foresaw, rather than building general intellige

... (read more)
Matthew Barnett (6h):
I don't see what about that 2017 Facebook comment from Yudkowsky you find particularly prophetic. Is it the idea that deep learning models will be opaque? But that was fairly obvious back then too. I agree that Drexler likely exaggerated how transparent a system of AI services would be, so I'm willing to give Yudkowsky a point for that. But the rest of the scenario seems kind of unrealistic as of 2023. Some specific points:

* The recursive self-improvement that Yudkowsky talks about in this scenario seems too local. I think AI self-improvement will most likely take the form of AIs assisting AI researchers, with humans gradually becoming an obsolete part of the process, rather than a single neural net modifying parts of itself during training.
* The whole thing about spinning off subagents during training just doesn't seem realistic in our current paradigm. Maybe this could happen in the future, but it doesn't look "prophetic" to me.
* The idea that models will have "a little agent inside plotting" that takes over the whole system still seems totally speculative to me, and I haven't seen any significant empirical evidence that this happens during real training runs.
* I think gradient descent will generally select pretty hard for models that do impressive things, making me think it's unlikely that AIs will naturally conceal their abilities during training.

Again, this type of stuff is theoretically possible, but it seems very hard to call this story prophetic.
Matthew Barnett (8h):
GPT-4 is certainly more general than what existed years ago. Why is it more unified? When I talked about "one giant system" I meant something like a monolithic agent that takes over humanity. If GPT-N takes over the world, I expect it will be because there are millions of copies that band up together in a coalition, not because it will be a singular AI entity. Perhaps you think that copies of GPT-N will coordinate so well that it's basically just a single monolithic agent. But while I agree something like that could happen, I don't think it's obvious that we're trending in that direction. This is a complicated question that doesn't seem clear to me given current evidence.

No, like: what exactly do you mean by 25:1 to 200:1 odds? Who pays whom what, when? Sorry if I'm being dumb here. Normally when I make bets like this, it looks something like what I proposed. The reason being that if I win the bet, money will be almost useless to me, so it only makes sense (barely) for me to do it if I get paid up front, and then pay back with interest later.

As for the definition of singularity: look, you'll know it if it happens; that's why I'm happy to just let you be the judge on Jan 1 2030. This is a bit favorable to you, but that's OK by me.

M. Y. Zuo (3h):
Here's a thoroughly explained and very recent example that made it to the front page: https://www.lesswrong.com/posts/t5W87hQF5gKyTofQB/ufo-betting-put-up-or-shut-up After reading that, including the comments, do you still have any confusion?

Possibly! I'm not sure if I understand this comment though. Could you propose a bet/deal then?

M. Y. Zuo (12h):
I assume you mean 'propose terms of the bet/deal'? (Because otherwise that is my first comment.) If so, what's the broadest possible definition of 'singularity' that you're willing to accept on a 25:1 odds basis? I.e., the definition that has to be met in order for a 'singularity' to unambiguously qualify, in your view, as having occurred by Jan 1, 2030.

Sounds good, thank you! Emailing the receipt would be nice.

Andy_McKenzie (17h):
Sounds good, can't find your email address, DM'd you. 

Send me $1000 now, I'll send you $1,020+interest in January 2030, where interest is calculated to match whatever I would have gotten by keeping my $1,020 in the S&P 500 the whole time?

(Unless you voluntarily forfeit by 2030, having judged that I was right.)
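For concreteness, the settlement arithmetic in this kind of proposal is just compound growth. A quick sketch (the 7% annual return is an assumed placeholder for illustration; the actual terms would track whatever the S&P 500 realizes):

```python
def payback(principal: float, annual_return: float, years: float) -> float:
    """Amount owed at settlement: principal grown at the index's realized return."""
    return principal * (1 + annual_return) ** years

# $1,020 owed from mid-2023 to Jan 2030 (~6.5 years), at an assumed 7%/yr
owed = payback(1020, 0.07, 6.5)
print(f"${owed:,.2f}")
```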

M. Y. Zuo (21h):
Did you misread my comment?  I specified 25:1 to 200:1 odds, depending on the terms. The implication is that terms more favourable to me will be settled closer to 25:1 and terms more favourable to you will be settled closer to 200:1. i.e. $25k:$1k to $200k:$1k. '$1020+interest" would be 1.02+interest:1 odds.

Thanks for the reply. I'm a bit over my head here, but isn't this a problem for the practicality of this approach? We only get mutual cooperation because all of the agents have the very unusual property that they'll cooperate if they find a proof that there is no such argument. That seems like a selfless and self-destructive property to have in most contexts; why would an agent self-modify into creating and maintaining this property?

James Payor (2d):
(Thanks also to you for engaging!)

Hm. I'm going to take a step back, away from the math, and see if that makes things less confusing.

Let's go back to Alice thinking about whether to cooperate with Bob. They both have perfect models of each other (perhaps in the form of source code). When Alice goes to think about what Bob will do, maybe she sees that Bob's decision depends on what he thinks Alice will do. At this junction, I don't want Alice to "recurse", falling down the rabbit hole of "Alice thinking about Bob thinking about Alice thinking about--" and etc. Instead Alice should realize that she has a choice to make, about who she cooperates with, which will determine the answers Bob finds when thinking about her.

This manoeuvre is doing a kind of causal surgery / counterfactual-taking. It cuts the loop by identifying "what Bob thinks about Alice" as a node under Alice's control. This is the heart of it, and imo doesn't rely on anything weird or unusual.

So... it's part of the setup that all of these agents will:
--Cooperate if they can prove that there is some argument compelling to everyone that everyone cooperates (because then they prove that everyone cooperates, and that includes them, and their proof system isn't mistaken?)
--Cooperate if they can prove that there is no such argument.
--Else defect.

Am I getting that right?

James Payor (2d):
For the setup □(□E→E)→E, it's a bit more like: each member cooperates if they can prove that a compelling argument for "everyone cooperates" is sufficient to ensure "everyone cooperates".

Your second line seems right though! If there were provably no argument for straight-up "everyone cooperates", i.e. □(□E→⊥), this implies □(□E→E) and therefore E, a contradiction.

Also, I think I'm a bit less confused here these days, and in case it helps: don't forget that "□P" means "a proof of any size of P", which is kinda crazy, and can be responsible for things not lining up with your intuition. My hot take is that Löb's theorem / incompleteness says "with finite proof strength you can only deny proofs up to a limited size, on pain of diagonalization". Which is way saner than the usual interpretation!

So idk, especially in this context I think it's a bad idea to throw out your intuition when the math seems to say something else. The mismatch is probably coming down to some subtlety in this formalization of provability/metamathematics. And I presently think the quirky nature of provability logic is often bugs due to bad choices in the formalism.
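Spelling out the contradiction sketched in that comment step by step (the step labels are mine, not from the original; a sketch assuming the GL provability logic, with necessitation, distribution, the 4 axiom □P→□□P, and Löb's theorem):

```latex
\begin{align*}
&(1)\ \Box(\Box E \to \bot) && \text{assumption: provably no argument for } E \\
&(2)\ \Box(\bot \to E)      && \text{ex falso, under necessitation} \\
&(3)\ \Box(\Box E \to E)    && \text{from (1), (2) by distribution} \\
&(4)\ E                     && \text{from (3) and the setup } \Box(\Box E \to E) \to E \\
&(5)\ \Box E                && \text{from (3) by L\"ob's theorem} \\
&(6)\ \Box\Box E            && \text{from (5) by the 4 axiom} \\
&(7)\ \Box\bot              && \text{from (1), (6) by distribution}
\end{align*}
```

Line (7) contradicts consistency (¬□⊥), so assumption (1) must fail.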

I'm glad I asked, that was helpful! I agree that instrumental convergence is a huge crux; if I were convinced that e.g. it wasn't going to happen until 15 years from now, and/or that the kinds of systems that might instrumentally converge were always going to be less economically/militarily/etc. competitive than other kinds of systems, that would indeed be a huge revolution in my thought and would completely change the way I think about AI and AI risks, and I'd become much more optimistic.

I'll go read the post you linked.

Noosphere89 (3d):
I'd especially read footnote 3, because it gave me a very important observation about why instrumental convergence is actually bad for capabilities, or at least not obviously good for capabilities and incentivized, especially with a lot of space to roam:

Well yeah, it depends on details and assumptions I didn't make explicit -- I wrote only four sentences!

If you have counterarguments to any of my claims I'd be interested to hear them, just in case they are new to me.

My biggest counterargument to the case that AI progress should be slowed down comes from an observation made by porby about a fundamental lack of a property we theorize about AI systems, and the one foundational assumption around AI risk:

Instrumental convergence, and its corollaries like powerseeking.

The important point is that current and most plausible future AI systems don't have incentives to learn instrumental goals, and the type of AI that has enough space and has very few constraints, like RL with sufficiently unconstrained action spaces to learn i... (read more)

Even if you buy the dial theory, it still doesn't make sense to shout Yay Progress on the topic of AGI. The singularity is happening this decade, maybe next, whether we shout Yay or Boo. Shouting Boo just delays it a little and makes it more likely to be good instead of bad. (Currently it is quite likely to be bad.)

M. Y. Zuo (1d):
I'd be willing to bet that the singularity is not happening this decade at up to $1k USD at 25:1 to 200:1 odds, depending on the terms.

Consider that not everyone shares your view that the Singularity is happening soon, or that it will be better if delayed.

Gerald Monroe (4d):
There are also more than one dial, and if one party turns theirs up enough, it's a choice between "turn yours up or lose". Historical examples such as the outcomes for China during the Opium Wars are what happens when you restrict progress. China did exactly what Zvi is talking about: they had material advantages and had gunpowder approximately 270 years(!) before the Europeans first used it. Later on, it did not go well for them.

The relative advantage of having AGI when other parties don't is exponential, not linear. For example, during the Opium Wars, the Chinese had ships with cannon and were not outnumbered thousands to 1. A party with AGI and exponential numbers of manufacturing and mining robots could easily produce thousands of times the possible industrial output of other countries during wartime, and since each vehicle is automated there is no bottleneck of pilots or crew.

To prove there is more than one dial: while the USA delays renewable energy projects by an average wait time of 4 years (https://www.energy.gov/eere/i2x/articles/tackling-high-costs-and-long-delays-clean-energy-interconnection), and has arbitrarily and capriciously decided to close applications for consideration (https://www.nrdc.org/bio/dana-ammann/breaking-through-pjm-interconnection-queue-crisis) rather than do the sensible thing and streamline the review process, China is making it happen (https://www.reuters.com/world/china/china-solar-power-capacity-could-post-record-growth-2023-2023-02-16/).

Others on LessWrong have posted the false theory that China is many years behind in the AI race, when in reality the delay is about a year (https://github.com/InternLM/InternLM-techreport).

Note that in worlds with AI delays that were coordinated with China somehow, there are additional parties who could potentially take advantage of the delay, as well as the obvious risk of defection. AGI is potentially far more useful and powerful than nuclear weapons ever were, and also
Noosphere89 (4d):
I wouldn't be nearly as confident as a lot of LWers here, and in particular I suspect this depends on some details and assumptions that aren't made explicit here.

How about this:
--Re the first grey area: We rule in your favor here.
--Re the second grey area: You decide, in 2027, based on your own best judgment, whether or not it would have happened absent regulation. I can disagree with your judgment, but I still have to agree that you won the bet (if you rule in your favor).

Andy_McKenzie (18h):
Those sound good to me! I donated to your charity (the Animal Welfare Fund) to finalize it. Lmk if you want me to email you the receipt. Here's the manifold market:

Bet

Andy will donate $50 to a charity of Daniel's choice now. If, by January 2027, there is not a report from a reputable source confirming that at least three companies, that would previously have relied upon programmers, and meet a defined level of success, are being run without the need for human programmers, due to the independent capabilities of an AI developed by OpenAI or another AI organization, then Daniel will donate $100, adjusted for inflation as of June 2023, to a charity of Andy's choice.

Terms

* Reputable Source: For the purpose of this bet, reputable sources include MIT Technology Review, Nature News, The Wall Street Journal, The New York Times, Wired, The Guardian, or TechCrunch, or similar publications of recognized journalistic professionalism. Personal blogs, social media sites, or tweets are excluded.
* AI's Capabilities: The AI must be capable of independently performing the full range of tasks typically carried out by a programmer, including but not limited to writing, debugging, maintaining code, and designing system architecture.
* Equivalent Roles: Roles that involve tasks requiring comparable technical skills and knowledge to a programmer, such as maintaining codebases, approving code produced by AI, or prompting the AI with specific instructions about what code to write.
* Level of Success: The companies must be generating a minimum annual revenue of $10 million (or likely generating this amount of revenue if it is not public knowledge).
* Report: A single, substantive article or claim in one of the defined reputable sources that verifies the defined conditions.
* AI Organization: An institution or entity recognized for conducting research in AI or developing AI technologies. This could include academic institutions, commercial entities, or government agencies.
* Inflation Ad

Isn't the college student example an example of 1 and 2? I'm thinking of e.g. students who become convinced of classical utilitarianism and then join some Effective Altruist club etc.

They say "And then the entire world gets transformed as superintelligent AIs + robots automate the economy." Does Tyler Cowen buy all of that? Is that not the part he disagrees with?

And then yeah for the AI kills you part there are models as well, albeit not economic growth models because economic growth is a different subject. But there are simple game theory models, for example -- expected utility maximizer with mature technology + misaligned utility function = and then it kills you. And then there are things like Carlsmith's six-step argument and Chalmers' and so forth. What sort of thing does Tyler want, that's different in kind from what we already have?

Has Tyler Cowen heard of Ajeya Cotra's Bio Anchors model, or the takeoffspeeds.com model by Tom Davidson, or Roodman's model of the singularity, or for that matter the earlier automation models by Robin Hanson? All of them seem to be the sort of thing he wants, so I'm surprised he hasn't heard of them. Or maybe he has and thinks they don't count for some reason? I would be curious to know why.

Raemon (6d):
I think those don’t say ‘and then the AI kills you’

Given your lack of disposable money I think this would be a bad deal for you, and as for me, it is sorta borderline (my credence that the bet will resolve in your favor is something like 40%?) but sure, let's do it. As for what charity to donate to, how about Animal Welfare Fund | Effective Altruism Funds. Thanks for working out all these details!

Here are some grey area cases we should work out:
--What if there is a human programmer managing the whole setup, but they are basically a formality? Like, the company does technically have programmers on staff but... (read more)

Andy_McKenzie (5d):
Sounds good, I'm happy with that arrangement once we get these details figured out.

Regarding the human programmer formality, it seems like business owners would have to be really incompetent for this to be a factor. Plenty of managers have coding experience. If the programmers aren't doing anything useful then they will be let go, or new companies will start that don't have them. They are a huge expense. I'm inclined to not include this since it's an ambiguity that seems implausible to me.

Regarding the potential ban by the government, I wasn't really thinking of that as a possible option. What kind of ban do you have in mind? I imagine that regulation of AI is very likely by then, so if the automation of all programmers hasn't happened by Jan 2027, it seems very easy to argue that it would have happened in the absence of the regulation.

Regarding these and a few of the other ambiguous things, one way we could do this is that you and I could just agree on it in Jan 2027. Otherwise, the bet resolves N/A and you don't donate anything. This could make it an interesting Manifold question because it's a bit adversarial. This way, we could also get rid of the requirement for it to be reported by a reputable source, which is going to be tricky to determine.

I've made several bets like this in the past, but it's a bit frustrating since I don't stand to gain anything by winning -- by the time I win the bet, we are well into the singularity & there isn't much for me to do with the money anymore. What are the terms you have in mind? We could do the thing where you give me money now, and I give it back with interest later.

 

Andy_McKenzie (7d):
Understandable. How about this?

Bet

Andy will donate $50 to a charity of Daniel's choice now. If, by January 2027, there is not a report from a reputable source confirming that at least three companies, that would previously have relied upon programmers, and meet a defined level of success, are being run without the need for human programmers, due to the independent capabilities of an AI developed by OpenAI or another AI organization, then Daniel will donate $100, adjusted for inflation as of June 2023, to a charity of Andy's choice.

Terms

* Reputable Source: For the purpose of this bet, reputable sources include MIT Technology Review, Nature News, The Wall Street Journal, The New York Times, Wired, The Guardian, or TechCrunch, or similar publications of recognized journalistic professionalism. Personal blogs, social media sites, or tweets are excluded.
* AI's Capabilities: The AI must be capable of independently performing the full range of tasks typically carried out by a programmer, including but not limited to writing, debugging, maintaining code, and designing system architecture.
* Equivalent Roles: Roles that involve tasks requiring comparable technical skills and knowledge to a programmer, such as maintaining codebases, approving code produced by AI, or prompting the AI with specific instructions about what code to write.
* Level of Success: The companies must be generating a minimum annual revenue of $10 million (or likely generating this amount of revenue if it is not public knowledge).
* Report: A single, substantive article or claim in one of the defined reputable sources that verifies the defined conditions.
* AI Organization: An institution or entity recognized for conducting research in AI or developing AI technologies. This could include academic institutions, commercial entities, or government agencies.
* Inflation Adjustment: The donation will be an equivalent amount of money as $100 as of June 2023, adjusted for inflation based on https://www.bls.go

Thanks for that feedback as well -- I think I didn't realize how much my comment comes across as 'debate' framing, which now on second read seems obvious. I genuinely didn't intend my comment to be a criticism of the post at all; I genuinely was thinking something like "This is a great post. But other than that, what should I say? I should have something useful to add. Ooh, here's something: Why no talk of misalignment? Seems like a big omission. I wonder what he thinks about that stuff." But on reread it comes across as more of a "nyah nyah why didn't you talk about my hobbyhorse" unfortunately.

Thanks for the feedback, I'll try to keep this in mind in the future. I imagine you'd prefer me to keep the links, but make the text use common-sense language instead of acronyms so that people don't need to click on the links to understand what I'm saying?

FinalFormal2 (8d):
That seems like a useful heuristic. I also think there's an important distinction between using links in a debate frame and in a sharing frame.

I wouldn't be bothered at all by a comment using acronyms and links, no matter how insular, if the context was just 'hey, this reminds me of HFDT and POUDA'; a beginner can jump off of that and get down a rabbit hole of interesting concepts. But if you're in a debate frame, you're introducing unnecessary barriers to discussion which feel unfair and disqualifying. At its worst it would be like saying: 'you're not qualified to debate until you read these five articles.' In a debate frame I don't think you should use any unnecessary links or acronyms at all. If you're linking a whole article, it should be because it's necessary for them to read and understand the whole article for the discussion to continue, and it cannot be summarized.

I think I have this principle because in my mind you cannot not debate, so therefore you have to read all the links and content included, meaning that links in a sharing context are optional but in a debate context they're required.

I think on a second read your comment might have been more in the 'sharing' frame than I originally thought, but to the extent you were presenting arguments I think you should maximize legibility, to the point of only including links if you make clear contextually or explicitly to what degree the link is optional or just for reference.

I strong-upvoted this post.

Here's a specific, zoomed-in version of this game proposed by Nate Soares

like, we could imagine playing a game where i propose a way that it [the AI] diverges [from POUDA-avoidance] in deployment, and you counter by asserting that there's a situation in the training data where it had to have gotten whacked if it was that stupid, and i counter either by a more-sophisticated deployment-divergence or by naming either a shallower or a factually non-[Alice]like thing that it could have learned instead such that the divergence s

... (read more)
Lauro Langosco (9d):
I like that mini-game! Thanks for the reference

Tom Davidson found a math error, btw: it shouldn't be 360,000 agents doing a year's worth of thinking each in only 3 days. It should be much less than that; otherwise you are getting compute for free!
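A quick sanity check on why the original figure can't be right (the numbers are the ones from the comment; the point is just the ratio):

```python
# 360,000 agents each running at ~human speed for 3 days produce
# 360,000 * (3 / 365) agent-years of thinking, not 360,000 agent-years.
agents = 360_000
days = 3
agent_years = agents * days / 365
print(agent_years)
```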

jsteinhardt (8d):
Oops, thanks, updated to fix this.

Well said. 

One thing conspicuously absent, IMO, is discussion of misalignment risk. I'd argue that GPT-2030 will be situationally aware, strategically aware, and (at least when plugged into fancy future versions of AutoGPT etc.) agentic/goal-directed. If you think it wouldn't be a powerful adversary of humanity, why not? Because it'll be 'just following instructions' and people will put benign instructions in the prompt? Because HFDT will ensure that it'll robustly avoid POUDA? Or will it in fact be a powerful adversary of humanity, but one that is un... (read more)

As I read this post, I found myself puzzled by the omission of the potential for AI-research acceleration by SotA AI models, as Daniel mentions in his comment. I think it's worth pointing out that this has been explicitly discussed by leading individuals in the big AI labs. For instance, Sam Altman has said that scaling is no longer the primary path forward in their work, and that algorithmic advances are instead.

Think about your intuitions of what a smart and motivated human is capable of. The computations that that human brain is running represent an algo... (read more)

I don't like the number of links that you put into your first paragraph. The point of developing a vocabulary for a field is to make communication more efficient so that the field can advance. Do you need an acronym and associated article for 'pretty obviously unintended/destructive actions,' or in practice is that just insularizing the discussion?

I hear people complaining about how AI safety only has ~300 people working about it, and how nobody is developing object level understandings and everyone's thinking from authority, but the more sentences you wri... (read more)

Not Relevant (9d):
Where does this “transfer learning across timespans” come from? The main reason I see for checking back in after 3 days is the model’s losing the thread of what the human currently wants, rather than being incapable of pursuing something for longer stretches. A direct parallel is a human worker reporting to a manager on a project - the worker could keep going without check-ins, but their mental model of the larger project goes out of sync within a few days so de facto they’re rate limited by manager check-ins.

Blast from the past: Reading this recent paper I happened across this diagram:

Karl von Wendt (9d):
Thank you! Very interesting and a little disturbing, especially the way the AI performance expands in all directions simultaneously. This is of course not surprising, but still concerning to see it depicted in this way. It's all too obvious how this diagram will look in one or two years. Would also be interesting to have an even broader diagram including all kinds of different skills, like playing games, steering a car, manipulating people, etc.

It's very possible this means we're overestimating the compute performed by the human brain a bit.


Specifically, by 6-8 OOMs. I don't think that's "a bit." ;)

Oh I totally agree with everything you say here, especially your first sentence. My timelines median for intelligence explosion (conditional on no significant government-enforced slowdown) is 2027.


So maybe I was misleading when I said I was unimpressed.

 

Excellent! Yeah I think GPT-4 is close to automating remote workers. 5 or 6, with suitable extensions (e.g. multimodal, langchain, etc.) will succeed I think. Of course, there'll be a lag between "technically existing AI systems can be made to ~fully automate job X" and "most people with job X are now unemployed" because things take time to percolate through the economy. But I think by the time of GPT-6 it'll be clear that this percolation is beginning to happen & the sorts of things that employ remote workers in 2023 (especially the strategically rele... (read more)

Andy_McKenzie (10d):
I’m wondering if we could make this into a bet. If by remote workers we include programmers, then I’d be willing to bet that GPT-5/6, depending upon what that means (might be easier to say the top LLMs or other models trained by anyone by 2026?) will not be able to replace them.

Thanks! AI managers, CEOs, self-replicators, and your-job-doers (what is your job anyway? I never asked!) seem like things that could happen before it's too late (albeit only very shortly before) so they are potential sources of bets between us. (The other stuff requires lots of progress in robotics which I don't expect to happen until after the singularity, though I could be wrong)

Yes, I understand that you don't think AGI will be achieved by brain simulation. I like that you have a giant confidence interval to account for cases where AGI is way more effi... (read more)

Great points.

I think you've identified a good crux between us: I think GPT-4 is far from automating remote workers and you think it's close. If GPT-5/6 automate most remote work, that will be point in favor of your view, and if takes until GPT-8/9/10+, that will be a point in favor of mine. And if GPT gradually provides increasingly powerful tools that wildly transform jobs before they are eventually automated away by GPT-7, then we can call it a tie. :)

I also agree that the magic of GPT should update one into believing in shorter AGI timelines with lower ... (read more)

Good point. I'll message Tristan, see if he can incorporate that into the model.

avturchin (11d):
Had a post [https://www.lesswrong.com/posts/bhddPYhh7LXdKXK9L/anthropic-effects-imply-that-we-are-more-likely-to-live-in] about that. 

Thanks for this well-researched and thorough argument! I think I have a bunch of disagreements, but my main one is that it really doesn't seem like AGI will require 8-10 OOMs more inference compute than GPT-4. I am not at all convinced by your argument that it would require that much compute to accurately simulate the human brain. Maybe it would, but we aren't trying to accurately simulate a human brain, we are trying to learn circuitry that is just as capable.

Also: Could you, for posterity, list some capabilities that you are highly confident no AI system will have by 2030? Ideally capabilities that come prior to a point-of-no-return so it's not too late to act by the time we see those capabilities.

meijer1973 (10d):
AI will probably displace a lot of cognitive workers in the near future. And physical labor might take a while to get below $25/hr.

* For most tasks, human-level intelligence is not required.
* Most highly valued jobs have a lot of tasks that do not require high intelligence.
* Doing 95% of all tasks could come a lot sooner (10-15 years earlier) than 100%. See autonomous driving (getting to 95% safe vs. 99.9999% safe is a big difference).
* Physical labor by robots will probably remain expensive for a long time (e.g. a robot plumber). A robot CEO is probably cheaper in the future than the robot plumber.
* Just take GPT-4 and fine-tune it and you can automate a lot of cognitive labor already.
* Deployment of cognitive-work automation (a software update) is much faster than deployment of physical robots.

I agree that AI might not replace swim instructors by 2030. It is in cognitive work where the big leaps will be.

Oh, to clarify, we're not predicting AGI will be achieved by brain simulation. We're using the human brain as a starting point for guessing how much compute AGI will need, and then applying a giant confidence interval (to account for cases where AGI is way more efficient, as well as way less efficient). It's the most uncertain part of our analysis and we're open to updating.
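A minimal sketch of this estimation style, with all numbers purely illustrative (they are my assumptions, not the model's actual parameters): anchor a median on a brain-scale compute figure, then widen it with a lognormal spanning several orders of magnitude.

```python
import random

# Hypothetical sketch of "brain anchor + giant confidence interval".
# Both constants below are illustrative assumptions, not the model's values.
BRAIN_FLOPS_ANCHOR = 1e15   # assumed median for brain-scale compute, FLOP/s
OOM_SIGMA = 3.0             # spread: standard deviation in orders of magnitude

def sample_agi_flops(rng: random.Random) -> float:
    """Draw one sample of required FLOP/s: lognormal around the anchor,
    implemented as a Gaussian offset in orders of magnitude."""
    offset_ooms = rng.gauss(0.0, OOM_SIGMA)
    return BRAIN_FLOPS_ANCHOR * 10 ** offset_ooms

rng = random.Random(0)
samples = sorted(sample_agi_flops(rng) for _ in range(100_000))
median = samples[len(samples) // 2]  # lands near the 1e15 anchor
```

The wide sigma is doing the real work here: it encodes "AGI could be several orders of magnitude more or less compute-efficient than the brain anchor," which is exactly the uncertainty being flagged.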

For posterity, by 2030, I predict we will not have:

  • AI drivers that work in any country
  • AI swim instructors
  • AI that can do all of my current job at OpenAI in 2023
  • AI that can get into a 201
... (read more)

I do like Hanson's story you link. :) Yes, panspermia possibility does make it non-crazy that there could be aliens close to us despite an empty sky. Unlikely, but non-crazy. Then there's still the question of why they are so bad at hiding & why their technology is so shitty, and why they are hiding in the first place. It's not completely impossible but it seems like a lot of implausible assumptions stacked on top of each other. So, I think it's still true that "the best modelling suggests aliens are at least hundreds of millions of light-years away."

We are more likely to be born in a world with panspermia, as it would have a higher concentration of habitable planets.

Nice story! Mostly I think that the best AGIs will always be in the big labs rather than open source, and that current open-source models aren't smart enough to get this sort of self-improving ecosystem off the ground. But it's not completely implausible.

5Karl von Wendt10d
Thank you very much! I agree. We chose this scenario out of many possibilities because so far it hasn't been described in much detail and because we wanted to point out that open source can also lead to dangerous outcomes, not because it is the most likely scenario. Our next story will be more "mainstream".

This being actual aliens is highly unlikely for the usual reasons. The best modeling suggests aliens are at least hundreds of millions of light-years away, since otherwise there would be sufficiently many of them in the sky that some of them would choose not to hide. Moreover if any did visit Earth with the intention of hiding, they would probably have more advanced technology than this, and would be better at hiding.

The best modeling suggests aliens are at least hundreds of millions of light-years away...

As Robin Hanson himself notes: "That's assuming independent origins. Things that have a common origin would find themselves closer in space and time." See also: https://www.overcomingbias.com/p/ufos-what-the-hellhtml

I guess I just think it's pretty unreasonable to have p(doom) of 10% or less at this point, if you are familiar with the field, timelines, etc. 

I totally agree the topic is important and neglected. I only said "arguably" deferrable, I have less than 50% credence that it is deferrable. As for why I'm not working on it myself, well, aaaah I'm busy idk what to do aaaaaaah! There's a lot going on that seems important. I think I've gotten wrapped up in more OAI-specific things since coming to OpenAI, and maybe that's bad & I should be stepping back and trying to go where I'm most needed even if that means leaving OpenAI. But yeah. I'm open to being convinced!

4[comment deleted]12d
2[comment deleted]12d
2[comment deleted]12d
2Wei Dai12d
I guess part of the problem is that the people who are currently most receptive to my message are already deeply enmeshed in other x-risk work, and I don't know how to reach others for whom the message might be helpful (such as academic philosophers just starting to think about AI?). If on reflection you think it would be worth spending some of your time on this, one particularly useful thing might be to do some sort of outreach/field-building, like writing a post or paper describing the problem, presenting it at conferences, and otherwise attracting more attention to it. (One worry I have about this is, if someone is just starting to think about AI at this late stage, maybe their thinking process just isn't very good, and I don't want them to be working on this topic! But then again maybe there's a bunch of philosophers who have been worried about AI for a while, but have stayed away due to the overton window thing?)

Nice post. Some minor thoughts: 

Are there historical precedents for this sort of thing? Arguably so: wildfires of strategic cognition sweeping through a nonprofit or corporation or university as office politics ramps up and factions start forming with strategic goals, competing with each other. Wildfires of strategic cognition sweeping through the brain of a college student who was nonagentic/aimless before but now has bought into some ambitious ideology like EA or communism. Wildfires of strategic cognition sweeping through a network of PCs as a viru... (read more)

2TsviBT5d
To me a central difference, suggested by the word "strategic", is that the goal pursuit should be 1. unboundedly general, and 2. unboundedly ambitious. By unboundedly ambitious I mean "has an unbounded ambit" (ambit = "the area went about in; the realm of wandering" https://en.wiktionary.org/wiki/ambit#Etymology [https://en.wiktionary.org/wiki/ambit#Etymology] ), i.e. its goals induce it to pursue unboundedly much control over the world. By unboundedly general I mean that it's universal for optimization channels. For any given channel through which one could optimize, it can learn or recruit understanding to optimize through that channel. Humans are in a weird liminal state where we have high-ambition-appropriate things (namely, curiosity), but local changes in pre-theoretic "ambition" (e.g. EA, communism) are usually high-ambition-inappropriate (e.g. divesting from basic science in order to invest in military power or whatever).

Something like 2% of people die every year, right? So even if we ignore the value of future people and all sorts of other concerns and just focus on whether currently living people get to live or die, it would be worth delaying a year if we could thereby decrease p(doom) by 2 percentage points. My p(doom) is currently 70%, so it is very easy to achieve that. Even at 10% p(doom), which I consider to be unreasonably low, it would probably be worth delaying a few years.
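The break-even arithmetic here can be sketched as a toy calculation, using the comment's assumed 2% annual mortality (the function and numbers are illustrative, not a real model):

```python
# Toy sketch of the break-even argument: expected fraction of currently
# living people who survive to aligned AGI, with and without a delay.
# A delay is worth it (on this narrow metric) when the drop in p(doom)
# exceeds the mortality incurred during the delay.
def expected_survivors(p_doom, annual_mortality=0.02, delay_years=0):
    alive_at_agi = (1 - annual_mortality) ** delay_years
    return alive_at_agi * (1 - p_doom)

no_delay = expected_survivors(p_doom=0.70)                 # ≈ 0.30
one_year = expected_survivors(p_doom=0.68, delay_years=1)  # ≈ 0.31
assert one_year > no_delay  # a 2-point drop in p(doom) beats one year's deaths
```

This also shows why the conclusion is insensitive to the starting p(doom): the comparison is between the p(doom) reduction and the mortality rate, not the absolute doom level.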

Re: 2: Yeah I basically agree. I'm just not as confident as you are I guess. Like, maybe the ... (read more)

4Wei Dai12d
Someone with 10% p(doom) may worry that if they got into a coalition with others to delay AI, they can't control the delay precisely, and it could easily become more than a few years. Maybe it would be better not to take that risk, from their perspective. And lots of people have p(doom)<10%. Scott Aaronson just gave 2% for example, and he's probably taken AI risk more seriously than most (currently working on AI safety at OpenAI), so probably the median p(doom) (or effective p(doom) for people who haven't thought about it explicitly) among the whole population is even lower. I think I've tried to take into account uncertainties like this. It seems that in order for my position (that the topic is important and too neglected) to be wrong, one has to reach high confidence that these kinds of problems will be easy for AIs (or humans or AI-human teams) to solve, and I don't see how that kind of conclusion could be reached today. I do have some specific arguments [https://www.lesswrong.com/posts/EByDsY9S3EDhhfFzC/some-thoughts-on-metaphilosophy#Replicate_the_trajectory_with_ML_] for why the AIs we'll build may be bad at philosophy, but I think those are not very strong arguments so I'm mostly relying on a prior that says we should be worried about and thinking about this until we see good reasons not to. (It seems hard to have strong arguments either way today, given our current state of knowledge about metaphilosophy and future AIs.) Another argument for my position is that humans have already created a bunch of opportunities for ourselves to make serious philosophical mistakes, like around nuclear weapons, farmed animals, AI, and we can't solve those problems by just asking smart honest humans the right questions, as there is a lot of disagreement between philosophers on many important questions. What's stopping you from doing this, if anything? (BTW, beyond the general societal level of neglect, I'm especially puzzled by the lack of interest/engagement on this

Proposed Forecasting Technique: Annotate Scenario with Updates (Related to Joe's Post)

  • Consider a proposition like "ASI will happen in 2024, not sooner, not later." It works best if it's a proposition you assign very low credence to, but that other people you respect assign much higher credence to.
  • What's your credence in that proposition?
  • Step 1: Construct a plausible story of how we could get to ASI in 2024, no sooner, no later. The most plausible story you can think of. Consider a few other ways it could happen too, for completeness, but don't write them d
... (read more)

I am unimpressed. I've had conversations with people before that went very similarly to this. If this had been a transcript of your conversation with a human, I would have said that human was not engaging with the subject on the gears / object level and didn't really understand it, but rather had a shallow understanding of the topic, used the anti-weirdness heuristic combined with some misunderstandings to conclude the whole thing was bogus, and then filled in the blanks to produce the rest of the text. Or, to put it differently, BingChat's writing here re... (read more)

I don't know, I feel like the day that an AI can do significantly better than this, will be close to the final day of human supremacy. In my experience, we're still in a stage where the AIs can't really form or analyze complex structured thoughts on their own - where I mean thoughts with, say, the complexity of a good essay. To generate complex structured thoughts, you have to help them a bit, and when they analyze something complex and structured, they can make out parts of it, but they don't form a comprehensive overall model of meaning that they can the... (read more)

Science as a kind of Ouija board:

With the board, you do this set of rituals and it produces a string of characters as output, and then you are supposed to read those characters and believe what they say.

So too with science. Weird rituals, check. String of characters as output, check. Supposed to believe what they say, check.

With the board, the point of the rituals is to make it so that you aren't writing the output, something else is -- namely, spirits. You are supposed to be light and open-minded and 'let the spirit move you' rather than deliberately try ... (read more)

It's no longer my top priority, but I have a bunch of notes and arguments relating to AGI takeover scenarios that I'd love to get out at some point. Here are some of them:

Beating the game in May 1937 - Hoi4 World Record Speedrun Explained - YouTube
In this playthrough, the USSR has a brief civil war and Trotsky replaces Stalin. They then get an internationalist socialist type diplomat who is super popular with US, UK, and France, who negotiates passage of troops through their territory -- specifically, they send many many brigades of extremely low-tier troop... (read more)

(But that still leaves room for an update towards "the AI doesn't necessarily kill us, it might merely warp us, or otherwise wreck civilization by bounding us and then giving us power-before-wisdom within those bounds or suchlike, as might be the sort of whims that rando drives shake out into", which I'll chew on.)

FWIW this is my view. (Assuming no ECL/MSR or acausal trade or other such stuff. If we add those things in, the situation gets somewhat better in expectation I think, because there'll be trades with faraway places that DO care about our CEV.)

Why is 1 important? It seems like something we can defer discussion of until after (if ever) alignment is solved, no?

2 is arguably in that category also, though idk.

Wei Dai12d

Why is 1 important? It seems like something we can defer discussion of until after (if ever) alignment is solved, no?

If aging was solved or looked like it will be solved within next few decades, it would make efforts to stop or slow down AI development less problematic, both practically and ethically. I think some AI accelerationists might be motivated directly by the prospect of dying/deterioration from old age, and/or view lack of interest/progress on that front as a sign of human inadequacy/stagnation (contributing to their antipathy towards humans).... (read more)

I suggest you put this in a sequence with your other posts in this series (posts making fairly basic points that nonetheless need to be said)

I normally am all for charitability and humility and so forth, but I will put my foot down and say that it's irrational (or uninformed) to disagree with this statement:

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

(I say uninformed because I want to leave an escape clause for people who aren't aware of various facts or haven't been exposed to various arguments yet. But for people who have followed AI progress recently and/or who have heard the standard argument... (read more)

4Noosphere8915d
I agree with the statement, broadly construed, so I don't disagree here. The key disanalogy between climate change and AI risk is the evidence base for each. For climate change, there were arguably trillions to quadrillions of data points of evidence, if not more, which is easily enough to massively update even very skeptical people's priors. For AI, the evidence base is closer to maybe 100 data points maximum, and arguably lower than that. This is changing, and things are getting better, but it's quite different from climate change, where you could call them deniers pretty matter-of-factly. This means more general priors matter, and even not-very-extreme priors wouldn't update much on the evidence for AI doom, so they are much, much less irrational than climate deniers. If the statement is all that's being asked for, that's enough. The worry is when people apply climate analogies to AI without realizing the differences, and those differences are enough to alter or invalidate the conclusions argued for.

? The people viewing AI as not an X-risk are the people confidently dismissing something. 

I think the evidence is really there. Again, the claim isn't that we are definitely doomed, it's that AGI poses an existential risk to humanity. I think it's pretty unreasonable to disagree with that statement.

4Noosphere8915d
The point is that the details aren't analogous to the climate change case, and while I don't agree with people who dismiss AI risk, I think the evidence we have isn't enough to claim anything more than that AI risk is real. The details matter, and due to unique issues, it's going to be very hard to get to the level where we can confidently say that people denying AI risk are totally irrational.

What about "Deniers?" as in, climate change deniers. 

Too harsh maybe? IDK, I feel like a neutral observer presented with a conflict framed as "Doomers vs. Deniers" would not say that "deniers" was the harsher term.

3ryan_b15d
I'm not at all sure this would actually be relevant to the rhetorical outcome, but I feel like the AI-can't-go-wrong camp wouldn't really accept the "Denier" label in the same way people in the AI-goes-wrong-by-default camp accept "Doomer." Climate change deniers agree they are deniers, even if they prefer terms like skeptic among themselves. In the case of climate change deniers, the question is whether or not climate change is real, and the thing that they are denying is the mountain of measurements showing that it is real. I think what is different about the can't-go-wrong, wrong-by-default dichotomy is that the question we're arguing about is the direction of change, instead; it would be like if we transmuted the climate change denier camp into a bunch of people whose response wasn't "no it isn't" but instead was "yes, and that is great news and we need more of it." Naturally it is weird to imagine people tacitly accepting the Mary Sue label in the same way we accept Doomer, so cut by my own knife I suppose!
5Noosphere8915d
I'd definitely disagree, if only because it implies a level of evidence for the doom side that's not really there, and the evidence is a lot more balanced than in the climate case. IMO this is the problem with Zvi's attempted naming too: It incorrectly assumes that the debate on AI is so settled that we can treat people viewing AI as not an X-risk as essentially dismissible deniers/wishful thinking, and this isn't where we're at for even the better argued stuff like the Orthogonality Thesis or Instrumental Convergence, to a large extent. Having enough evidence to confidently dismiss something is very hard, much harder than people realize.

Thanks to you likewise!

On doom through normal means: "Persuasion, hacking, and warfare" aren't by themselves doom, but they can be used to accumulate lots of power, and then that power can be used to cause doom. Imagine a world in which human are completely economically, militarily, and politically obsolete, thanks to armies of robots directed by superintelligent AIs. Such a world could and would do very nasty things to humans (e.g. let them all starve to death) unless the superintelligent AIs managing everything specifically cared about keeping humans ali... (read more)

Thanks for this comment. I'd be generally interested to hear more about how one could get to 20% doom (or less).

The list you give above is cool but doesn't do it for me; going down the list I'd guess something like:
1. 20% likely (honesty seems like the best bet to me) because we have so little time left, but even if it happens we aren't out of the woods yet because there are various plausible ways we could screw things up. So maybe overall this is where 1/3rd of my hope comes from.
2. 5% likely? Would want to think about this more. I could imagine myself be... (read more)

Load More