Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Instead, it’s the point of no return—the day we AI risk reducers lose the ability to significantly reduce AI risk. This might happen years before classic milestones like “World GWP doubles in four years” and “Superhuman AGI is deployed.”

The rest of this post explains, justifies, and expands on this obvious but underappreciated idea. (Toby Ord appreciates it; see quote below). I found myself explaining it repeatedly, so I wrote this post as a reference.

AI timelines often come up in career planning conversations. Insofar as AI timelines are short, career plans which take a long time to pay off are a bad idea, because by the time you reap the benefits of the plans it may already be too late. It may already be too late because AI takeover may already have happened.

But this isn’t quite right, at least not when “AI takeover” is interpreted in the obvious way, as meaning that an AI or group of AIs is firmly in political control of the world, ordering humans about, monopolizing violence, etc. Even if AIs don’t yet have that sort of political control, it may already be too late. Here are three examples: [UPDATE: More fleshed-out examples can be found in this new post.]

  1. Superhuman agent AGI is still in its box but nobody knows how to align it and other actors are going to make their own version soon, and there isn’t enough time to convince them of the risks. They will make and deploy agent AGI, it will be unaligned, and we have no way to oppose it except with our own unaligned AGI. Even if it takes years to actually conquer the world, it’s already game over.

  2. Various weak and narrow AIs are embedded in the economy and beginning to drive a slow takeoff; capabilities are improving much faster than safety/alignment techniques and due to all the money being made there’s too much political opposition to slowing down capability growth or keeping AIs out of positions of power. We wish we had done more safety/alignment research earlier, or built a political movement earlier when opposition was lower.

  3. Persuasion tools have destroyed collective epistemology in the relevant places. AI isn’t very capable yet, except in the narrow domain of persuasion, but everything has become so politicized and tribal that we have no hope of getting AI projects or governments to take AI risk seriously. Their attention is dominated by the topics and ideas of powerful ideological factions that have access to more money and data (and thus better persuasion tools) than us. Alternatively, maybe we ourselves have fallen apart as a community, or become less good at seeking the truth and finding high-impact plans.

Conclusion: We should remember that when trying to predict the date of AI takeover, what we care about is the date it’s too late for us to change the direction things are going; the date we have significantly less influence over the course of the future than we used to; the point of no return.

This is basically what Toby Ord said about x-risk: “So either because we’ve gone extinct or because there’s been some kind of irrevocable collapse of civilization or something similar. Or, in the case of climate change, where the effects are very delayed that we’re past the point of no return or something like that. So the idea is that we should focus on the time of action and the time when you can do something about it rather than the time when the particular event happens.”

Of course, influence over the future might not disappear all on one day; maybe there’ll be a gradual loss of control over several years. For that matter, maybe this gradual loss of control began years ago and continues now... We should keep these possibilities in mind as well.

[Edit: I now realize that I should distinguish between AI-induced points of no return and other points of no return. Our timelines forecasts and takeoff speeds discussions are talking about AI, so we should interpret them as being about AI-induced points of no return. Our all-things-considered view on e.g. whether to go to grad school should be informed by AI-induced-PONR timelines and also "timelines" for things like nuclear war, pandemics, etc.]


This post is making a valid point (the time to intervene to prevent an outcome that would otherwise occur, is going to be before the outcome actually occurs), but I'm annoyed with the mind projection fallacy by which this post seems to treat "point of no return" as a feature of the territory, rather than your planning algorithm's map.

(And, incidentally, I wish this dumb robot cult still had a culture that cared about appreciating cognitive algorithms as the common interest of many causes, such that people would find it more natural to write a post about "point of no return"-reasoning as a general rationality topic that could have all sorts of potential applications, rather than the topic specifically being about the special case of the coming robot apocalypse. But it's probably not fair to blame Kokotajlo for this.)

The concept of a "point of no return" only makes sense relative to a class of interventions. A 1 kg ball is falling at 9.8 m/s². When is the "point of no return" at which the ball has accelerated enough such that it's no longer possible to stop it from hitting the ground?

The problem is underspecified as stated. If we add the additional information that the ball is dropped from rest and that your means of intervening is a net that can only trap objects falling with less than X kg⋅m/s of momentum, then we can say that the point of no return happens at X/9.8 seconds. But it would be weird to talk about "the second we ball risk reducers lose the ability to significantly reduce the risk of the ball hitting the ground" as if that were an independent pre-existing fact that we could use to determine how strong of a net we need to buy, because it depends on the net strength.
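
A minimal sketch of the arithmetic in the ball example, reading the net's limit X as a momentum threshold in kg⋅m/s and assuming the ball is dropped from rest (both are assumptions; they are what make the X/9.8 figure come out). The function name `point_of_no_return` and the sample net limits are made up for illustration.

```python
G = 9.8  # m/s^2, the acceleration given in the example


def point_of_no_return(net_limit, mass_kg=1.0, g=G):
    """Seconds after release at which the ball's momentum reaches the net's limit.

    Assumes the ball starts from rest and that net_limit is a momentum
    threshold in kg*m/s (an interpretive assumption, not stated in the thread).
    """
    # Momentum grows linearly: p(t) = m * g * t.
    # Setting p(t) = net_limit and solving gives t = net_limit / (m * g).
    return net_limit / (mass_kg * g)


if __name__ == "__main__":
    # Hypothetical net strengths: the "point of no return" moves with each one.
    for limit in (9.8, 49.0, 98.0):
        t = point_of_no_return(limit)
        print(f"net limit {limit:5.1f} kg*m/s -> point of no return at {t:.1f} s")
```

The output time is a function of the net you chose, which is the point of the example: there is no single answer until the class of interventions is fixed.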

Thanks! I think I agree with everything you say here except that I'm not annoyed. (Had I been annoyed by my own writing, I would have rewritten it...) Perhaps I'm not annoyed because while my post may have given the misleading impression that PONR was an objective fact about the world rather than a fact about the map of some agent or group of agents, I didn't fall for that fallacy myself.

To be fair to my original post though, I did make it clear that the PONR is relative to a "we," a group of people (or even a single person) with some amount of current influence over the future that could diminish to drastically less influence depending on how events go.

while my post may have given the misleading impression [...] I didn't fall for that fallacy myself.

I reach for this "bad writing" excuse sometimes, and sometimes it's plausible, but in general, I'm wary of the impulse to tell critics after the fact, "I agree, but I wasn't making that mistake," because I usually expect that if I had a deep (rather than halting, fragmentary, or inconsistent) understanding of the thing that the critic was pointing at, I would have anticipated the criticism in advance and produced different text that didn't provide the critic with the opportunity, such that I could point to a particular sentence and tell the would-be critic, "Didn't I already adequately address this here?"

Doesn't the first sentence

Instead, it’s the point of no return—the day we AI risk reducers lose the ability to significantly reduce AI risk.

address this by explaining PONR as our ability to do something?

(I mean I agree that finding oneself reaching for a bad writing excuse is a good clue that there's something you can clarify for yourself further; just, this post doesn't seem like a case of that.)

(Thanks for this—it's important that critiques get counter-critiqued, and I think that process is stronger when third parties are involved, rather than it just being author vs. critic.)

The reason that doesn't satisfy me is because I expect the actual calculus of "influence" and "control" in real-world settings to be sufficiently complicated that there's probably not going to be any usefully identifiable "point of no return". On the contrary, if there were an identifiable PONR as a natural abstraction, I think that would be a surprising empirical fact about the world in demand of deeper explanation—that the underlying calculus of influence would just happen to simplify that way, such that you could point to an event and say, "There—that's when it all went wrong", rather than there just being (say) a continuum of increasingly detailed possible causal graphs that you can compute counterfactuals with respect to (with more detailed graphs being more expensive to learn but granting more advanced planning capabilities).

If you're pessimistic about alignment—and especially if you have short timelines like Daniel—I think most of your point-of-no-return-ness should already be in the past. When, specifically? I don't see any reason to expect there to be a simple answer. You lost some measure when OpenAI launched; you lost some measure when Norbert Wiener didn't drop everything to work on the alignment problem in 1960; you lost some measure when Samuel Butler and Charles Babbage turned out to not be the same person in our timeline; you lost some measure when the ancient Greeks didn't discover natural selection ...

The post does have a paragraph mentioning continuous loss of influence and already-lost influence in the past ("Of course, influence over the future might not disappear all on one day ..."), but the reason this doesn't satisfy me as a critic is because it seems to be treated as an afterthought ("We should keep these possibilities in mind as well"), rather than being the underlying reality to which any putative "PONR" would be a mere approximation. Instead, the rhetorical emphasis is on PONR as if it were an event: "The Date of AI Takeover Is Not the Day the AI Takes Over". (And elsewhere, Daniel writes about "PONR-inducing tasks".)

But in my philosophy, "the date" and "the day" of the title are two very different kinds of entities that are hard to talk about in the same sentence. The day AI takes over actually is a physical event that happens on some specific, definite date: nanobots disassemble the Earth, or whatever. That's not subjective; the AI historian-subprocess of the future will record a definitive timestamp of when it happened. In contrast, "the date" of PONR is massively "subjective" depending on further assumptions; the AI historian-subprocesses of the future will record some sort of summary of the decision-relevant results of a billion billion ancestor simulations, but the answer is not going to fit in a 64-bit timestamp.

Maybe to Daniel, this just looks like weirdly unmotivated nitpicking ("not super relevant to the point [he] was trying to make")? But it feels like a substantive worldview difference to me.

I've read this twice and I'm still not sure whether I actually get your critique. My guess is you're saying something like:

Daniel is too much taking the PONR as a thing; this leads him to both accidentally treat PONR as a specific point in time, and also to [?? mistake planning capability for "objective" feasibility ??]

I agree that the OP's talking of PONR as a point in time doesn't make sense; a charitable read is that it's a toy model that's supposed to help make it more clear what the difference is between our ability to prevent X and X actually happening (like in the movie Armageddon; did we nuke the asteroid soon enough for it to miss Earth vs. has the asteroid actually impacted Earth). I agree that asking about "our planning capability" is vague and gives different answers depending on what counterfactuals you're using; in an extreme case of "what could we feasibly do", there's basically no PONR because we always "could" just sit down at a computer and type in a highly speed-prior-compressed source code of an FAI.

the AI historian-subprocesses of the future will record some sort of summary of the decision-relevant results of a billion billion ancestor simulations, but the answer is not going to fit in a 64-bit timestamp.

It won't be a timestamp, but it will contain information about humans' ability to plan. To extract useful lessons from its experience with coming into power surrounded by potentially hostile weak AGIs, a superintelligence has to compare its own developing models across time. It went from not understanding its situation and not knowing what to do to take control from the humans, to yes understanding and knowing, and along the way it was relevantly uncertain about what the humans were able to do.

Anyway, the above feels like it's sort of skew to the thrust of the OP, which I think is: "notice that your feasible influence will decrease well before the AGI actually kills you with nanobots, so planning under a contrary assumption will produce nonsensical plans". Maybe I'm just saying, yes it's subjective how much we're doomed at a given point, and yes we want our reasoning to be in a sense grounded in stuff actually happening, but also in order to usefully model in more detail what's happening and what plans will work, we have to talk about stuff that's intermediate in time and in abstraction between the nanobot end of the world, and the here-and-now. The intermediate stuff then says more specific stuff about when and how much influence you're losing or gaining.

I don't think we disagree about anything substantive, and I don't expect Daniel to disagree about anything substantive after reading this. It's just—

I agree that the OP's talking of PONR as a point in time doesn't make sense; a charitable read is that [...]

I don't think we should be doing charitable readings at yearly review time! If an author uses a toy model to clarify something, we want the post to say "As a clarifying toy model [...]" rather than making the readers figure it out.

If you're pessimistic about alignment—and especially if you have short timelines like Daniel—I think most of your point-of-no-return-ness should already be in the past.

I unfortunately was not clear about this, but I meant to define it in such a way that this is false by definition -- "loss of influence" is defined relative to the amount of influence we currently have. So even if we had a lot more influence 5 years ago, the PONR is when what little influence we have left mostly dries up. :)

I don't think we should be doing charitable readings at yearly review time! If an author uses a toy model to clarify something, we want the post to say "As a clarifying toy model [...]" rather than making the readers figure it out.

If by some chance this post does make it to further stages of the review, I will heavily edit it, and I'm happy to e.g. add in "As a clarifying toy model..." among other changes.

Perhaps I should clarify then that I don't actually think my writing was bad. I don't think it was perfect, but I don't think the post would have been significantly improved by me having a paragraph or two about how influence (and thus point-of-no-return) is a property of the map, not the territory. I think most readers, like me, knew that already. At any rate it seems not super relevant to the point I was trying to make.

You can steer a bit away from catastrophe today. Tomorrow you will be able to do less. After years and decades go by, you will have to be miraculously lucky or good to do something that helps. At some point, it's not the kind of "miraculous" you hope for, it's the kind you don't bother to model.

Today you are blind, and are trying to shape outcomes you can't see. Tomorrow you will know more, and be able to do more. After years and decades, you might know enough about the task you are trying to accomplish to really help. Hopefully the task you find yourself faced with is the kind you can solve in time.

If AI is un-alignable (or at least if creating it is significantly easier than preventing misalignment), the point of no return was 1837. If Babbage had kept his mouth shut, maybe we could have avoided this path.

But really, it's a mistake to think of it as a single point in time.  There's a slew of contributing factors, happening over a long time period.  It's somewhat similar to recent discussions about human revolutions (https://www.lesswrong.com/posts/osYFcQtxnRKB4F4HA/a-tale-from-communist-china and others).  It happens slowly, then quickly, and the possible interventions are very unclear at any point. 

career plans which take a long time to pay off are a bad idea, because by the time you reap the benefits of the plans it may already be too late

This is true, even if AI takeover never happens.  The environment changes significantly over a human lifetime, and the only reasonable strategy is to thread a path that has BOTH long-term impact (to the extent that you can predict anything) AND short-term satisfaction.  “Find a job you enjoy doing, and you will never have to work a day in your life.” remains solid advice, regardless of reasons for uncertainty.

What does "inherently a weapons technology" mean? Given some technology, how does one determine whether or not it is "inherently a weapons technology"?

I ask because it seems to me that AI is clearly not "inherently a weapons technology" as I would use those words, and I suspect you mean something different by them.

Regardless, any generalization of AI that includes (e.g.) pointed sticks and flint arrowheads is surely too broad for present purposes; even if "how do we stop humans screwing everything up with whatever tools they have available?" is a more important question than "how do we stop AIs screwing up in ways that their makers and owners would be horrified by?", it's a different question, with (probably) different answers, and the latter is the subject here.

I don't agree with your answer to your rhetorical question. A kitchen knife can cause injury and death pretty easily, but while it can be a weapon I wouldn't say that kitchen knives are "inherently a weapons technology". A brick can cause injury and death pretty easily too, and bricks are certainly not "inherently a weapons technology".

I would only say that something is "inherently a weapons technology" if (1) a major motivation for its development is (broadly speaking) military and/or (2) what it's best at is causing injury, destruction and death.

Military organizations have put quite a lot of effort into AI, but so have plenty of non-military organizations and it looks to me as if the latter have had much more (visible) success than the former. And so far, the things AI has proven most useful for are things like distinguishing cats from dogs, translating text, and beating humans at board games. Those (or things like them) may well have military applications, but they aren't weapons. (Not even when applied militarily. A better way of spotting enemy tanks makes your weapons more effective, but it isn't itself a weapon.)

Both you and Dagon can point your fingers wherever you like. The more interesting question is where it's useful to point your fingers.

The idea in this post, combined with my generally short timelines, makes me quite bearish on career plans that involve spending several years doing relatively unimportant things for the sake of credentials (e.g. most grad school plans).

But this isn’t quite right, at least not when “AI takeover” is interpreted in the obvious way, as meaning that an AI or group of AIs is firmly in political control of the world, ordering humans about, monopolizing violence, etc. Even if AIs don’t yet have that sort of political control, it may already be too late.

The AIs will probably never be in a position of political control. I suspect the AI would bootstrap self-replicating (nano?) tech. It might find a way to totally brainwash people, and spread it across the internet. The end game is always going to be covering the planet in self-replicating nanotech, or similar. Politics does not seem that helpful towards such a goal. Politics is generally slow.

I think this depends on how fast the takeoff is. If crossing the human range, and recursive self-improvement, take months or years rather than days, there may be an intermediate period where political control is used to get more resources and security. Politics can happen on a timespan of weeks or months. Brainwashing people is a special case of politics. Yeah I agree the endgame is always nanobot swarms etc.

That makes sense and I think it's important that this point gets made. I'm particularly interested in the political movement that you refer to. Could you explain this concept in more detail? Is there anything like such a political movement already being built at the moment? If not, how would you see this starting?

I don't consider this my area of expertise; I think it's very easy to do more harm than good by starting political movements. However, it seems likely to me that in order for the future to go well various governments and corporations will need to become convinced that AI risk is real, and maybe an awareness-raising campaign is the best way to do this. That's what I had in mind. In some sense that's what many people have been doing already, e.g. by writing books like Superintelligence. However, maybe eventually we'd need to get more political, e.g. by organizing a protest or something. Idk. Like I said, this could easily backfire.

I agree and I think books such as Superintelligence have definitely decreased the x-risk chance. I think 'convincing governments and corporations that this is a real risk' would be a great step forward. What I haven't seen anywhere is a coherent list of options for how to achieve that, preferably ranked by impact. A protest might be up there, but probably there are better ways. I think making that list would be a great first step. Can't we do that here somewhere?

I think there are various people working on it, the AI policy people at Future of Humanity Institute for example, maybe people at CSET. I recommend you read their stuff and maybe try to talk to them.

I know their work and I'm pretty sure there's no list on how to convince governments and corporations that AI risk is an actual thing. PhDs are not the kind of people inclined to take any concrete action, I think.

I disagree. I would be surprised if they haven't brainstormed such a list at least once. And just because you don't see them doing any concrete action doesn't mean they aren't--they just might not be doing anything super public yet.

Don't get me wrong, I think institutes like FHI are doing very useful research. I think there should be a lot more of them, at many different universities. I just think what's missing in the whole X-risk scene is a way to take things out of this still fairly marginal scene and into the mainstream. As long as the mainstream is not convinced that this is an actual problem, safety efforts are always going to lag enormously behind mainstream AI efforts, with predictable results.

Maybe. But I actually currently think that the longer these issues stay out of the mainstream, the better. Mainstream political discourse is so corrupted; when something becomes politicized, that means it's harder for anything to be done about it and a LOT harder for the truth to win out. You don't see nuanced, balancing-risks-and-benefits solutions come out of politicized debates. Instead you see two one-sided, extreme agendas bashing on each other and then occasionally one of them wins.

(That said, now that I put it that way, maybe that's what we want for AI risk--but only if we get to dictate the content of one of the extreme agendas and only if we are likely to win. Those are two very big ifs.)

It's funny, I've heard that opinion a number of times before, mostly from Americans. Maybe it has to do with your two-party flavor of democracy. I think Americans are also much more skeptical of states in general. You tend to look to companies for solving problems, Europeans tend to look to states (generalized). In The Netherlands we have a host of parties, and although there are still a lot of pointless debates, I wouldn't say it's nearly as bad as what you describe. I can't imagine e.g. climate change solved without state intervention (the situation here is now that the left is calling for renewables, the right for nuclear - not so bad).

For AI Safety, even with a bipartisan debate, the situation now is that both parties implicitly think AI Safety is not an issue (probably because they have never heard of it, or at least not given it serious thought). After politicization, worst case at least one of the parties will think it's a serious issue. That would mean that roughly 50% of the time, if party #1 wins, we get a fair chance of meaningful intervention such as appropriate funding, hopefully helpful regulation efforts (that's our responsibility too - we can put good regulation proposals out there), and even cooperation with other countries. If party #2 wins, there will perhaps be zero effort or some withdrawal. I would say this 50% solution easily beats the 0% solution we have now. In a multi-party system such as we have, the outcome could even be better.

I think we should prioritize getting the issue out there. The way I see it, it's the only hope for state intervention, which is badly needed.

Perhaps American politics is indeed less rational than European politics, I wouldn't know. But American politics is more important for influencing AI since the big AI companies are American.

Besides, if you want to get governments involved, raising public awareness is only one way to do that, and not the best way IMO. I think it's much more effective to do wonkery / think tankery / lobbying / etc. Public movements are only necessary when you have massive organized opposition that needs to be overcome by sheer weight of public opinion. When you don't have massive organized opposition, and heads are still cool, and there's still a chance of just straightforwardly convincing people of the merits of your case... best not to risk ruining that lucky situation!

I have kind of a strong opinion in favor of policy intervention because I don't think it's optional. I think it's necessary. My main argument is as follows:

I think we have two options to reduce AI extinction risk:

1) Fixing it technically and ethically (I'll call the combination of both working out the 'tech fix'). Don't delay.

2) Delay until we can work out 1. After the delay, AGI may or may not still be carried out, depending mainly on the outcome of 1.

If option 1 does not work, of which there is a reasonable chance (it hasn't worked so far and we're not necessarily close to a safe solution), I think option 2 is our only chance to reduce the AI X-risk to acceptable levels. However, AI academics and corporations are both strongly opposed to option 2. It would therefore take a force at least as powerful as those two groups combined to still pursue this option. The only option I can think of is a popular movement. Lobbying and think tanking may help, but corporations will be better funded and therefore the public interest is not likely to prevail. Wonkery could be promising as well. I'm happy to be convinced of more alternative options.

If the tech fix works, I'm all for it. But currently, I think the risks are way too big and it may not work at all. Therefore I think it makes sense to apply the precautionary principle here and start with policy interventions, until it can be demonstrated that X-risk for AGI has fallen to an acceptable level. As a nice side effect, this should dramatically increase AI Safety funding, since suddenly corporate incentives are to fund this first in order to reach allowed AGI.

I'm aware that this is a strong minority opinion on LW, since:

1) Many people here have an affinity for futurism, which would love an AGI revolution

2) Many people have backgrounds in AI academia, and/or AI corporations, which both have incentives to continue working on AGI

3) It could be wrong of course. :) I'm open to arguments which would change the above line of thinking.

So I'm not expecting a host of upvotes, but as rationalists, I'm sure you appreciate the value of dissent as a way to move towards a careful and balanced opinion. I do at least. :)

Want to have a video chat about this? I'd love to. :)

Well sure, why not. I'll send you a PM.

I wouldn't say less rational, but more bipartisan, yes. But you're right I guess that European politics is less important in this case. Also don't forget Chinese politics, which has entirely different dynamics of course.

I think you have a good point as well that wonkery, think tankery, and lobbying are also promising options. I think they, and starting a movement, should be on a little list of policy intervention options. I think each will have its own merits and issues. But still, we should have a group of people actually starting to work on this, whatever the optimal path turns out to be.