The term Pivotal Act was written up on Arbital in 2015. I only started hearing it discussed in 2020, and then it rapidly started seeing more traction when the MIRI 2021 Conversations were released.

I think people mostly learned the word via context clues, and never actually read the article. If you're going to have a serious conversation about pivotal acts, please read the actual article in full.

Pivotal act is defined:

The term 'pivotal act' in the context of AI alignment theory is a guarded term to refer to actions that will make a large positive difference a billion years later.

And then, almost immediately afterwards, the article goes on to reiterate "this is a guarded term", and explains why. That is, this is a jargon term whose definition people are going to be very tempted to stretch, but which it's really important not to stretch.

The article notes:

Reason for guardedness

Guarded definitions are deployed where there is reason to suspect that a concept will otherwise be over-extended. The case for having a guarded definition of 'pivotal act' (and another for 'existential catastrophe') is that, after it's been shown that event X is maybe not as important as originally thought, one side of that debate may be strongly tempted to go on arguing that, wait, really it could be "relevant" (by some strained line of possibility).

It includes a bunch of examples (you really should go read the full article), and then notes:

Discussion: Many strained arguments for X being a pivotal act have a step where X is an input into a large pool of goodness that also has many other inputs. A ZF provability oracle would advance mathematics, and mathematics can be useful for alignment research, but there's nothing obviously game-changing about a ZF oracle that's specialized for advancing alignment work, and it's unlikely that the effect on win probabilities would be large relative to the many other inputs into total mathematical progress.

Similarly, handling trucker disemployment would only be one factor among many in world economic growth.

By contrast, a genie that uploaded human researchers putatively would not be producing merely one upload among many; it would be producing the only uploads where the default was otherwise no uploads. In turn, these uploads could do decades or centuries of unrushed serial research on the AI alignment problem, where the alternative was rushed research over much shorter timespans; and this can plausibly make the difference by itself between an AI that achieves ~100% of value versus an AI that achieves ~0% of value. At the end of the extrapolation where we ask what difference everything is supposed to make, we find a series of direct impacts producing events qualitatively different from the default, ending in a huge percentage difference in how much of all possible value gets achieved.

By having narrow and guarded definitions of 'pivotal acts' and 'existential catastrophes', we can avoid bait-and-switch arguments for the importance of research proposals, where the 'bait' is raising the apparent importance of 'AI safety' by discussing things with large direct impacts on astronomical stakes (like a paperclip maximizer or Friendly sovereign) and the 'switch' is to working on problems of dubious astronomical impact that are inputs into large pools with many other inputs.

I see people stretching "pivotal act" to mean "things that delay AGI for a few years or decades", which isn't what the term is meant to mean. 

Full article here.


I suspect that thinking about the AI x-risk would benefit from stopping using the term "pivotal act" even more than from using it as defined.

1.  It introduces an artificial and confusing discontinuity in the space of actions 
2.  It nudges people to come up with heroic actions. Heroic changes are mostly not the way you improve safety of complex systems

My impression is it's mostly a wrong-way reduction.  (per Chapman: "when you have a problem that is nebulous—complicated, messy, and ambiguous. A wrong-way reduction claims to replace that with a simple, tidy, clear-cut problem. What’s wrong is that the new problem is harder than your original one ...")

Thinking about how to reduce AI-related x-risk fits exactly: it's complicated, nebulous and ambiguous.
The pivotal act framing offers a tidy definition. It's appealing: instead of dealing with the ambiguity and staring into the whirling ocean of complexity, you can brainstorm specific stories ("nanobots which will destroy GPUs"), and you have something clear and crisp.

The problem is in my estimate the "reduced problem" is actually harder. For example, here is a challenge: name pivotal acts (or maybe "pivotal events") which happened in the past, let's say, 3 billion years, if any such acts happened. (If you want to actually do this and write it down, I would suggest blacking out your reply.)

 

I think the argument for thinking in terms of pivotal acts is quite strong. AGI destroys the world by default; a variety of deliberate actions can stop this, but there isn't a smooth and continuous process that rolls us from 2022 to an awesome future, with no major actions or events deliberately occurring to set us on that trajectory.

By contrast, the counter-arguments here seem weak to me, and overly insensitive to the specifics of the actual situation. "Complex situations don't get resolved via phase transitions" and "heroism never makes a big difference in real life" are extremely general objections; if there was a particular situation humanity found itself in that was the exception to this rule, the heuristic would provide no guidance that this is an exception. You just die if you over-rely on the heuristic.

"There hasn't been a past pivotal act" likewise seems like a very weak argument to me. AGI is a world-historical novelty, with its own causal dynamics. There's some weak similarity to the advent of human intelligence, but mostly, it's just a new event. There is no natural force pushing AGI toward being low-impact, just because non-AGI processes were low-impact. There is no natural force pushing for AGI to have better outcomes if no one ever does anything about it (no "heroes"), just because heroes are rare historically. The causal dynamics of AGI, and of the world in relation to AGI, are a product of the specifics of the situation (e.g., facts about CS and about how engineers work on ML), not a prophecy or echo of causally dissimilar past events.

At the level of abstraction "complex event", sure, complicated stuff is often continuous in various ways. But switching topics to "complex event" means fuzzing out all the details about the actual event we're talking about. It's throwing away nearly all information we have, and hoping that this one bit ("complex: y or n?") carries the day. I think fuzzing out details can be a neat exercise for doing Original Seeing at the problem, but I wouldn't put my weight down on that style of reasoning.

The problem is in my estimate the "reduced problem" is actually harder.

I don't think it's harder; i.e., I don't think a significant fraction of our hope rests on long-term processes that trend in good directions but include no important positive "events" or phase changes. (And, more strongly, I think humanity will in fact die if we don't leverage some novel tech to prevent the proliferation of AGI systems.)

I think the hardness is just easier to see and easier to emotionally appreciate, exactly because you're getting concrete about sequences of events in the world.

Rushing to premature concreteness is indeed an error. But I think EA thus far has mostly made the opposite error, refusing to go concrete and thereby avoiding the pressure and constraint of having to actually plan, face tradeoffs, entertain unpleasant realities, etc. If you stay in vagueness indefinitely, things may feel more optimistic, but I flatly doubt that this vague feeling, with little associated scenario analysis or chains of reasoning, is grounded in reality.

[it was easier to draw some things vs. write them]

AGI destroys the world by default; a variety of deliberate actions can stop this, but there isn't a smooth and continuous process that rolls us from 2022 to an awesome future, with no major actions or events deliberately occurring to set us on that trajectory.

 This seems to conflate multiple claims. Consider the whole trajectory.



"AGI destroys the world by default" seems clear; I interpret it as "if you straightforwardly extrapolate the past trajectory, we end in catastrophe".

It's less clear to me what the rest means.

Option a) "trajectories like in the picture below do not exist"

(note the turn is smooth)

This seems a very strong claim to me, and highly implausible. Still, if I understand correctly, this is what you put most weight on.

Option b) "trajectories like in a) exist, but it won't be our trajectory, without significant deliberate efforts"

This seems plausible, although the word "deliberate" introduces some ambiguity.

One way to think about this is in terms of  "steering forces" and incentive gradients. In my view it is more likely than not that with increasing power of the systems, parts of "alignment" will become more of a convergent goal for developers (e.g. because aligned systems get better performance, or alignment tools and theory helps you with designing more competitive systems). I'm not sure if you would count that as "deliberate" (my guess: no). (Just to be sure: this isn't to claim that this pull is sufficient for safety.)

In my view the steering forces can become sufficiently strong without any legible "major event". In particular, without any event legible as important when it is happening. (As far as I understand, you would strongly disagree.)

In contrast, pivotal act would look more like this:


 

I don't think this is a necessary or even common feature of winning trajectories.
 

"Complex situations don't get resolved via phase transitions" and "heroism never makes a big difference in real life" are extremely general objections.

Sorry but this reads like a strawman of my position. "Heroic changes are mostly not the way you improve safety of complex systems." is a very different claim to "heroism never makes a big difference in real life".

To convey the intuition, consider the case of a nuclear power plant. How do you make something like that safe? Basically, not by one strong intervention on one link in a causal graph, but by intervening at a large fraction of the causal graph, and by adding layered defense, preventing failures from propagating.  

Heroic acts obviously can make a big difference. In the case of the nuclear power plant, some scenarios could be saved by a team of heroic firefighters who will provide emergency cooling. Or, clearly, a Chernobyl disaster would have been prevented if a SWAT team landed in the control room, shot everyone, and stopped the plant in a safe way.

My claim isn't that this never works. The only claim is that the majority of bits of safety originate from different types of intervention. (And I do think this is also true for AI safety.)
 

There is no natural force… 


As is probably clear, I like the forces framing. Note that it feels quite different from the "pivotal acts" framing. 

I don't care that much whether the forces are natural or not, but whether they exist. Actually, I do think one of the more useful things to do about AI safety is:
- think about directions in which you want movement
- think about "types" of forces which may pull in that direction (where "type" could be e.g. profit incentives from market, cultural incentives, or instrumental technological usefulness)
- think about what sort of a system is able to exert such force (where the type could be e.g. individual engineer, a culture-based superagent, or even useful math theory)
- this 3D space gives you a lot of combinations. Compare, choose and execute
 

At the level of abstraction "complex event", sure, complicated stuff is often continuous in various ways. ...


This isn't what I mean. I don't advocate for people to throw out all the details. I mostly advocate for people to project the very high-dimensional real world situation into low-dimensional representations which are continuous, as opposed to categorical. 

Moreover, you (and Eliezer, and others) have a strong tendency to discretize the projections in an iterative way. Let's say you start with "pivotal acts". In the next step, you discretize the "power of system" dimension: "strong systems" are capable of pivotal acts, "weak systems" are not. In the next step, you use this to discretize a bunch of other dimensions, e.g. weak interpretability tools help with weak systems, but not with strong systems. And so on. The endpoint is just a few actually continuous dimensions, and a longer list of discrete labels.

To be clear: I'm very much in favour of someone trying this. (I expect this to fail, at least for now.)

But I'm also very much in favour of many people trying to not do this, and focusing more on trying different projections, or looking for the steepest local gradient descent updates from the point where we are now.
 

But I think EA thus far has mostly made the opposite error, refusing to go concrete and thereby avoiding the pressure and constraint of having to actually plan, face tradeoffs, entertain unpleasant realities, etc. (...)

Sorry, but I'm confused about how the EA label landed here, and I'm a bit worried it has some properties of a red herring. I don't know if the "you" is directed at me, "EA" (whatever it is), or readers of our conversation.

I think the diagram could be better drawn with at least one axis with a scale like "potential AI cognitive capability".

At the bottom, in the big white zone, everything is safe and nothing is amazing.

Further up the page, some big faint green "applications of AI" patches appear in which things start to be nicer in some ways. There are also some big faint red patches, many of which overlap the green, where misapplication of AI makes things worse in some ways.

As you go up the page, both the red and green regions intensify, and some of the deeper green regions dead-end into black representing paths that can no longer be averted from extinction or other uncorrectable bad futures. Some big patches of black start to appear straight in front of white or pale green, representing humanity holding off from implementing AGI until they thought alignment was solved, but it went wrong before any benefits could appear.

By the time you reach the top of the page, it is almost all black. There are a few tiny spots of intense green, connected only by thin, zig-zag threads that are mostly white to lower parts of the page. Even at the top of the page, we don't know which of those brilliant green points might actually lead to dead-ends into black further up.

That's roughly how I see the alignment landscape: that steering to those brilliant green specks will mostly require avoiding implementing AGI.

I actually don't necessarily disagree with this. (I'm generally pretty confused about how to think about pivotal acts, and AI strategy generally)

But, insofar as one doesn't think they should use the pivotal-act frame, I think the solution is to just not use it, rather than water-down the word. 

(I think an important thing that the pivotal act frame is getting at is that somehow you actually need to exit the Acute Risk Period. There are a lot of vague plans that sound sorta helpful but don't actually add up to "we have left the acute risk period", and many of those vague plans won't work even if you stack them all up together. I think it is plausible you don't need the all-or-nothing implication of the Pivotal frame, but there is something important about plans that could possibly work, or be part of a constellation of plans that could possibly-work-together.)

To be clear 
- I don't disagree with the original post - just wanted to suggest not using the term as an option
- I do agree there is value in asking, in my paraphrase, "what's the implied safe end here"
- I mostly don't agree with the assumption behind the term that many small changes generally don't add up
 

Going forward, when setting up new language, it might be beneficial to consider choosing terminology that doesn't strongly evoke the feeling that its frequent use in discussion could threaten the integrity of its definition in the first place. Now, the deed is already done here, but I would suggest for the future making sure that terms needing guarded definitions are more descriptive. At a first pass, the procedure might look like:

  • Using an obviously and explicitly temporary word during discussions (e.g. foo, thinggummi), settle on a firm and final definition for the concept being discussed
  • Distill a word or phrase from the definition itself that is unlikely to be diluted by being introduced into common parlance. The goal here is to point specifically at the definition in a way that is difficult to undo if/when the word enters circulation. Maybe even choose a word that seems likely to go largely unused outside a very narrow range of discussions, if that seems appropriate.
  • Still write a statement of intention for the term and guard its definition closely in future conversations. Include a copy-paste footnote that anybody can use to give a summary definition with a pointer to the full statement and its associated discussion.

Since "pivotal act" is defined as "actions that will make a large positive difference a billion years later", you might end up with something like "gigayear benefit" (off the top of my head). With frequent use, this might even collapse into a form that's basically gibberish to those without a passing understanding of the topic (e.g. "gigyben"). You end up with something that's less punchy than "pivotal act", but that's kind of the point. Producing a word or phrase that is unlikely to be used in other contexts helps protect it from definitional drift.

To be clear, I am not suggesting that the phrase "pivotal act" be changed at this point ("gigyben" is terrible anyway -- sounds like a children's superhero). Rather, I'm agreeing that the words we choose to use are important and can be worth protecting. (Literally get off of the lost causes, though. Language does change over time and there's little you can do to meaningfully resist that once it's started.) I'm suggesting a layer of security for some jargon in particularly sensitive topics. If "pivotal act" didn't seem like a particularly corruptible phrase (it sounds like something a Literature professor or journalist might make up, and is just screaming out to be used hyperbolically), we probably wouldn't be having this discussion in the first place.

I think I basically agree with this. I definitely generally think of it as the jargonist's job to come up with jargon that has a decent shot at weathering the forces of conversational pressure, and if you want an oddly specific term it's better to name it something that sounds oddly specific. (This still doesn't reliably work, people will shoehorn the oddly-specific jargon into things to sound smart, but it makes it less plausibly deniable)

I like "gigayear impact" or something similar.

I do think it's still helpful to have the concept of guarded terms.

I wish we had somehow adopted the practice of using lots of acronyms; I think they probably work much better at 1) preserving technical meanings and 2) not overloading words that people think they already understand. (You weren't using "CIA" for anything before you heard about the Central Intelligence Agency; you probably were using "pivotal" and "act" for something before you heard about pivotal acts.) Like, I think it's relatively unlikely that SEAI will drift in common usage, in part because it seems hard to mistakenly believe that you understand the common meaning of SEAI unless you know what the acronym stands for.

Man I wish I had a strong disagree react.

I disagree on acronyms being a good solution here – they initially are a lot of free real-estate, but quickly get cluttered up, and are hard to distinguish. I already see this with organizations with very similar acronyms that I have trouble telling apart. 

This is doomed.  Jargon appears and evolves, and is always context-specific in ways that conflict with other uses.  

The solution is not to pick less-common sound sequences, that just makes it hard to discuss in technical terms.  The solution is to use more words when the context calls for it (like when first using the phrase in a post, note that you mean this technical definition, not the more common layman's interpretation).

I haven't actually paid attention to this post, so I don't know if the complaint is that someone used it in the wrong context in a confusing way or if they're somehow expecting people to always have the right context.  The answer should be "use more words", not "it always means what I want it to mean".

To restate my argument simply: the more closely a term captures its intended definition, the less work the community will need to do to guard the intended definition of that term. The less interesting a term sounds, the less likely it is to be co-opted for some other purpose. This should be acted on intentionally and documented publicly by those wishing to protect a term. People bringing the term into the conversation should be prepared to point at that documentation.

I see people stretching "pivotal act" to mean "things that delay AGI for a few years or decades", which isn't what the term is meant to mean. 

Well, that can be a pivotal act if you pair "thing that delays AGI for a few years or decades" with some mechanism for leveraging a few years or decades into awesome long-term outcomes. E.g., if you have a second plan that produces an existential win iff humanity survives for 6 years post-AGI, then 6 years suffices.

But you do actually need that second component! Or you need a plan that lets you delay as long as needed (e.g., long enough to solve the full alignment problem for sovereign Friendly AI). The EY pivotal acts I've seen fall in the "this lets you delay as long as needed" bucket.

(Which I like because "we'll be able to do very novel thing X within n years" feels to me like the kind of assumption that reality often violates....)

Yeah, the people saying this are definitely not doing the "pair with actual solution", and when I've previously brought this up in person with them they kinda had an "oh...." reaction, like it was a real update to them that this was also required.

Came here to say this.

Specifically, the pivotal act I most often default to is something like:

  1. Delay AGI by 10 years
  2. Use those ten years to solve the alignment problem

Since I expect that the Alignment problem will be a lot easier once we know what AGI looks like. But not so easy that you can solve it in the ~6 month lead-time that OpenAI currently has over the rest of the world.

I feel generally agreeable towards this concept, and also towards the idea of being careful to use phrases as they are defined.

But I feel something else after starting to read the Arbital page. Since you quadruple insisted on it, I went ahead and actually opened the page and started reading it. And several things felt off in quick succession. I'm going to think out loud through those things here.

The first part is the concept of "guarded term". Here's part of the definition of that.

stretching it ... is an unusually strong discourtesy.

...You can't just say that something is a discourtesy. I have never heard of "guarded term" and I'm pretty sure it's a thing that the people writing these pages made up, and is not well-known basically anywhere. So it's pretty weird to say "if you do thing X, you're being discourteous". The way rudeness works is complicated, but it doesn't work this way. You need bigger social agreement before something is actually rude.

Synonyms include 'pivotal achievement' and 'astronomical achievement'.

It feels pretty weird and unnecessarily confusing to tell the reader about two synonyms right away, especially when I'm pretty sure that all of these terms are obscure. It seems like it would have been a lot better to just declare the title of the page to be the one term for this, and to let any "synonyms" fade into non-use.

The next paragraph defines two other terms, a contrasting term, and a superset term, each with their own abbreviations.

Then the next paragraph tells me about two deprecated terms!

Why on earth are you dumping all these random, extremely similar but different, not-at-all widely used terms on me? You're both making it weirdly difficult for me to come away using terms you want me to use, and also making it seem like there's a whole big history of using these terms when there really isn't.

Next bit:

but AI alignment researchers kept running into the problem

...

Usage has therefore shifted such that (as of late 2021) researchers use...

Okay yeah, this is getting super annoying. Who is speaking for all "AI alignment researchers"? I'm like 95% sure this is all just referring to like half a dozen people having a series of conversations in the MIRI office. But it seems to be making it sound like a whole extant field, as if me using these terms wrong will cause miscommunication with "AI alignment researchers" --

 

...oooh. This is the feeling of detecting Frame Control. Yeah, that feels clarifying. I am getting increasingly weirded out by this page in part because it seems to be trying to control the frame.

To be clear, I don't think this is intentional, or that any bad intent was necessarily being executed. And for all I know, maybe the About page of Arbital says something like "here I will write articles as if terms were in established use in my preferred way." Maybe the whole thing was semi-aspirational/semi-fictional. But I'm not going to go looking for more explanation. My heuristic for dealing with frame control is to leave. You get a certain number of chances to say your thing and make me understand what you're trying to say, and after a certain number of frame-control-detection strikes, I just leave.

So, I'm not going to finish reading the Arbital page on Pivotal Act even though Raemon quadruple recommended it. And I guess I'll just go ahead using "pivotal act" the same way I hear other people using it, maybe while vaguely remembering the one-sentence definition I did get, and continuing to independently evaluate the validity of the concept.

Okay, but how do we get technical terms with precise meanings that are analyzable using propositions that can be investigated and decided using logic and observation? If we're in a context where the meaning of words is automatically eroded by projection into low-dimensional, low-context concepts into whatever the surrounding political forces want, we're not going to get anywhere without being able to fix the meaning of words we need to have a non-obvious technically important use.

Instead of saying "using this term to mean X is a discourtesy", one could try "please don't use this term to mean X, and please encourage your readers not to use it to mean X, and to encourage their readers and so on".

FWIW, I think this is an oversensitive frame-control reaction. Like, I agree there is (some) frame control* going on here, and there have been some other Eliezer-pieces that felt frame-control-y enough that I think it's reasonable to be watching out for it.

But it seems like you tapped out here at the slightest hint of it, and meanwhile... this term only exists at all because Eliezer thought it was an important concept to crystallize, and it's only in the public discourse right now because Eliezer started talking about it, and refusing to understand what he actually means when he says it just seems super weird to me. 

It was written on Arbital, which was always kinda in a weird beta state. Having read a fair amount of Arbital posts, my sense is Eliezer was sort of privately writing the textbook/background reading that he thought was important for the AI Alignment community he wanted to build. Eliezer didn't crosspost it to LW as if it were written/ready for the LW audience, I did, so judging it on those terms feels weird.

(* note: I think frame control is moderately common, isn't automatically bad, I think it might be a good rationalist-norm to acknowledge when you're doing it but that norm isn't at all established and definitely wasn't established in 2015 when this was first written.)
