Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The solution comes in the next post! Feel free to discuss amongst yourselves.

Reminder: Your sentence should explain impact from all of the perspectives we discussed (from XYZ to humans).


What I came up with before reading the spoilers or the next posts in the sequence:

A big deal is any event that significantly changes my expected ability to accomplish my goals (whether by having an impact specific to me, or an objective impact).

Late to the party, but here's my crack at it (ROT13'd since markdown spoilers made it an empty box without my text):

Fbzrguvat srryf yvxr n ovt qrny vs V cerqvpg gung vg unf n (ovt) vzcnpg ba gur cbffvovyvgl bs zl tbnyf/inyhrf/bowrpgvirf orvat ernyvmrq. Nffhzvat sbe n zbzrag gung gur tbnyf/inyhrf ner jryy-pncgherq ol n hgvyvgl shapgvba, vzcnpg jbhyq or fbzrguvat yvxr rkcrpgrq hgvyvgl nsgre gur vzcnpgshy rirag - rkcrpgrq hgvyvgl orsber gur rirag. Boivbhfyl, nf lbh'ir cbvagrq bhg, fbzrguvat orvat vzcnpgshy nppbeqvat gb guvf abgvba qrcraqf obgu ba gur inyhrf naq ba ubj "bowrpgviryl vzcnpgshy" vg vf (v.r. ubj qenfgvpnyyl vg punatrf gur frg bs cbffvoyr shgherf).

I set a fifteen minute timer, and wrote down my thoughts:

Okay, the main thought I have so far is that the examples mostly seem to separate “Affects personal goals” from “Affects convergent instrumental goals”.

1. The pebbles being changed mostly affects the pebble sorters’ personal goals, and otherwise has little impact, in the grand scheme of things, on how able everyone is to achieve their goals. In the long term, it doesn’t even affect the pebblesorters’ ability to achieve their goals (there is basically a constant amount of resources in the universe to turn into pebbles and sort, and the amount on their planet is minuscule).

2. The badness of the traffic jam is actually mostly determined by it being bad for most agents’ goals in that situation (constrained to travel by cars and such). I might personally care more if I was in a medical emergency or something, but this wouldn’t be true from the perspective of convergent instrumental goals.

3. An asteroid hitting Earth damages every agent’s ability to affect the world. We care more because we’re on the planet, but overall its impact is determined by convergent instrumental goals.

4. A star exploding is the same as an asteroid hitting Earth.

5. I’m not sure what the point of the last one, regarding epistemic state, is.

I have a sense that TurnTrout is suggesting we build an AI that will optimise your personal goals while attempting to make no changes on the level of convergent instrumental goals: kill no agents, change no balances of power, move no money, etc.

However, this runs into the standard problem of being useless. For example, you could tell an agent to cure cancer, and then it would cure all the people who have cancer. But then, in order to make sure it changes no balance of power or agent lives or anything, it would make sure to kill all the people who would’ve died, and make sure that other humans do not find out the AI has a cure for cancer. This is so that they don’t start acting very differently (e.g. turning off the AI and then taking the cancer cure, which *would* disturb the balance of power).

Hmm. I do think it would be somewhat useful. Like, this AI would be happy to help you achieve your goal as long as it didn’t change your power. For example, if you really wanted a good TV show to watch, it would happily create it for you, because that doesn’t change your or other agents’ abilities to affect the universe. (Though I think there are arguments that TV shows do affect humans that way, because the interaction between human values and motivations is a weird mess.) Similarly, I think the pebble sorters could build an AI that is happy to produce things they find valuable, as long as it doesn’t upset broader power balances. That is potentially quite neat and, if it works out, a valuable insight.

But it doesn’t do the most important thing, which is to ensure that the future of the universe is also valuable, because that’s necessarily a power thing: privileging a certain set of values over convergent values. And if you tried to combine such power-averse AIs to do a powerful thing, they would stop doing it, because they’re superintelligent and would understand that they would be upsetting the power balance.

Okay, that’s my fifteen minutes. I don’t think I got what TurnTrout was trying to guide me to, even though I think he previously told me in person and I've read the next post. (Though maybe I did get it and just didn't manage to put it in a pithy sentence.)

Extra: I forgot to analyse the humans being tortured, so I'll add it here. Again, it's not a big deal from the perspective of any agent, though it is low in our personal utility function. I think that a power-averse AI would be happy to find a plan to stop the humans being tortured, as long as it didn't severely incapacitate whatever agents were doing the torturing.

Great responses.

What you're inferring is impressively close to where the sequence is leading in some ways, but the final destination is more indirect and avoids the issues you rightly point out (with the exception of the "ensuring the future is valuable" issue; I really don't think we can or should build, e.g., low-impact yet ambitious singletons. More on that later).

My answer to this was

Something is a big deal iff the difference between the amount of personal value I expect in the world where the thing happens and in the world where it doesn't happen is large.

I stopped the timer after five minutes because the answer just seemed to work.
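A minimal sketch of this with-versus-without comparison, assuming that "personal value" can be written as a utility per possible world and that you can assign probabilities to worlds in each scenario. The scenarios, probabilities, and utilities are hypothetical, and what counts as "large" is left as a threshold you would still have to choose.

```python
from typing import Dict

# Hypothetical representation: a scenario is a probability distribution over
# named world-states, and personal value is a utility number per world-state.
Distribution = Dict[str, float]
Utility = Dict[str, float]


def expected_value(dist: Distribution, utility: Utility) -> float:
    """Probability-weighted average utility over world-states."""
    return sum(p * utility[world] for world, p in dist.items())


def big_deal_score(if_happens: Distribution, if_not: Distribution,
                   utility: Utility) -> float:
    """Gap in expected personal value between 'event happens' and 'it doesn't'."""
    return abs(expected_value(if_happens, utility) - expected_value(if_not, utility))


# Hypothetical numbers for an asteroid strike.
utility = {"thriving": 10.0, "muddling_through": 5.0, "devastated": 0.0}
with_asteroid = {"devastated": 0.9, "muddling_through": 0.1}
without_asteroid = {"thriving": 0.5, "muddling_through": 0.5}

print(big_deal_score(with_asteroid, without_asteroid, utility))
```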


Thought as I worked through the exercise:

  • Is there something I'm missing? It seems like TurnTrout's already given us all the pieces. Seems like we can say that "Something has high impact to someone if it either affects something they value (the personal side) or affects their ability to do things more broadly (the objective side)."
  • Something is a big deal if it affects our ability to take future actions? (That seems to be the deal about objectively being bad.)
  • Is the point here to unify it into one sort of coherent notion?
  • Okay, so let's back up for a second and try to do all of this from scratch... When I think about what "impact" feels like to me, I imagine something big, like the world exploding.
    • But it doesn't necessarily have to be a big change. A world where everyone has one less finger doesn't seem to be a big change, but it seems to be high impact. Or a world where the button that launches nukes is pressed rather than not pressed. Maybe we need to look some more into the future? (Do we need discounting? Maybe if nukes get launched in the far future, it's not that bad?)
  • I think it's important to think relative to the agent in question, in order to think about impact. You also want to look at what changed. Small changes aren't necessarily low impact, but I think large changes will correspond to high impact.
    • It seems like "A change has high impact if the agent's valuation of the after state is very different from their valuation of the current state" is the best I have after 15 minutes...

Starting assumptions: impact is measured on a per-belief basis, depends on scale, and is a relative measurement to prior expectation. (This is how I am interpreting the three reminders at the end of the post.)

To me, this sounds like a percent difference. The change between the new value observed and the old value expected (whether based in actual experience or imagined, i.e. accounting for some personal bias) is measured, then divided by the original quantity as a comparison to determine the magnitude of the difference relative to the original expectation.

My sentence: You can tell that something is a big deal to you by how surprising it feels.
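The percent-difference reading can be sketched as follows, assuming the expected and observed values are plain numbers and the expected value is nonzero; the function name `relative_impact` is hypothetical.

```python
def relative_impact(expected: float, observed: float) -> float:
    """Size of the change relative to what was expected, as a fraction.

    Raises ValueError when the expected value is zero, since the
    percent difference is undefined there.
    """
    if expected == 0:
        raise ValueError("percent difference is undefined for an expected value of 0")
    return abs(observed - expected) / abs(expected)


# Expecting 50 and observing 20 is a 60% change; expecting 1000 and
# observing 970 is only a 3% change, even though the absolute gap is the same.
print(relative_impact(50, 20))
print(relative_impact(1000, 970))
```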

While I agree that using percentages would make impact more comparable between agents and timesteps, it also leads to counterintuitive results (at least counterintuitive to me).

Consider a sequence of utilities $u_0$, $u_1$, $u_2$ at times 0, 1, 2 with $u_1 = 0.01\,u_0$ and $u_2 = 0$ (for some $u_0 > 0$).

Now the drop from $u_1$ to $u_2$ would be more dramatic (a decrease of 100%) than the drop from $u_0$ to $u_1$ (a decrease of 99%) if we were using percentages. But I think the agent should 'care more' about the larger drop in absolute utility (i.e. spend more resources to prevent it from happening), and I suppose we might want to let impact correspond to something like 'how much we care about this event happening'.
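To see the disagreement numerically, here is a sketch with hypothetical utilities 100, 1, and 0 at times 0, 1, 2 (illustrative values chosen so the two drops are 99% and 100%).

```python
def absolute_drop(before: float, after: float) -> float:
    """Utility lost between two timesteps."""
    return before - after


def relative_drop(before: float, after: float) -> float:
    """Fraction of the earlier utility that was lost; assumes before > 0."""
    return (before - after) / before


# Hypothetical utilities at times 0, 1, 2.
u = [100.0, 1.0, 0.0]

for t in range(len(u) - 1):
    print(f"t={t}->{t + 1}: absolute {absolute_drop(u[t], u[t + 1]):.2f}, "
          f"relative {relative_drop(u[t], u[t + 1]):.0%}")
```

Ranking by relative drop puts the second step first, while ranking by absolute drop puts the first step first, which is the tension described above.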

That would depend on whether things have a multiplicative or an additive effect on utility.


Vzcnpg vf gur nzbhag V zhfg qb guvatf qvssreragyl gb ernpu zl tbnyf
Ngyrnfg guerr ovt fgebat vaghvgvbaf. N guvat gung unccraf vs vg gheaf gur erfhygf bs zl pheerag npgvbaf gb or jnl jbefr vf ovt vzcnpg. N guvat gung unccraf vs gur srnfvovyvgl be hgvyvgl bs npgvba ng zl qvfcbfny vf punatrq n ybg gura gung vf n ovt qrny (juvpu bsgra zrnaf gung npgvba zhfg or cresbezrq be zhfg abg or cresbezrq). Vs gurer vf n ybg bs fhecevfr ohg gur jnl gb birepbzr gur fhecevfrf vf gb pneel ba rknpgyl nf V jnf nyernql qbvat vf ybj gb ab vzcnpg.

For ease of reference, I'm going to translate any ROT13 comments into normal spoilers.

Impact is the amount I must do things differently to reach my goals.

At least three big, strong intuitions. Something that turns the results of my current actions out to be way worse has a big impact. Something that changes the feasibility or utility of the actions at my disposal a lot is a big deal (which often means that an action must be performed or must not be performed). If there is a lot of surprise, but the way to overcome the surprises is to carry on exactly as I was already doing, then there is low to no impact.
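One hypothetical way to operationalize "how much I must do things differently", sketched under the assumption that you can score each available action by its expected payoff before and after an event: measure how much worse the old plan is than the best plan available afterwards. A surprise that leaves the old plan optimal then scores zero, matching the third intuition. The action names and payoffs are made up for illustration.

```python
from typing import Dict

ActionValues = Dict[str, float]  # hypothetical: expected payoff of each available action


def best_action(values: ActionValues) -> str:
    """Action with the highest expected payoff."""
    return max(values, key=values.get)


def must_change_impact(before: ActionValues, after: ActionValues) -> float:
    """Payoff lost by sticking with the pre-event plan once the event has happened."""
    old_plan = best_action(before)
    return after[best_action(after)] - after[old_plan]


before = {"commute_by_car": 8.0, "take_train": 6.0, "work_from_home": 5.0}

# Surprising news that leaves the old plan optimal: no impact.
minor_news = {"commute_by_car": 7.5, "take_train": 6.0, "work_from_home": 5.0}
# A traffic jam that makes the old plan much worse: big impact.
traffic_jam = {"commute_by_car": 1.0, "take_train": 6.0, "work_from_home": 5.0}

print(must_change_impact(before, minor_news))
print(must_change_impact(before, traffic_jam))
```

This captures the second and third intuitions. If every action gets uniformly worse, the old plan stays optimal and the score is 0, so the first intuition (current actions turning out way worse) would need a separate term, for example `before[old_plan] - after[old_plan]`.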


Gur vzcnpg bs na rirag ba lbh vf gur qvssrerapr orgjrra gur rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba tvira pregnvagl gung gur rirag jvyy unccra, naq gur pheerag rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba.

Zber sbeznyyl, jr fnl gung gur rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba vf gur fhz, bire nyy cbffvoyr jbeyqfgngrf K, bs C(K)*H(K), juvyr gur rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba tvira pregnvagl gung n fgngrzrag R nobhg gur jbeyq vf gehr vf gur fhz bire nyy cbffvoyr jbeyqfgngrf K bs C(K|R)*H(K). Gur vzcnpg bs R orvat gehr, gura, vf gur nofbyhgr inyhr bs gur qvssrerapr bs gubfr gjb dhnagvgvrf.

Translation to normal spoiler text:

The impact of an event on you is the difference between the expected value of your utility function given certainty that the event will happen, and the current expected value of your utility function.

More formally, we say that the expected value of your utility function is the sum, over all possible worldstates $X$, of $P(X)\,U(X)$, while the expected value of your utility function given certainty that a statement $E$ about the world is true is the sum over all possible worldstates $X$ of $P(X \mid E)\,U(X)$. The impact of $E$ being true, then, is the absolute value of the difference of those two quantities:

$$\left|\sum_X P(X \mid E)\,U(X) - \sum_X P(X)\,U(X)\right|$$
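The definition above translates almost directly into code. This is a minimal sketch that assumes a finite set of named world-states with known probabilities and utilities; the world-states and numbers are hypothetical, and conditioning on $E$ is done by keeping the world-states where $E$ holds and renormalizing.

```python
from typing import Callable, Dict

Distribution = Dict[str, float]  # P(X) over world-states X
Utility = Dict[str, float]       # U(X) for each world-state X


def expected_utility(p: Distribution, u: Utility) -> float:
    """Sum over world-states X of P(X) * U(X)."""
    return sum(prob * u[x] for x, prob in p.items())


def condition(p: Distribution, event: Callable[[str], bool]) -> Distribution:
    """P(X | E): keep world-states where E holds and renormalize."""
    p_event = sum(prob for x, prob in p.items() if event(x))
    if p_event == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return {x: prob / p_event for x, prob in p.items() if event(x)}


def impact(p: Distribution, u: Utility, event: Callable[[str], bool]) -> float:
    """|E[U | E] - E[U]|, the absolute change in expected utility."""
    return abs(expected_utility(condition(p, event), u) - expected_utility(p, u))


# Hypothetical beliefs and utilities.
p = {"normal": 0.7, "traffic_jam": 0.2, "asteroid": 0.1}
u = {"normal": 10.0, "traffic_jam": 8.0, "asteroid": 0.0}

print(impact(p, u, lambda x: x == "asteroid"))
print(impact(p, u, lambda x: x == "traffic_jam"))
```

Here the prior expected utility is 8.6. Learning for certain that the asteroid hits moves it to 0, an impact of 8.6, while learning about the traffic jam moves it only to 8.0, an impact of 0.6.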

Nitpick: one should update based on observations, as opposed to "X has occurred with certainty".

We're talking about the impact of an event though. The very question is only asking about worlds where the event actually happens.

If I don't know whether an event is going to happen and I want to know the impact it will have on me, I compare futures where the event happens to my current idea of the future, based on observation (which also includes some probability mass for the event in question, but not certainty).

In summary, I'm not updating to "X happened with certainty" rather I am estimating the utility in that counterfactual case.

I'll take a crack at this.

To a first order approximation, something is a "big deal" to an agent if it causes a "large" swing in its expected utility.

ROT13'd because I might have misformatted the spoiler.

V guvax V zvtug or fcbvyrerq sebz ernqvat gur bevtvany cncre, ohg zl thrff vf "Gur vzcnpg gb fbzrbar bs na rirag vf ubj zhpu vg punatrf gurve novyvgl gb trg jung jr jnag". Uhznaf pner nobhg Vaveba rkvfgvat orpnhfr vg znxrf vg uneqre gb erqhpr fhssrevat naq vapernfr unccvarff. (Abg fher ubj gb fdhner guvf qrsvavgvba bs vzcnpg jvgu svaqvat bhg arj vasb gung jnf nyernql gurer, nf va gur pnfr bs Vaveba, vg unq nyernql rkvfgrq, jr whfg sbhaq bhg nobhg vg.) Crooyvgrf pner nobhg nyy gurve crooyrf orpbzvat bofvqvna orpnhfr vg punatrf gurve novyvgl gb fgnpx crooyrf. Obgu uhznaf naq crooyvgrf pner nobhg orvat uvg ol na nfgrebvq orpnhfr vg'f uneqre gb chefhr bar'f inyhrf vs bar vf xvyyrq ol na nfgrebvq.

Translation to normal spoiler text:

I think I might be spoilered from reading the original paper, but my guess is "The impact to someone of an event is how much it changes their ability to get what they want". Humans care about Iniron existing because it makes it harder to reduce suffering and increase happiness. (Not sure how to square this definition of impact with finding out new info that was already there, as in the case of Iniron: it had already existed, we just found out about it.) Pebblites care about all their pebbles becoming obsidian because it changes their ability to stack pebbles. Both humans and pebblites care about being hit by an asteroid because it's harder to pursue one's values if one is killed by an asteroid.
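As a rough sketch of "how much it changes their ability to get what they want", one could assume the agent's ability is the best payoff it can reach among its available options, and measure impact as the change in that best payoff. Everything here (the option names, the payoffs, and treating an agent with no remaining options as having ability 0) is a hypothetical assumption for illustration.

```python
from typing import Dict

Options = Dict[str, float]  # hypothetical: payoff of each option still available


def attainable_value(options: Options) -> float:
    """Best payoff reachable; an agent with no options left is assumed to score 0."""
    return max(options.values(), default=0.0)


def ability_impact(before: Options, after: Options) -> float:
    """How much an event changes the best payoff the agent can still reach."""
    return abs(attainable_value(after) - attainable_value(before))


# Hypothetical pebblite options before and after two events.
before = {"stack_pebbles": 10.0, "idle": 0.0}
after_obsidian = {"stack_pebbles": 3.0, "idle": 0.0}  # stacking assumed to get harder
after_asteroid: Options = {}  # killed by an asteroid: nothing left to choose

print(ability_impact(before, after_obsidian))
print(ability_impact(before, after_asteroid))
```

Because this only compares options before and after an event, it does not settle the open question about discovering information, like Iniron, that was already true.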

In draft.js, you have to start a new line, like a quote. In Markdown, you do spoilers differently.

Check out the FAQ.

Speaking as someone who hit "Edit" on his post over 10 times before checking the FAQ: if you haven't messed with your profile settings about handling comments/posts yet, save yourself some time and just check the FAQ before trying to add spoiler text. The right formatting wasn't as obvious as I expected, although it was simple.

If I'm already somewhat familiar with your work and ideas, do you still recommend these exercises?

Probably, unless you already deeply get the thing the exercise is pointing at. I wrote this sequence in part because my past writing didn't do a great job imparting the important insights. Since I don't a priori know who already does and doesn't get each idea, you might as well follow along (note that usually the exercises are much shorter than this post's).

You die or otherwise become controlled, lose your sovereignty, lose your ability to make your decisions.

 Peeking at the next post, I would also add:

There is a difference between being able to achieve your goals in the outside world, and being able to maintain your individual sovereignty.


"Getting what you want" (in the former category) is different from, say, "being safe" (in the latter category).

Also, some things might feel like "big deals" when they're actually misinterpretations (because maybe it's something you actually couldn't control anyway, or actually shouldn't care about for other reasons).

The spoiler seems to be empty?


So secret that even a spoiler tag wasn't good enough.

I am wondering about the link between the notion of distance (in the first post), extremes in a utility scale, and what counts as a big deal. That's my fifteen minutes.