Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.




Being on Earth when this happens is a big deal, no matter your objectives – you can't hoard pebbles if you're dead! People would feel the loss from anywhere in the cosmos. However, Pebblehoarders wouldn't mind if they weren't in harm's way.

Appendix: Contrived Objectives

A natural definitional objection is that a few agents aren't affected by objectively impactful events. If you think every outcome is equally good, then who cares if the meteor hits?

Obviously, our values aren't like this, and any agent we encounter or build is unlikely to be like this (since these agents wouldn't do much). Furthermore, these agents seem contrived in a technical sense (low measure under reasonable distributions in a reasonable formalization), as we'll see later. That is, "most" agents aren't like this.

From now on, assume we aren't talking about this kind of agent.


Notes

  • Eliezer introduced Pebblesorters in the Sequences; I made them robots here to better highlight how pointless the pebble transformation is to humans.
  • In informal parts of the sequence, I'll often use "values", "goals", and "objectives" interchangeably, depending on what flows.
  • We're going to lean quite a bit on thought experiments and otherwise speculate on mental processes. While I've taken the obvious step of beta-testing the sequence and randomly peppering my friends with strange questions to check their intuitions, maybe some of the conclusions only hold for people like me. I mean, some people don't have mental imagery – who would've guessed? Even if so, I think we'll be fine; the goal is for an impact measure – deducing human universals would just be a bonus.
  • Objective impact is objective with respect to the agent's values – it is not the case that an objective impact affects you anywhere and anywhen in the universe! If someone finds $100, that matters for agents at that point in space and time (no matter their goals), but it doesn't mean that everyone in the universe is objectively impacted by one person finding some cash!
  • If you think about it, the phenomenon of objective impact is surprising. See, in AI alignment, we're used to no-free-lunch this, no-universal-argument that; the possibility of something objectively important to agents hints that our perspective has been incomplete. It hints that maybe this "impact" thing underlies a key facet of what it means to interact with the world. It hints that even if we saw specific instances of this before, we didn't know what we were looking at, and we didn't stop to ask.


As far as I understand, this post decomposes 'impact' into value impact and objective impact. VI is dependent on some agent's ability to reach arbitrary value-driven goals, while OI depends on any agent's ability to reach goals in general.

I'm not sure if there exists a robust distinction between the two - the post doesn't discuss any general demarcation tool.

Maybe I'm wrong, but I think the most important point to note here is that 'objectiveness' of an impact is defined not to be about the 'objective state of the world' - rather about how 'general to all agents' an impact is.

VI is dependent on some agent's ability to reach arbitrary value-driven goals, while OI depends on any agent's ability to reach goals in general.

VI depends on the ability to do one kind of goal in particular, like human values. OI depends on goals in general.

I'm not sure if there exists a robust distinction between the two - the post doesn't discuss any general demarcation tool.

If I understand correctly, this is wondering whether there are some impacts that count for ~50% of all agents, or 10%, or .01% - where do we draw the line? It seems to me that any natural impact (that doesn't involve something crazy like "if the goal encoding starts with '0', shut them off; otherwise, leave them alone") either affects a very low percentage of agents or a very high percentage of agents. So, I'm not going to draw an exact line, but I think it should be intuitively obvious most of the time.
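To make the "very low or very high percentage of agents" intuition concrete, here is a minimal toy sketch. It is not the formalization used later in the sequence; it assumes (purely for illustration) that an "agent" is a utility function assigning independent uniform random values to a finite set of outcomes, and that an event "affects" an agent if it changes the best outcome the agent can still reach. The names `attainable_value` and `fraction_affected` are hypothetical.

```python
import random


def attainable_value(utility, reachable):
    """Best utility the agent can still get among the reachable outcomes."""
    return max(utility[o] for o in reachable)


def fraction_affected(before, after, n_outcomes, n_agents=10_000, seed=0):
    """Share of randomly sampled agents whose best reachable value changes.

    Assumption: each agent's utility over outcomes is drawn i.i.d. uniform
    on [0, 1]. This is a toy distribution, not the post's definition.
    """
    rng = random.Random(seed)
    affected = 0
    for _ in range(n_agents):
        utility = [rng.random() for _ in range(n_outcomes)]
        if attainable_value(utility, after) != attainable_value(utility, before):
            affected += 1
    return affected / n_agents


N = 20
everything = list(range(N))

# "Meteor"-like event: almost every option disappears; one outcome remains.
meteor = fraction_affected(everything, [0], N)

# Narrow event: a single outcome becomes unreachable; the rest are untouched.
narrow = fraction_affected(everything, list(range(N - 1)), N)

print(f"meteor-like event affects {meteor:.0%} of sampled agents")
print(f"narrow event affects {narrow:.0%} of sampled agents")
```

Under these assumptions, the event that wipes out nearly all options changes things for almost every sampled agent, while the event that removes a single outcome matters only to the small share of agents whose favorite outcome it happened to be. This shows the two extremes in a toy model and does not show that intermediate cases are rare.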

Maybe I'm wrong, but I think the most important point to note here is that 'objectiveness' of an impact is defined not to be about the 'objective state of the world' - rather about how 'general to all agents' an impact is

This is exactly it.

It's interesting to me to consider the case of me getting into a PhD program at UC Berkeley, which felt pretty impactful. It wasn't that I intrinsically valued being a PhD student at Berkeley, and it wasn't just that being a PhD student at Berkeley objectively gave any agent greater ability to achieve their goals (although they pay you, so it's true to some extent). Rather, it gave me greater ability to achieve my goals by (a) being able to learn more about AI alignment and (b) getting to hang out with my friends and friends-of-friends in the Bay Area. (a) and (b) weren't automatic consequences of being admitted to the program: I had to do some work to make them happen, and they aren't universally valuable. A simplified example of this kind of thing is somebody giving you a non-transferable $100 gift voucher for GameStop.

Objective impact: it is more important that you survive and maintain the ability to make your own decisions and pursue your goals, than it is important that you get specific (subjective) things that you want

Individual sovereignty is more important than preference fulfillment

I have one potential criticism of the examples:

Because I was not sure what the concrete implications of the asteroid impact were, the reveal that it would be valued negatively by any agent, because they all risk death, was unimpactful on me (pun intended). Had you written that the asteroid strikes near the agent, or that it causes massive catastrophes, then I would probably have thought that it mattered the same for local Pebblehoarders and for humans. Also, the asteroid might destroy pebbles (or, depending on your definition of pebble, make new ones).

Also, I feel that some of your examples of objective impact are indeed relevant to agents in general (not dying/being destroyed), while others depend on sharing a common context (cash, which would be utterly useless in Pebblia if the local economy was based on exchanging pebbles for pebbles).

Do you just always consider this context as implicit?

Also, I feel that some of your examples of objective impact are indeed relevant to agents in general (not dying/being destroyed), while others depend on sharing a common context (cash, which would be utterly useless in Pebblia if the local economy was based on exchanging pebbles for pebbles).

Yeah, in the post I wrote

Even if we were on Pebblia, we'd probably think primarily of the impact on human-Pebblehoarder relations.

I don't see the link with my objection: the part of your post you quote is about value impact (which depends on the values of specific agents), whereas I am talking about the need for context even for objective impact (which you present as independent of the values and objectives of specific agents).

Oh, I think I see. Yes, this is explicitly talked about later in the sequence - "resources" like cash are given their importance by how they affect future possibilities, and that's highly context-dependent.

(Let me know if this still isn't addressing your objection)

Thanks, I'll keep going then.

It seems to me that objective impact stems from convergent instrumental goals - self-preservation, resource acquisition, etc.