When you plan according to your AI timelines, should you put more weight on the median future, or the median future | eventual AI alignment success? ⚖️

Jackson Wagner 3y 94

I would assume it's most impactful to focus on the marginal future where we survive, rather than the median? I.e., the futures where humanity barely solves alignment in time, or has a dramatic close call with AI disaster, or almost fails to build the international agreement needed to suppress certain dangerous technologies, etc.

IMO, the marginal futures where humanity survives are the scenarios where our actions have the most impact: in futures that are totally doomed, it's worthless to try anything, and in futures that go absurdly well, it's similarly unimportant to contribute our own efforts. Just as our votes are more impactful in a very close election, our actions to advance AI alignment are most impactful in the scenarios balanced on a knife's edge between survival and disaster.

(I think that is the right logic for your altruistic, AI safety research efforts anyways. If you are making personal plans, like deciding whether to have children or how much to save for retirement, that's a different case with different logic to it.)

Charlie Steiner 3y 77

I agree that this is accurate but worry that it doesn't help the sort of person who wants just one future to put more weight on. Which futures count as marginal depends on the strategy you're considering, and on what actions you expect other people to take. You can't just find some concrete future that is "the marginal future" and only take actions that affect that one future.

If you want to avoid the computational burden of consequentialism, rather than focusing on just one future I think a solid recommendation is the virtue-ethical death with dignity strategy.

Zach Stein-Perlman 3y 51

Neither

Many factors are relevant to which possible futures you should upweight. For example, the following are all reasons to pay more attention to a possible set of futures (where a "possible set of futures" could be characterized by "AGI in 2050" or any other condition):

- They're more likely
- They're more tractable
  - Because you see them more clearly (related: important events occur sooner, short timelines)
  - Because other actors won't be paying attention around important events (related: important events occur sooner, short timelines)
  - Because you'll have more influence in them
  - Because P(doom) is closer to 50%

(Also take into account future research: for example, if you focus on the world in 2030 (or assume that human-level AI is developed in 2030), you can be deferring, not neglecting, work on 2040.)

Jeffrey Ladish 3y 20

I sort of agree with this abstractly and disagree in practice. I think we're just very limited in what kinds of circumstances we can reasonably estimate or guess at. Even the above claim, "a big proportion of worlds where we survived, AGI probably gets delayed" is hard to reason about.

But I do kind of need to know the timescale I'm operating in when thinking about health, money, skill investments, etc., so I think you need to reason about it somehow.

Zach Stein-Perlman 3y* 42

If you're just taking into account P(AGI in year t) and P(doom | AGI in year t), I think you should weight by probability times leverage. So weight AGI in year t by P(AGI in year t) * (P(doom | AGI in year t) - P(doom | AGI in year t)^2).

It seems to me that ignoring P(doom) is certainly wrong, and so is the asymmetry where you condition on success. Conversely, why not condition on alignment failure, since those are the worlds that need you to work on them? Or notice that conditioning on success gives most weight to the worlds with the lowest P(doom), when a world with extremely low P(doom) doesn't need you much in expectation; you have more influence over worlds with P(doom) close to 50%.
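
As a rough illustration of the "probability times leverage" weighting in this comment, here is a minimal Python sketch. The function names and all the numbers are made up for illustration; they are assumptions, not estimates from this thread. The leverage term P(doom) - P(doom)^2 equals P(doom) * (1 - P(doom)), which is zero at 0% and 100% and largest at 50%.

```python
def leverage(p_doom: float) -> float:
    """Leverage term p - p^2 = p * (1 - p); peaks at p = 0.5."""
    return p_doom - p_doom ** 2


def timeline_weights(p_agi: dict, p_doom_given_agi: dict) -> dict:
    """Unnormalized weight per year: P(AGI in t) * leverage(P(doom | AGI in t))."""
    return {t: p_agi[t] * leverage(p_doom_given_agi[t]) for t in p_agi}


# Hypothetical inputs, chosen only to show the shape of the calculation.
p_agi = {2030: 0.2, 2040: 0.5, 2050: 0.3}
p_doom_given_agi = {2030: 0.9, 2040: 0.5, 2050: 0.1}

weights = timeline_weights(p_agi, p_doom_given_agi)
for year, w in sorted(weights.items()):
    print(year, round(w, 4))
```

With these made-up inputs, the year with P(doom) at 50% gets the most weight, while the near-certain-doom and near-certain-success years are both discounted.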

JustinShovelain 3y 30

Roughly speaking, in terms of the actions you take, various timelines should be weighted as P(AGI in year t) * DifferenceYouCanProduceInAGIAlignmentAt(t). This produces a new, non-normalized distribution of how much to prioritize each time (you can renormalize it if you wish, to make it more like a "probability").

Note that this is just a first approximation and there are additional subtleties.

- This assumes you are optimizing for each time and possible world orthogonally, but much of the time optimizing for nearby times is very similar to optimizing for a particular time.
- The definition of "you" here depends on the nature of the decision maker, which can vary between a group, a person, or even a person at a particular moment.
- Using different definitions of "you" between decision makers can cause a coordination issue where different people are trying to save different potential worlds (because of their different skills and ability to produce change) and their plans may tangle with each other.
- It is difficult to figure out how much of a difference you can produce in different possible worlds and times. You do the best you can, but you might suffer a failure of imagination in finding ways your plans won't work, ways your plans will have larger positive effects, or ways you may improve your plans in the future. For more on the difference one can produce see this and this.
- Lastly, there is a psychological and social risk here of fudging the calculations above to make things more comfortable.
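
The first-approximation weighting at the top of this comment, including the optional renormalization, might look like the following sketch. `prioritization` and `difference_at` are hypothetical names, and the numbers are placeholders; estimating the difference you can actually produce is the hard part and is not addressed here.

```python
from typing import Callable, Dict


def prioritization(p_agi: Dict[int, float],
                   difference_at: Callable[[int], float],
                   normalize: bool = True) -> Dict[int, float]:
    """Weight each year by P(AGI in year t) * difference you can produce at t."""
    raw = {t: p * difference_at(t) for t, p in p_agi.items()}
    total = sum(raw.values())
    if not normalize or total == 0:
        return raw  # leave as a non-normalized distribution
    return {t: w / total for t, w in raw.items()}


# Placeholder numbers, purely illustrative.
p_agi = {2030: 0.25, 2040: 0.5, 2050: 0.25}
assumed_difference = {2030: 0.02, 2040: 0.01, 2050: 0.04}

priorities = prioritization(p_agi, lambda t: assumed_difference[t])
```

Note that the most probable year need not get the highest priority: here the less likely 2050 scenario dominates because the assumed difference you can make there is larger.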

(Meta: I may make a full post on this someday and use this reasoning often)

James L 3y 21

Why are you using your median timeline | success? Maybe I missed it, but I don't see your reason explained in the post.

[anonymous] 3y 0-13

Strong downvoted for the emojis.

Jeffrey Ladish 3y 102

Why did you do that?

[anonymous] 3y 34

It's like writing a clickbait title: emojis add clutter and noise for no benefit, and I want to discourage them.


I think that, in the median world where we eventually solve alignment, time bought by solving AI alignment in a limited way and using that to buy more time is likely to make up a greater proportion of the total than time obtained through human coordination efforts. However, I also think my own efforts are less important (though potentially still important) in the use-AI-to-buy-time world. That makes it hard to know how to weight it, so I'm not distinguishing much between these types of additional time right now.
