Zach Stein-Perlman's Shortform

Value Is Binary

Epistemic status: rough ethical and empirical heuristic.

Assuming that value is roughly linear in resources available after we reach technological maturity,[1] my probability distribution of value is so bimodal that it is nearly binary. In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between -1% and 1% of the value of the optimal future), and little probability to anything else.[2] To the extent that almost all of the probability mass fits into these two buckets, and everything within a bucket is about as valuable as everything else in that bucket, the goal "maximize expected value" reduces to the goal "maximize the probability of the better bucket."
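To make this reduction concrete, here is a minimal numerical sketch in Python; the probabilities and values below are made up for illustration, not estimates:

```python
# Values are expressed as fractions of the optimal future's value.
p_great = 0.30       # P(near-optimal future) -- illustrative
p_near_zero = 0.68   # P(near-zero future) -- illustrative
p_other = 0.02       # everything in between

v_great = 0.995      # near-optimal futures are worth ~1
v_near_zero = 0.0    # near-zero futures are worth ~0
v_other = 0.5        # whatever middling futures are worth, there are few of them

expected_value = p_great * v_great + p_near_zero * v_near_zero + p_other * v_other
print(expected_value)  # ~0.31
print(p_great)         # 0.30

# As p_other -> 0 and the buckets tighten, expected value -> p_great,
# so "maximize expected value" reduces to "maximize p_great".
```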

So rather than thinking about how to maximize expected value, I generally think about maximizing the probability of a great (i.e., near-optimal) future. This goal is easier for me to think about, particularly since I believe that the paths to a great future are rather homogeneous — alike not just in value but in high-level structure. In the rest of this shortform, I explain my belief that the future is likely to be near-optimal or near-zero.

 

Substantial probability to near-optimal futures.

I have substantial credence that the future is at least 99% as good as the optimal future.[3] I do not claim much certainty about what the optimal future looks like — my baseline assumption is that it involves increasing and improving consciousness in the universe, but I have little idea whether that would look like many very small minds or a few very big minds. Or perhaps the optimal future involves astronomical-scale acausal trade. Or perhaps future advances in ethics, decision theory, or physics will have unforeseeable implications for how a technologically mature civilization can do good.

But uniting almost all of my probability mass for near-optimal futures is how we get there, at a high level: we create superintelligence, achieve technological maturity, solve ethics, and then optimize. Without knowing what this looks like in detail, I assign substantial probability to the proposition that humanity successfully completes this process. And I think almost all futures in which we do complete this process look very similar: they have nearly identical technology, reach the same conclusions on ethics, have nearly identical resources available to them (mostly depending on how long it took them to reach maturity), and so produce nearly identical value.

 

Almost all of the remaining probability to near-zero futures.

This claim is bolder, I think. Even if it seems reasonable to expect a substantial fraction of possible futures to converge to near-optimal, it may seem odd to expect almost all of the rest to be near-zero. But I find it difficult to imagine any other futures.

For a future to not be near-zero, it must involve using a nontrivial fraction of the resources available in the optimal future (by my assumption that value is roughly linear in resources). More significantly, the future must involve using resources at a nontrivial fraction of the efficiency of their use in the optimal future. This seems unlikely to happen by accident. In particular, I claim:

If a future does not involve optimizing for the good, value is almost certainly near-zero.

Roughly, this holds if no (nontrivially efficient) way of promoting the good is also an efficient way of optimizing for anything else that we might optimize for. I strongly intuit that this is true; I expect that as technology improves, efficiently producing a unit of something will produce very little of almost all other things (where "thing" includes not just stuff but also minds, qualia, etc.).[4] If so, then value (or disvalue) is (in expectation) a negligible side effect of optimization for other things. And I cannot reasonably imagine a future optimized for disvalue, so I think almost all non-near-optimal futures are near-zero.
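One rough way to formalize this, under the stated assumption that value is roughly linear in resources (and, analogously, in the efficiency with which those resources are converted into value); the helper `value_fraction` and all the numbers are illustrative assumptions:

```python
def value_fraction(resource_fraction, efficiency_fraction):
    """Value relative to the optimal future: the fraction of the optimal
    future's resources used, times the efficiency of their use relative
    to how the optimal future would use them."""
    return resource_fraction * efficiency_fraction

# A future optimized for something other than the good might use most of the
# cosmic endowment, but if goals are orthogonal at high technology levels,
# it produces value only as a tiny side effect (1e-6 is an assumed number):
print(value_fraction(0.9, 1e-6))    # ~1e-6: near-zero

# A future that optimizes for the good uses most of the resources at
# near-optimal efficiency:
print(value_fraction(0.99, 0.995))  # ~0.985: near-optimal
```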

 

So I believe that either we optimize for value and get a near-optimal future, or we do anything else and get a near-zero future.

Intuitively, it seems possible to optimize for more than one value. I think such scenarios are unlikely. Even if our utility function has multiple linear terms, unless there is some surprisingly good way to achieve them simultaneously, we optimize by pursuing one of them near-exclusively.[5] Optimizing a utility function that looks more like min(x,y) may be a plausible result of a grand bargain, but such a scenario requires that, after we have mature technology, multiple agents have nontrivial bargaining power and different values. I find this unlikely; I expect singleton-like scenarios, and I expect that powerful agents will either all converge to the same preferences or all have near-zero-value preferences.
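A toy allocation problem illustrates both halves of this paragraph, under footnote 5's assumption that the goods are substitutes in production; the helper `best_split` and the weights and resource total below are arbitrary choices for illustration:

```python
R = 100  # total resources; one unit of resources buys one unit of either good

def best_split(utility):
    """Brute-force the (value, x) that maximizes the given utility,
    where x units go to the first good and R - x to the second."""
    return max((utility(x, R - x), x) for x in range(R + 1))

# Linear utility with unequal weights: the optimum spends everything on the
# higher-weighted good (a corner solution).
u_linear = lambda x, y: 1.0 * x + 0.9 * y
print(best_split(u_linear))  # (100.0, 100): all resources go to x

# min(x, y) utility, e.g. from a grand bargain: the optimum splits resources.
u_min = lambda x, y: min(x, y)
print(best_split(u_min))     # (50, 50): resources are split evenly
```

Unless the weights are exactly equal or the goods complement each other in production, the linear case always lands at a corner, which is the sense in which we would pursue one value near-exclusively.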

 

I mostly see "value is binary" as a heuristic for reframing problems. It also has implications for what we should do: to the extent that value is binary (and to the extent that doing so is feasible), we should focus on increasing the probability of great futures. If a "catastrophic" future is one in which we realize no more than a small fraction of our value, then a great future is simply one that is not catastrophic, and we should focus on avoiding catastrophes. But of course, "value is binary" is an empirical approximation rather than an a priori truth. Even if value seems very nearly binary, we should not reject contrary proposed interventions[6] or possible futures out of hand.

I would appreciate suggestions on how to make these ideas more formal or precise (in addition to comments on what I got wrong or left out, of course). Also, this shortform relies on argument by "I struggle to imagine"; if you can imagine something I cannot, please explain your scenario and I will justify my skepticism or update.


  1. You would reject this if you believed that astronomical-scale goods are not astronomically better than Earth-scale goods or if you believed that some plausible Earth-scale bad would be worse than astronomical-scale goods are good. ↩︎

  2. "Optimal" value is roughly defined as the expected value of the future in which we act as well as possible, from our current limited knowledge about what "acting well" looks like. "Zero" is roughly defined as any future in which we fail to do anything astronomically significant. I consider value relative to the optimal future, ignoring uncertainty about how good the optimal future is — we should theoretically act as if we're in a universe with high variance in value between different possibilities, but I don't see how this affects what we should choose before reaching technological maturity.*
    *Except roughly that we should act with unrealistically low probability that we are in a kind of simulation in which our choices matter very little or have very differently-valued consequences than otherwise. The prospect of such simulations might undermine my conclusions—value might still be binary, but for the wrong reason—so it is useful to be able to almost-ignore such possibilities. ↩︎

  3. That is, at least 99% of the way from the zero-value future to the optimal future. ↩︎

  4. If we particularly believe that value is fragile, we have an additional reason to expect this orthogonality. But I claim that different goals tend to be orthogonal at high levels of technology independent of value's fragility. ↩︎

  5. This assumes that all goods are substitutes in production, which I expect to be nearly true with mature technology. ↩︎

  6. That is, interventions that affect the probability of futures outside the binary or that affect how good the future is within the set of near-zero (or near-optimal) futures. ↩︎

After reading the first paragraph of your above comment only, I want to note that:

    In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between -1% and 1% of the value of the optimal future), and little probability to anything else.

I assign much lower probability to near-optimal futures than to near-zero-value futures.

This is mainly because a lot of the "extremely good" possible worlds I imagine when reading Bostrom's Letter from Utopia are <1% of what is optimal.

I also think the amount of probability I assign to 1%-99% futures is (~10x?) larger than the amount I assign to >99% futures.

(I'd like to read the rest of your comment later, though not right now due to time constraints, to see if it changes my view.)

I agree that near-optimal is unlikely. But I would be quite surprised by 1%-99% futures because (in short) I think we do better if we optimize for good and do worse if we don’t. If our final use of our cosmic endowment isn’t near-optimal, I think we failed to optimize for good and would be surprised if it’s >1%.

Related idea, off the cuff, rough. Not really important or interesting, but might lead to interesting insights. Mostly intended for my future selves, but comments are welcome.

Binaries Are Analytically Valuable

Suppose our probability distribution for alignment success is nearly binary. In particular, suppose that we have high credence that, by the time we can create an AI capable of triggering an intelligence explosion, we will have

  • really solved alignment (i.e., we can create an aligned AI capable of triggering an intelligence explosion at reasonable extra cost and delay) or
  • really not solved alignment (i.e., we cannot create a similarly powerful aligned AI, or doing so would require very unreasonable extra cost and delay)

(Whether this is actually true is irrelevant to my point.)

Why would this matter?

Stating the risk from an unaligned intelligence explosion is kind of awkward: it's that the alignment tax is greater than what the leading AI project is able/willing to pay. Equivalently, our goal is for the alignment tax to be less than what the leading AI project is able/willing to pay. This gives rise to two nice, clean desiderata:

  • Decrease the alignment tax
  • Increase what the leading AI project is able/willing to pay for alignment

But unfortunately, we can't similarly split the goal (or risk) into two goals (or risks). For example, a breakdown into the following two goals does not capture the risk from an unaligned intelligence explosion:

  • Make the alignment tax less than 6 months and a trillion dollars
  • Make the leading AI project able/willing to spend 6 months and a trillion dollars on aligning an AI

It would suffice to achieve both of these goals, but doing so is not necessary. If we fail to reduce the alignment tax this far, we can compensate by doing better on the willingness-to-pay front, and vice versa.

But if alignment success is binary, then we actually can decompose the goal stated above into two necessary (and jointly sufficient) conditions:

  • Really solve alignment; i.e., reduce the alignment tax to [reasonable value]
  • Make the leading AI project able/willing to spend [reasonable value] on alignment

(Where [reasonable value] depends on what exactly our binary-ish probability distribution for alignment success looks like.)
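As a concreteness check, here is a small Monte Carlo sketch of why the binary assumption licenses this decomposition; the helpers (`sample_tax`, `sample_willingness`) and the specific numbers and distributions are made-up assumptions, not estimates:

```python
import random

REASONABLE = 1.0    # the "reasonable" alignment tax, in arbitrary units
ASTRONOMICAL = 1e6  # the tax if we have "really not solved" alignment

def sample_tax():
    # Binary-ish world: the tax is either reasonable or effectively unpayable.
    return REASONABLE if random.random() < 0.5 else ASTRONOMICAL

def sample_willingness():
    # Made-up distribution of what the leading AI project is able/willing to pay.
    return random.lognormvariate(0, 2)

trials = 100_000
success = solved = willing = 0
for _ in range(trials):
    tax, pay = sample_tax(), sample_willingness()
    success += tax < pay           # the goal: tax < what the project will pay
    solved += tax == REASONABLE    # condition 1: really solved alignment
    willing += pay > REASONABLE    # condition 2: able/willing to pay [reasonable value]

# Each condition is necessary for success; in the binary world the two are
# jointly sufficient, so (with the conditions sampled independently here)
# P(success) matches P(condition 1) * P(condition 2):
print(success / trials)                        # ~0.25
print((solved / trials) * (willing / trials))  # ~0.25
```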

Breaking big goals down into smaller goals—in particular, into smaller necessary conditions—is valuable, analytically and pragmatically. Binaries help, when they exist. Sometimes weaker conditions on the probability distribution, those of the form "a certain important subset of possibilities has very low probability," can be useful in the same way.

Maybe AI Will Happen Outside US/China

I'm interested in the claim that important AI development (in the next few decades) will largely occur outside any of the states that currently look likely to lead AI development. I don't think this is likely, but I haven't seen discussion of this claim.[1] This would matter because it would greatly affect the environment in which AI is developed and affect which agents are empowered by powerful AI.

Epistemic status: brainstorm. May be developed into a full post if I learn or think more.

 

I. Causes

The big tech companies are in the US and China, and discussion often assumes that these two states have a large lead on AI development. So how could important development occur in another state? Perhaps other states' tech programs (private or governmental) will grow. But more likely, I think, is that an already-strong company leaves the US for a new location.

My legal knowledge is insufficient to say with any confidence how easily companies can leave their home states. My impression is that large American companies largely can leave while large Chinese companies cannot.

Why might a big tech company or AI lab want to leave a state?[2]

  • Fleeing expropriation/nationalization. States can largely expropriate companies' property within their territory unless they have contracted otherwise. A company may be able to protect its independence by obtaining, from another state, legal protection against expropriation, and then moving its hardware to that state. It may move its headquarters or workers as well.
  • Fleeing domestic regulation on development and/or deployment of AI.

 

II. Effects

The state in which powerful AI is developed matters in two important ways.

  1. States set regulations. The regulatory environment around an AI lab may affect the narrow AI systems it builds and/or how it pursues AGI.
  2. State influence & power. The state in which AGI is achieved can probably nationalize that project (perhaps well before AGI). State control of powerful AI affects how it will be used.

 

III. AI deployment before superintelligence

Eliezer recently tweeted that AI might be low-impact until superintelligence because of constraints on deployment. This seems partially right — for example, medicine and education seem like areas in which marginal improvements in our capabilities have only small effects due to civilizational inadequacy. Certainly some AI systems would require local regulatory approval to be useful; those might well be limited in the US. But a large fraction of AI systems won't be prohibited by plausible American regulation. For example, I would be quite surprised if the following kinds of systems were prohibited by regulation (disclaimer: I'm very non-expert on near-future AI):

  • Business services
    • Operations/logistics
    • Analysis
    • Productivity tools (e.g., Codex, search tools)
  • Online consumer services — financial, writing assistants (Codex)
  • Production of goods that can be shipped cheaply (like computers but not houses)
  • Trading
  • Maybe media stuff (chatbots, persuasion systems). It's really hard to imagine the US banning chatbots. I'm not sure how persuasion-AI is implemented; custom ads could conceivably be banned, but eliminating AI-written media is implausible.

This matters because these AI applications can directly affect a place even if they could not have been developed there.

In the unlikely event that the US moves against not only the deployment but also the development of such systems, AI companies would be more likely to seek a way around regulation — such as relocating.


  1. Rather, I have not seen reasons for this claim other than the very normal one — that leading states and companies change over time. If you have seen more discussion of this claim, please let me know. ↩︎

  2. This is most likely to be relevant to the US but applies generally. ↩︎