This is a special post for quick takes by Zach Stein-Perlman. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.


Slowing AI: Bad takes

This shortform was going to be a post in Slowing AI but its tone is off.

This shortform is very non-exhaustive.

Bad take #1: China won't slow, so the West shouldn't either

There is a real consideration here. Reasonable variants of this take include

  • What matters for safety is not just slowing but also the practices of the organizations that build powerful AI. Insofar as the West is safer and China won't slow, it's worth sacrificing some Western slowing to preserve Western lead.
  • What matters for safety is not just slowing but especially slowing near the end. Differentially slowing the West now would reduce its ability to slow later (or even cause it to speed later). So differentially slowing the West now is bad.

(Set aside the fact that slowing the West generally also slows China, because they're correlated and because ideas pass from the West to China.) (Set aside the question of whether China will try to slow and how correlated that is with the West slowing.)

In some cases slowing the West would be worth burning lead time. But slowing AI doesn't just mean the West slowing itself down. Some interventions would slow both spheres similarly or even differentially slow China: most notably export controls, reducing diffusion of ideas, and improved migration policy.

See West-China relation.

 

Bad take #2: slowing can create a compute overhang, so all slowing is bad

Taboo "overhang."

Yes, insofar as slowing now risks speeding later, we should notice that. There is a real consideration here.

But in some cases slowing now would be worth a little speeding later. Moreover, some kinds of slowing don't cause faster progress later at all: for example, reducing diffusion of ideas, decreasing hardware progress, and any stable and enforceable policy regimes that slow AI.

See Quickly scaling up compute.

 

Bad take #3: powerful AI helps alignment research, so we shouldn't slow it

(Set aside the question of how much powerful AI helps alignment research.) If powerful AI is important for alignment research, that means we should aim to increase the amount of time we have with powerful AI, not to make powerful AI appear sooner.

 

Bad take #4: it would be harder for unaligned AI to take over in a world with less compute available (for it to hijack), and failed takeover attempts would be good, so it's better for unaligned AI to try to take over soon

No, running AI systems seems likely to be cheap and there's already plenty of compute.

Some bad default/attractor properties of cause-focused groups of humans:

  • Bad group epistemics
    • The loudest aren't the most worth listening to
    • People don't dissent even when that would be correct
      • Because they don't naturally even think to challenge group assumptions
      • Because dissent is punished
  • Bad individual epistemics
  • Soldier mindset
    • Excessively focusing on persuasion and how to persuade, relative to understanding the world
    • Lacking vibes like "curiosity is cool" and "changing your mind is cool"
  • Feeling like a movement
    • Excessively focusing on influence-seeking for the movement
    • Having enemies and being excessively adversarial to them
    • (Signs/symptoms are people asking the group "what do we believe" and excessive fixation on what defines the group or distinguishes it from adjacent groups)
  • Having group instrumental-goals or plans or theories-of-victory that everyone is supposed to share (to be clear, I think it's often fine for a group to share ~ultimate goals, but groups often fixate on particular, often-suboptimal paths to achieving those goals)
    • Choosing instrumental goals poorly and not revising them
  • Excessive fighting over status/leadership (maybe, sometimes)
  • Maybe... being bad at achieving goals, or bad instrumental rationality (group and individual)
  • Maybe something weird about authority...

(I'm interested in readings on this topic.)

Propositions on SIA

Epistemic status: exploring implications, some of which feel wrong.

  1. If SIA is correct, you should update toward the universe being much larger than it naively (i.e. before anthropic considerations) seems, since there are more (expected) copies of you in larger universes. (A toy numeric sketch of this update appears after this list.)
    1. In fact, we seem to have to update to probability 1 on infinite universes; that's surprising.
  2. If SIA is correct, you should update toward there being more alien civilizations than it naively seems, since in possible-universes where more aliens appear, more (expected) copies of human civilization appear.
    1. The complication is that more alien civilizations makes it more likely that an alien causes you to never have existed, e.g. by colonizing the solar system billions of years ago. So a corollary is that you should update toward human-level civilizations being less likely to be "loud" or tending to affect fewer alien civilizations or something than it naively seems.
      1. So SIA predicts that there were few aliens in the early universe and many aliens around now.
      2. So SIA predicts that human-level civilizations (more precisely, I think: civilizations whose existence is correlated with your existence) tend not to noticeably affect many others (whether due to their capabilities, motives, or existential catastrophes).
      3. So SIA retrodicts there being a speed limit (the speed of light) and moreover predicts that noticeable-influence-propagation in fact tends to be even slower.
  3. If SIA is correct, you should update toward the proposition that you live in a simulation, relative to your naive credence.
    1. Because there could be lots more (expected) copies of you in simulations.
    2. Because that can explain why you appear to exist billions of years after billions of alien civilizations could have reached Earth.
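
A toy numeric sketch of the update in proposition 1 (a minimal illustration, not an argument: the two candidate universes, the 50/50 prior, and the "expected copies" counts are all made-up numbers):

```python
# SIA: P(world | I exist) is proportional to P(world) * E[# copies of me in that world].
# All numbers below are made up purely for illustration.

priors = {"small universe": 0.5, "large universe": 0.5}           # naive credences
expected_copies = {"small universe": 1, "large universe": 10**6}  # assumed copy counts

unnormalized = {w: priors[w] * expected_copies[w] for w in priors}
total = sum(unnormalized.values())
posteriors = {w: unnormalized[w] / total for w in priors}

print(posteriors)
# ~{'small universe': 1e-06, 'large universe': 0.999999}
# The hypothesis with more expected copies of you dominates, which is why SIA
# pushes toward very large (in the limit, infinite) universes.
```

Proposition 2 is the same update with "expected copies of human civilization" in place of "expected copies of you."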

Which of them feel wrong to you? I agree with all of them other than 3b, which I'm unsure about - I think this comment does a good job of unpacking things.

2a is Katja Grace's Doomsday argument. I think 2aii and 2aiii depend on whether we're allowing simulations; if a faster expansion speed (either the cosmic speed limit or the engineering limit on expansion) meant more ancestor simulations, then this could cancel out the fact that faster-expanding civilizations prevent more alien civilizations from coming into existence.

I deeply sympathize with the presumptuous philosopher but 1a feels weird.

2a was meant to be conditional on non-simulation.

Actually putting numbers on 2a (I have a post on this coming soon), the anthropic update seems to say (conditional on non-simulation) there's almost certainly lots of aliens all of which are quiet, which feels really surprising.

To clarify what I meant on 3b: maybe "you live in a simulation" can explain why the universe looks old better than "uh, I guess all of the aliens were quiet" can.

> I deeply sympathize with the presumptuous philosopher but 1a feels weird.

Yep! I have the same intuition

> Actually putting numbers on 2a (I have a post on this coming soon), the anthropic update seems to say (conditional on non-simulation) there's almost certainly lots of aliens all of which are quiet, which feels really surprising.

Nice! I look forward to seeing this. I did a similar analysis, considering both SIA + no simulations and SIA + simulations, in my work on grabby aliens.

AI endgame

In board games, the endgame is a period generally characterized by strategic clarity, more direct calculation of the consequences of actions rather than evaluating possible actions with heuristics, and maybe a narrower space of actions that players consider.

Relevant actors, particularly AI labs, are likely to experience increased strategic clarity before the end, including AI risk awareness, awareness of who the leading labs are, roughly how much lead time they have, and how threshold-y being slightly ahead is.

There may be opportunities for coordination in the endgame that were much less incentivized earlier, and pausing progress may be incentivized in the endgame despite being disincentivized earlier (from an actor's imperfect perspective and/or from a maximally informed/wise/etc. perspective).

The downside of "AI endgame" as a conceptual handle is that it suggests thinking of actors as opponents/adversaries. Probably "crunch time" is better, but people often use that to gesture at hinge-y-ness rather than strategic clarity.

I wouldn't take board games as a reference class, but rather war or maybe elections. I'm not sure that in these cases you have more clarity towards the end.

For example, if a lab is considering deploying a powerful model, it can prosocially show its hand--i.e., demonstrate that it has a powerful model--and ask others to make themselves partially transparent too. This affordance doesn't appear until the endgame. I think a refined version of it could be a big deal.

That could be, but also maybe there won't be a period of increased strategic clarity. Especially if the emergence of new capabilities with scale remains unpredictable, or if progress depends on finding new insights.

I can't think of many games that don't have an endgame. These examples don't seem that fun:

  • A single round of musical chairs.
  • A tabletop game that follows an unpredictable, structureless storyline.

Agree. I merely assert that we should be aware of and plan for the possibility of increased strategic clarity, risk awareness, etc. (and planning includes unpacking "etc.").

Probably taking the analogy too far, but: most games-that-can-have-endgames also have instances that don't have endgames; e.g. games of chess often end in the midgame.

AI strategy research projects / project generators / prompts

Mostly for personal use. Likely to be expanded over time.

Some prompts inspired by Framing AI strategy:

  • Plans
    • What plans would be good?
    • Given a particular plan that is likely to be implemented, what interventions or desiderata complement that plan (by making it more likely to succeed or by being better in worlds where it succeeds)?
  • Affordances: for various relevant actors, what strategically significant actions could they take? What levers do they have? What would it be great for them to do (or avoid)?
  • Intermediate goals: what goals or desiderata are instrumentally useful?
  • Threat modeling: for various threats, model them well enough to understand necessary and sufficient conditions for preventing them.
  • Memes (& frames): what would it be good for people to believe or pay attention to?

For forecasting prompts, see List of uncertainties about the future of AI.

Some miscellaneous prompts:

  • Slowing AI
    • How can various relevant actors slow AI?
    • How can the AI safety community slow AI?
    • What considerations or side effects are relevant to slowing AI?
  • How do labs act, as a function of [whatever determines that]? In particular, what's the deal with "racing"?
  • AI risk advocacy
    • How could the AI safety community do AI risk advocacy well?
    • What considerations or side effects are relevant to AI risk advocacy? 
  • What's the deal with crunch time?
  • How will the strategic landscape be different in the future?
  • What will be different near the end, and what interventions or actions will that enable? In particular, is eleventh-hour coordination possible? (Also maybe emergency brakes that don't appear until the end.)
  • What concrete asks should we have for labs? for government?
  • Meta: how can you help yourself or others do better AI strategy research?

Four kinds of actors/processes/decisions are directly very important to AI governance:

  • Corporate self-governance
    • Adopting safety standards
      • Providing a model for government regulation
  • US policy (and China, EU, UK, and others to a lesser extent)
    • Regulation
    • Incorporating standards into law
  • Standard-setters setting standards
  • International relations
    • Treaties
    • Informal influence on safety standards

Related: How technical safety standards could promote TAI safety.

("Safety standards" sounds prosaic but it doesn't have to be.)

AI risk decomposition based on agency or powerseeking or adversarial optimization or something

Epistemic status: confused.

Some vague, closely related ways to decompose AI risk into two kinds of risk:

  • Risk due to AI agency vs risk unrelated to agency
  • Risk due to AI goal-directedness vs risk unrelated to goal-directedness
  • Risk due to AI planning vs risk unrelated to planning
  • Risk due to AI consequentialism vs risk unrelated to consequentialism
  • Risk due to AI utility-maximization vs risk unrelated to utility-maximization
  • Risk due to AI powerseeking vs risk unrelated to powerseeking
  • Risk due to AI optimizing against you vs risk unrelated to adversarial optimization

The central reason to worry about powerseeking/whatever AI, I think, is that sufficiently (relatively) capable goal-directed systems instrumentally converge to disempowering you.

The central reason to worry about non-powerseeking/whatever AI, I think, is failure to generalize correctly from training: distribution shift, Goodhart, "You get what you measure."

What's the relationship between the propositions "one AI lab [has / will have] a big lead" and "the alignment tax will be paid"? (Or: in a possible world where lead size is bigger/smaller, how does this affect whether the alignment tax is paid?)

It depends on the source of the lead, so "lead size" or "lead time" is probably not a good node for AI forecasting/strategy.

Miscellaneous observations:

  • To pay the alignment tax, it helps to have more time until risky AI is developed or deployed.
  • To pay the alignment tax, holding total time constant, it helps to have more time near the end-- that is, more time with near-risky capabilities (for knowing what risky AI systems will look like, and for empirical work, and for aligning specific models).
  • If all labs except the leader become slower or less capable, it is prima facie good (at least if the leader appreciates misalignment risk and will stop before developing/deploying risky AI).
  • If the leading lab becomes faster or more capable, it is prima facie good (at least if the leader appreciates misalignment risk and will stop before developing/deploying risky AI), unless it causes other labs to become faster or more capable (for reasons like seeing what works or seeming to be straightforwardly incentivized to speed up or deciding to speed up in order to influence the leader). Note that this scenario could plausibly decrease race-y-ness: some models of AI racing show that if you're far behind you avoid taking risks, kind of giving up; this is based on the currently-false assumption that labs are perfectly aware of misalignment risk.
  • If labs all coordinate to slow down, that's good insofar as it increases total time, and great if they can continue to go slowly near the end, and potentially bad if it creates a hardware overhang such that the end goes more quickly than by default.

(Note also the distinction between current lead and ultimate lead. Roughly, the former is what we can observe and the latter is what we care about.)

(If paying the alignment tax looks less like a thing that happens for one transformative model and more like something that occurs gradually in a slow takeoff to avert Paul-style doom, things are more complex and in particular there are endogeneities such that labs may have additional incentives to pursue capabilities.)

There should be no alignment tax because improved alignment should always pay for itself, right? But currently "aligned" seems to be defined by "tries to not do anything," institutionally. Why isn't Anthropic publicly competing on alignment with OpenAI? E.g., folks are about to publicly replicate ChatGPT, looks like.

List of uncertainties about the future of AI

This is an unordered list of uncertainties about the future of AI, trying to be comprehensive: trying to include everything reasonably decision-relevant and important/tractable.

This list is mostly written from a forecasting perspective. A useful complementary perspective to forecasting would be strategy or affordances or what actors can do and what they should do. This list is also written from a nontechnical perspective.

  • Timelines
    • Capabilities as a function of inputs (or input requirements for AI of a particular capability level)
    • Spending by leading labs
    • Cost of compute
    • Ideas and algorithmic progress
    • Endogeneity in AI capabilities
    • What would be good? Interventions?
  • Takeoff (speed and dynamics)
    • dcapabilities/dinputs (or returns on cognitive reinvestment or intelligence explosion or fast recursive self-improvement): Is there a threshold of capabilities such that self-improvement or other progress is much greater slightly above that point than slightly below it? (If so, where is it?) Will there be a system that can quickly and cheaply improve itself (or create a more capable successor), such that the improvements enable similarly large improvements, and so on until the system is much more capable? Will a small increase in inputs cause a large increase in capabilities (like the difference between chimps and humans) (and if so, around human-level capabilities or where)? How fast will progress be?
      • Will dcapabilities/dinputs be very high because of recursive self-improvement?
      • Will dcapabilities/dinputs be very high because of generality being important and monolithic/threshold-y?
      • Will dcapabilities/dinputs be very high because ideas are discrete (and in particular, will there be a single "secret sauce" idea)? [Seems intractable and unlikely to be very important.]
    • dimpacts/dcapabilities: Will small increases in capabilities (on the order of the variance between different humans) cause large increases in impacts?
    • Qualitatively, how will AI research capabilities affect AI progress; what will AI progress look like when AI research capabilities are a big deal?
      • One dimension or implication: will takeoff be local or nonlocal? How distributed will it be; will it look like recursive self-improvement or the industrial revolution?
        • What does this depend on?
    • Endogeneity in AI capabilities: how do AI capabilities affect AI capabilities? (Potential alternative framing: dinputs/dtime.)
      • How (much) will AI tools accelerate research?
      • Will AI labs generate substantial revenue?
      • Will AI systems make AI seem more exciting or frightening? What effect would this have on AI progress?
      • What other endogeneities exist?
  • Weak AI
    • How will the strategic landscape be different in the future?
    • Will weak AI make AI risk clearer, and especially make it more legible (relates to "warning shots")?
    • Will AI progress cause substantial misuse or conflict? What would that look like?
    • Will AI progress enable pivotal acts or processes?
  • Misalignment risk sources/modes
  • Technical problems around AI systems doing what their controllers want [doesn't fit into this list well]
  • Polarity (relates to takeoff, endogeneity, and timelines)
    • What determines or affects polarity?
    • What are the effects or implications of polarity on alignment and stabilization?
    • What are the effects or implications of polarity on what the long-term future looks like, conditional on achieving alignment and stabilization?
    • What would be good? Interventions?
  • Proximate and ultimate uses of powerful AI
    • What uses of powerful AI would be great? How good would various possible uses of powerful AI be?
    • Conditional on achieving alignment, what's likely to occur (and what could foresighted actors cause to occur or not to occur)?
  • Agents vs tools and general systems vs narrow tools (relates to tool AI and Comprehensive AI Services)
    • Are general systems more powerful than similar narrow tools?
    • Are agents more powerful than similar tools?
    • Does generality appear by default in capable systems? (This relates to takeoff.)
    • Does agency (or goal-directedness or consequentialism or farsightedness) appear by default in capable systems?
  • AI labs' behavior and racing for AI
    • How will labs think about AI?
    • What actions could labs perform; what are they likely to do by default?
    • What would it be better if labs did; what interventions are tractable?
  • States' behavior
    • How will states think about AI?
    • What actions could states perform? What are they likely to do by default?
    • What would it be better if states did; what interventions are tractable?
  • Public opinion
    • How the public thinks about AI and how framing matters; what the public thinks about AI; what memes would spread widely and how that depends on other facts about the world; how all of that translates into attitudes and policy preferences
    • Wakeup to capabilities
    • Wakeup to alignment risk and warning shots for alignment
    • What (facts about public opinion) would be good? Interventions?
  • Paths to powerful AI (relates to timelines, AI risk modes, and more)
  • Meta and miscellanea
    • Epistemic stuff
      • Research methodology and organization: how to do research and organize researchers
      • Forecasting methodology: how to do forecasting better
      • Collective epistemology: how to share and aggregate beliefs and knowledge
    • Decision theory
    • Simulation hypothesis
      • Do we live in a simulation? (If so, what's the deal?)
      • What should we do if we live in a simulation (as a function of what the deal is)?
    • Movement/community/field stuff

Maybe this list would be more useful if it had more pointers to relevant work?

Maybe this list would be more useful if it included stuff that's important that I don't feel uncertain about? But probably not much of that exists?

I like lists/trees/graphs. I like the ideas behind Clarifying some key hypotheses in AI alignment and Modelling Transformative AI Risks. Perhaps this list is part of the beginning of a tree/graph for AI forecasting not including alignment stuff.

Meta level. To carve nature at its joints, we must [use good nodes / identify the true nodes]. A node is [good insofar as / true if] its causes and effects are modular, or we can losslessly compress phenomena related to it into effects on it and effects from it.

"The cost of compute" is an example of a great node (in the context of the future of AI): it's affected by various things (choices made by Nvidia, innovation, etc.), and it affects various things (capability-level of systems made by OpenAI, relative importance of money vs talent at AI labs, etc.), and we lose nothing by thinking in terms of the cost of compute (relative to, e.g., the effects of the choices made by Nvidia on the capability-level of systems made by OpenAI).

"When Moore's law will end" is an example of something that is not a node (in the context of the future of AI), since you'd be much better off thinking in terms of the underlying causes and effects.

The relations relevant to nodes are analytical not causal. For example, "the cost of compute" is a node between "evidence about historical progress" and "timelines," not just between "stuff Nvidia does" and "stuff OpenAI does." (You could also make a causal model, but here I'm interested in analytical models.)

 

Object level. I'm not sure how good "timelines," "takeoff," "polarity," and "wakeup to capabilities" are as nodes. Most of the time it seems fine to talk about e.g. "effects on timelines" and "implications of timelines." But maybe this conceals confusion.

Biological bounds on requirements for human-level AI

Facts about biology bound requirements for human-level AI. In particular, here are two prima facie bounds:

  • Lifetime. Humans develop human-level cognitive capabilities over a single lifetime, so (assuming our artificial learning algorithms are less efficient than humans' natural learning algorithms) training a human-level model takes at least the inputs used over the course of babyhood-to-adulthood. (A rough arithmetic sketch of this bound appears after this list.)
  • Evolution. Evolution found human-level cognitive capabilities by blind search, so (assuming we can search at least that well, and assuming evolution didn't get lucky) training a human-level model takes at most the inputs used over the course of human evolution (plus Lifetime inputs, but that's relatively trivial).
    • Genome. The size of the human genome is an upper bound on the complexity of humans' natural learning algorithms. Training a human-level model takes at most the inputs needed to find a learning algorithm at most as complex as the human genome (plus Lifetime inputs, but that's relatively trivial). (Unfortunately, the existence of human-level learning algorithms of certain simplicity says almost nothing about the difficulty of finding such algorithms.) (Ajeya's "genome anchor" is pretty different—"a transformative model would . . . have about as many parameters as there are bytes in the human genome"—and makes no sense to me.)
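
A rough arithmetic sketch of the Lifetime bound (the brain-compute figure and development period are loose outside assumptions, not claims from this shortform; treat the result as an order-of-magnitude illustration only):

```python
# Back-of-the-envelope Lifetime bound. Both inputs are rough assumptions.

brain_flop_per_s = 1e15        # assumed order-of-magnitude estimate of brain computation
years_to_adulthood = 30        # assumed development period
seconds = years_to_adulthood * 365 * 24 * 3600

lifetime_compute = brain_flop_per_s * seconds
print(f"Lifetime bound: ~{lifetime_compute:.0e} FLOP")  # ~1e24 FLOP with these inputs

# The Evolution bound multiplies a per-organism figure like this across the organisms
# and generations in the relevant evolutionary history, so it is many orders of
# magnitude larger. The Genome bound instead caps the complexity of the learning
# algorithm (the genome is on the order of 1e9 bytes), not training compute directly.
```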

(A human-level AI should use about as much computation per subjective time as humans do. This assumption/observation is weird and perhaps shows that something weird is going on, but I don't know how to make that sharp.)

There are few sources of bounds on requirements for human-level AI. Perhaps fundamental limits or reasoning about blind search could give weak bounds, but biology is the only example of human-level cognitive abilities and so the only possible source of reasonable bounds.

Related: Ajeya Cotra's Forecasting TAI with biological anchors (most relevant section) and Eliezer Yudkowsky's Biology-Inspired AGI Timelines: The Trick That Never Works.

What do people (outside this community) think about AI? What will they think in the future?

Attitudes predictably affect relevant actors' actions, so this is a moderately important question. And it's rather neglected.

Groups whose attitudes are likely to be important include ML researchers, policymakers, and the public.

On attitudes among the public, surveys provide some information, but I suspect attitudes will change (in potentially predictable ways) as AI becomes more salient and some memes/framings get locked in. Perhaps some survey questions (maybe general sentiment on AI) are somewhat robust to changes in memes while others (maybe beliefs on how AI affects the economy or attitudes on regulation) may change a lot in the near future.

On attitudes among ML researchers, surveys (e.g.) provide some information. For some reason, most ML researchers say there's at least a 5% probability of doom (or 10%, depending on how you ask), but this doesn't seem to translate into their actions or culture. Perhaps interviews would reveal researchers' attitudes better than closed-ended surveys (note to self: talk to Vael Gates).

AI may become much more salient in the next few years, and memes/framings may get locked in.

> On attitudes among ML researchers, surveys (e.g.) provide some information. For some reason, most ML researchers say there's at least a 5% probability of doom (or 10%, depending on how you ask), but this doesn't seem to translate into their actions or culture. Perhaps interviews would reveal researchers' attitudes better than closed-ended surveys (note to self: talk to Vael Gates).

Critically, this is only necessary if we assume that researchers care about basically everyone in the present (to a loose approximation). If we instead model researchers as basically selfish by default, then the low chance of a technological singularity outweighs the high chance of death, especially for older folks.

Basically, this could be explained as a goal alignment problem: LW and AI Researchers have very different goals in mind.

Maybe AI Will Happen Outside US/China

I'm interested in the claim that important AI development (in the next few decades) will largely occur outside any of the states that currently look likely to lead AI development. I don't think this is likely, but I haven't seen discussion of this claim.[1] This would matter because it would greatly affect the environment in which AI is developed and affect which agents are empowered by powerful AI.

Epistemic status: brainstorm. May be developed into a full post if I learn or think more.

 

I. Causes

The big tech companies are in the US and China, and discussion often assumes that these two states have a large lead on AI development. So how could important development occur in another state? Perhaps other states' tech programs (private or governmental) will grow. But more likely, I think, an already-strong company leaves the US for a new location.

My legal knowledge is insufficient to say with any confidence how easily companies can leave their states. My impression is that large American companies largely can leave while large Chinese companies cannot.

Why might a big tech company or AI lab want to leave a state?[2]

  • Fleeing expropriation/nationalization. States can largely expropriate companies' property within their territory unless they have contracted otherwise. A company may be able to protect its independence by securing legal protection from expropriation from another state, then moving its hardware to that state. It may move its headquarters or workers as well.
  • Fleeing domestic regulation on development and/or deployment of AI.

 

II. Effects

The state in which powerful AI is developed has two important effects.

  1. States set regulations. The regulatory environment around an AI lab may affect the narrow AI systems it builds and/or how it pursues AGI.
  2. State influence & power. The state in which AGI is achieved can probably nationalize that project (perhaps well before AGI). State control of powerful AI affects how it will be used.

 

III. AI deployment before superintelligence

Eliezer recently tweeted that AI might be low-impact until superintelligence because of constraints on deployment. This seems partially right — for example, medicine and education seem like areas in which marginal improvements in our capabilities have only small effects due to civilizational inadequacy. Certainly some AI systems would require local regulatory approval to be useful; those might well be limited in the US. But a large fraction of AI systems won't be prohibited by plausible American regulation. For example, I would be quite surprised if the following kinds of systems were prohibited by regulation (disclaimer: I'm very non-expert on near-future AI):

  • Business services
    • Operations/logistics
    • Analysis
    • Productivity tools (e.g., Codex, search tools)
  • Online consumer services — financial, writing assistants (Codex)
  • Production of goods that can be shipped cheaply (like computers but not houses)
  • Trading
  • Maybe media stuff (chatbots, persuasion systems). It's really hard to imagine the US banning chatbots. I'm not sure how persuasion-AI is implemented; custom ads could conceivably be banned, but eliminating AI-written media is implausible.

This matters because these AI applications directly affect some places even if they couldn’t be developed in those places.

In the unlikely event that the US moves against not only the deployment but also the development of such systems, AI companies would be more likely to seek a way around regulation — such as relocating.


  1. Rather, I have not seen reasons for this claim other than the very normal one — that leading states and companies change over time. If you have seen more discussion of this claim, please let me know. ↩︎

  2. This is most likely to be relevant to the US but applies generally. ↩︎

Value Is Binary

Epistemic status: rough ethical and empirical heuristic.

Assuming that value is roughly linear in resources available after we reach technological maturity,[1] my probability distribution of value is so bimodal that it is nearly binary. In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between -1% and 1% of the value of the optimal future), and little probability to anything else.[2] To the extent that almost all of the probability mass fits into two buckets, and everything within a bucket is about as valuable as everything else in that bucket, the goal "maximize expected value" reduces to the goal "maximize the probability of the better bucket."
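
Spelling out that reduction (a minimal sketch; p_opt, p_0, and ε are just my labels for the three probability masses described above):

```latex
% V = value as a fraction of the optimal future.
% Three buckets: near-optimal (mass p_opt, V ~ 1), near-zero (mass p_0, V ~ 0),
% and everything else (mass eps, assumed small).
\mathbb{E}[V]
  = p_{\text{opt}}\,\underbrace{\mathbb{E}[V \mid \text{near-opt}]}_{\approx 1}
  + p_{0}\,\underbrace{\mathbb{E}[V \mid \text{near-zero}]}_{\approx 0}
  + \varepsilon\,\mathbb{E}[V \mid \text{other}]
  \;\approx\; p_{\text{opt}}
% To the extent eps is small and the buckets are tight, maximizing expected value
% is approximately the same as maximizing p_opt, the probability of the better bucket.
```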

So rather than thinking about how to maximize expected value, I generally think about maximizing the probability of a great (i.e., near-optimal) future. This goal is easier for me to think about, particularly since I believe that the paths to a great future are rather homogeneous — alike not just in value but in high-level structure. In the rest of this shortform, I explain my belief that the future is likely to be near-optimal or near-zero.

 

Substantial probability to near-optimal futures.

I have substantial credence that the future is at least 99% as good as the optimal future.[3] I do not claim much certainty about what the optimal future looks like — my baseline assumption is that it involves increasing and improving consciousness in the universe, but I have little idea whether that would look like many very small minds or a few very big minds. Or perhaps the optimal future involves astronomical-scale acausal trade. Or perhaps future advances in ethics, decision theory, or physics will have unforeseeable implications for how a technologically mature civilization can do good.

But uniting almost all of my probability mass for near-optimal futures is how we get there, at a high level: we create superintelligence, achieve technological maturity, solve ethics, and then optimize. Without knowing what this looks like in detail, I assign substantial probability to the proposition that humanity successfully completes this process. And I think almost all futures in which we do complete this process look very similar: they have nearly identical technology, reach the same conclusions on ethics, have nearly identical resources available to them (mostly depending on how long it took them to reach maturity), and so produce nearly identical value.

 

Almost all of the remaining probability to near-zero futures.

This claim is bolder, I think. Even if it seems reasonable to expect a substantial fraction of possible futures to converge to near-optimal, it may seem odd to expect almost all of the rest to be near-zero. But I find it difficult to imagine any other futures.

For a future to not be near-zero, it must involve using a nontrivial fraction of the resources available in the optimal future (by my assumption that value is roughly linear in resources). More significantly, the future must involve using resources at a nontrivial fraction of the efficiency of their use in the optimal future. This seems unlikely to happen by accident. In particular, I claim:

If a future does not involve optimizing for the good, value is almost certainly near-zero.

Roughly, this holds if all (nontrivially efficient) ways of promoting the good are not efficient ways of optimizing for anything else that we might optimize for. I strongly intuit that this is true; I expect that as technology improves, efficiently producing a unit of something will produce very little of almost all other things (where "thing" includes not just stuff but also minds, qualia, etc.).[4] If so, then value (or disvalue) is (in expectation) a negligible side effect of optimization for other things. And I cannot reasonably imagine a future optimized for disvalue, so I think almost all non-near-optimal futures are near-zero.

 

So I believe that either we optimize for value and get a near-optimal future, or we do anything else and get a near-zero future.

Intuitively, it seems possible to optimize for more than one value. I think such scenarios are unlikely. Even if our utility function has multiple linear terms, unless there is some surprisingly good way to achieve them simultaneously, we optimize by pursuing one of them near-exclusively.[5] Optimizing a utility function that looks more like min(x,y) may be a plausible result of a grand bargain, but such a scenario requires that, after we have mature technology, multiple agents have nontrivial bargaining power and different values. I find this unlikely; I expect singleton-like scenarios and that powerful agents will either all converge to the same preferences or all have near-zero-value preferences.

 

I mostly see "value is binary" as a heuristic for reframing problems. It also has implications for what we should do: to the extent that value is binary (and to the extent that doing so is feasible), we should focus on increasing the probability of great futures. If a "catastrophic" future is one in which we realize no more than a small fraction of our value, then a great future is simply one which is not catastrophic and we should focus on avoiding catastrophes. But of course, "value is binary" is an empirical approximation rather than an a priori truth. Even if value seems very nearly binary, we should not reject contrary proposed interventions[6] or possible futures out of hand.

I would appreciate suggestions on how to make these ideas more formal or precise (in addition to comments on what I got wrong or left out, of course). Also, this shortform relies on argument by "I struggle to imagine"; if you can imagine something I cannot, please explain your scenario and I will justify my skepticism or update.


  1. You would reject this if you believed that astronomical-scale goods are not astronomically better than Earth-scale goods or if you believed that some plausible Earth-scale bad would be worse than astronomical-scale goods are good. ↩︎

  2. "Optimal" value is roughly defined as the expected value of the future in which we act as well as possible, from our current limited knowledge about what "acting well" looks like. "Zero" is roughly defined as any future in which we fail to do anything astronomically significant. I consider value relative to the optimal future, ignoring uncertainty about how good the optimal future is — we should theoretically act as if we're in a universe with high variance in value between different possibilities, but I don't see how this affects what we should choose before reaching technological maturity.*
    *Except roughly that we should act with unrealistically low probability that we are in a kind of simulation in which our choices matter very little or have very differently-valued consequences than otherwise. The prospect of such simulations might undermine my conclusions—value might still be binary, but for the wrong reason—so it is useful to be able to almost-ignore such possibilities. ↩︎

  3. That is, at least 99% of the way from the zero-value future to the optimal future. ↩︎

  4. If we particularly believe that value is fragile, we have an additional reason to expect this orthogonality. But I claim that different goals tend to be orthogonal at high levels of technology independent of value's fragility. ↩︎

  5. This assumes that all goods are substitutes in production, which I expect to be nearly true with mature technology. ↩︎

  6. That is, those that affect the probability of futures outside the binary or that affect how good the future is within the set of near-zero (or near-optimal) futures. ↩︎

After reading the first paragraph of your above comment only, I want to note that:

> In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between -1% and 1% of the value of the optimal future), and little probability to anything else.

I assign much lower probability to near-optimal futures than near-zero-value futures.

This is mainly because a lot of the "extremely good" possible worlds I imagine when reading Bostrom's Letter from Utopia are <1% of what is optimal.

I also think the amount of probability I assign to 1%-99% futures is (~10x?) larger than the amount I assign to >99% futures.

(I'd like to read the rest of your comment later (but not right now due to time constraints) to see if it changes my view.)

I agree that near-optimal is unlikely. But I would be quite surprised by 1%-99% futures because (in short) I think we do better if we optimize for good and do worse if we don’t. If our final use of our cosmic endowment isn’t near-optimal, I think we failed to optimize for good and would be surprised if it’s >1%.

Agreed with this given how many orders of magnitude potential values span.

Rescinding my previous statement:

> I also think the amount of probability I assign to 1%-99% futures is (~10x?) larger than the amount I assign to >99% futures.

I'd now say that probably the probability of 1%-99% optimal futures is <10% of the probability of >99% optimal futures.

This is because 1% optimal is very close to being optimal (only 2 orders of magnitude away out of dozens of orders of magnitude of very good futures).

Related idea, off the cuff, rough. Not really important or interesting, but might lead to interesting insights. Mostly intended for my future selves, but comments are welcome.

Binaries Are Analytically Valuable

Suppose our probability distribution for alignment success is nearly binary. In particular, suppose that we have high credence that, by the time we can create an AI capable of triggering an intelligence explosion, we will have

  • really solved alignment (i.e., we can create an aligned AI capable of triggering an intelligence explosion at reasonable extra cost and delay) or
  • really not solved alignment (i.e., we cannot create a similarly powerful aligned AI, or doing so would require very unreasonable extra cost and delay)

(Whether this is actually true is irrelevant to my point.)

Why would this matter?

Stating the risk from an unaligned intelligence explosion is kind of awkward: it's that the alignment tax is greater than what the leading AI project is able/willing to pay. Equivalently, our goal is for the alignment tax to be less than what the leading AI project is able/willing to pay. This gives rise to two nice, clean desiderata:

  • Decrease the alignment tax
  • Increase what the leading AI project is able/willing to pay for alignment

But unfortunately, we can't similarly split the goal (or risk) into two goals (or risks). For example, a breakdown into the following two goals does not capture the risk from an unaligned intelligence explosion:

  • Make the alignment tax less than 6 months and a trillion dollars
  • Make the leading AI project able/willing to spend 6 months and a trillion dollars on aligning an AI

It would suffice to achieve both of these goals, but doing so is not necessary. If we fail to reduce the alignment tax this far, we can compensate by doing better on the willingness-to-pay front, and vice versa.

But if alignment success is binary, then we actually can decompose the goal (bolded above) into two necessary (and jointly sufficient) conditions:

  • Really solve alignment; i.e., reduce the alignment tax to [reasonable value]
  • Make the leading AI project able/willing to spend [reasonable value] on alignment

(Where [reasonable value] depends on what exactly our binary-ish probability distribution for alignment success looks like.)
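
A compact restatement of why the binary assumption licenses this decomposition (a sketch; T and W are just my labels for the alignment tax and for what the leading AI project is able/willing to pay):

```latex
% Goal: T <= W.
% For a generic threshold c, (T <= c) AND (W >= c) is sufficient but not necessary:
% falling short on one conjunct can be compensated by the other.
% Under the near-binary assumption T in {T_low, T_huge}, with T_huge above any feasible W:
T \le W
\;\iff\;
\underbrace{T = T_{\text{low}}}_{\text{really solve alignment}}
\;\wedge\;
\underbrace{W \ge T_{\text{low}}}_{\text{leading project able/willing to pay}}
% Each conjunct is now necessary, and together they are sufficient.
```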

Breaking big goals down into smaller goals—in particular, into smaller necessary conditions—is valuable, analytically and pragmatically. Binaries help, when they exist. Sometimes weaker conditions on the probability distribution, those of the form a certain important subset of possibilities has very low probability, can be useful in the same way.