Open letters (related to AI safety):
FLI, Oct 2015: Research Priorities for Robust and Beneficial Artificial Intelligence
FLI, Jan 2017: Asilomar AI Principles
FLI, Mar 2023: Pause Giant AI Experiments
CAIS, May 2023: Statement on AI Risk
Meta, Jul 2023: Statement of Support for Meta’s Open Approach to Today’s AI
Academic AI researchers, Oct 2023: Managing AI Risks in an Era of Rapid Progress
CHAI et al., Oct 2023: Prominent AI Scientists from China and the West Propose Joint Strategy to Mitigate Risks from AI
Oct 2023: Urging an International AI Treaty
Mozilla, Oct 2023: Joint Statement on AI Safety and Openness
Nov 2023: Post-Summit Civil Society Communique
Joint declarations between countries (related to AI safety):
Nov 2023: Bletchley Declaration
Thanks to Peter Barnett.
Mozilla, Oct 2023: Joint Statement on AI Safety and Openness (pro-openness, anti-regulation)
This shortform was going to be a post in Slowing AI but its tone is off.
This shortform is very non-exhaustive.
There is a real consideration here. Reasonable variants of this take include
(Set aside the fact that slowing the West generally also slows China, because they're correlated and because ideas pass from the West to China.) (Set aside the question of whether China will try to slow and how correlated that is with the West slowing.)
In some cases slowing the West would be worth burning lead time. But slowing AI doesn't just mean the West slowing itself down. Some interventions would slow both spheres similarly or even differentially slow China: most notably export controls, reducing diffusion of ideas, and improved migration policy.
See West-China relation.
Yes, insofar as slowing now risks speeding later, we should notice that. There is a real consideration here.
But in some cases slowing now would be worth a little speeding later. Moreover, some kinds of slowing don't cause faster progress later at all: for example, reducing diffusion of ideas, decreasing hardware progress, and any stable and enforceable policy regimes that slow AI.
See Quickly scaling up compute.
(Set aside the question of how much powerful AI helps alignment research.) If powerful AI is important for alignment research, that means we should aim to increase the time we have with powerful AI, not to make powerful AI arrive sooner.
No, running AI systems seems likely to be cheap and there's already plenty of compute.
Some bad default/attractor properties of cause-focused groups of humans:
(I'm interested in readings on this topic.)
Propositions on SIA
Epistemic status: exploring implications, some of which feel wrong.
Which of them feel wrong to you? I agree with all of them other than 3b, which I'm unsure about. I think this comment does a good job of unpacking things.
2a is Katja Grace's Doomsday argument. I think 2aii and 2aiii depend on whether we're allowing simulations; if a faster expansion speed (either the cosmic speed limit or the engineering limit on expansion) meant more ancestor simulations, then this could cancel out the fact that faster-expanding civilizations prevent more alien civilizations from coming into existence.
I deeply sympathize with the presumptuous philosopher but 1a feels weird.
2a was meant to be conditional on non-simulation.
Actually putting numbers on 2a (I have a post on this coming soon), the anthropic update seems to say (conditional on non-simulation) there's almost certainly lots of aliens all of which are quiet, which feels really surprising.
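To spell out the shape of that anthropic update (my gloss on SIA, not part of the original comment): SIA weights each hypothesis by the number of observers in one's epistemic situation that it predicts,

$$P(H \mid \text{we exist}) \propto P(H) \cdot N_H,$$

so hypotheses on which civilizations like ours are common get boosted, and since we don't observe any aliens, the boosted hypotheses are ones on which the many aliens are quiet.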
To clarify what I meant on 3b: maybe "you live in a simulation" can explain why the universe looks old better than "uh, I guess all of the aliens were quiet" can.
> I deeply sympathize with the presumptuous philosopher but 1a feels weird.
Yep! I have the same intuition
> Actually putting numbers on 2a (I have a post on this coming soon), the anthropic update seems to say (conditional on non-simulation) there's almost certainly lots of aliens all of which are quiet, which feels really surprising.
Nice! I look forward to seeing this. I did a similar analysis, considering both SIA + no simulations and SIA + simulations, in my work on grabby aliens.
In board games, the endgame is a period generally characterized by strategic clarity, more direct calculation of the consequences of actions rather than evaluating possible actions with heuristics, and maybe a narrower space of actions that players consider.
Relevant actors, particularly AI labs, are likely to experience increased strategic clarity before the end, including AI risk awareness, awareness of who the leading labs are, roughly how much lead time they have, and how threshold-y being slightly ahead is.
There may be opportunities for coordination in the endgame that were much less incentivized earlier, and pausing progress may be incentivized in the endgame despite being disincentivized earlier (from an actor's imperfect perspective and/or from a maximally informed/wise/etc. perspective).
The downside of "AI endgame" as a conceptual handle is that it suggests thinking of actors as opponents/adversaries. Probably "crunch time" is better, but people often use that to gesture at hinge-y-ness rather than strategic clarity.
For example, if a lab is considering deploying a powerful model, it can prosocially show its hand (i.e., demonstrate that it has a powerful model) and ask others to make themselves partially transparent too. This affordance doesn't appear until the endgame. I think a refined version of it could be a big deal.
That could be, but also maybe there won't be a period of increased strategic clarity. Especially if the emergence of new capabilities with scale remains unpredictable, or if progress depends on finding new insights.
I can't think of many games that don't have an endgame. These examples don't seem that fun:
Agree. I merely assert that we should be aware of and plan for the possibility of increased strategic clarity, risk awareness, etc. (and planning includes unpacking "etc.").
Probably taking the analogy too far, but: most games-that-can-have-endgames also have instances that don't have endgames; e.g. games of chess often end in the midgame.
Mostly for personal use. Likely to be expanded over time.
Some prompts inspired by Framing AI strategy:
For forecasting prompts, see List of uncertainties about the future of AI.
Some miscellaneous prompts:
Four kinds of actors/processes/decisions are directly very important to AI governance:
Related: How technical safety standards could promote TAI safety.
("Safety standards" sounds prosaic but it doesn't have to be.)
AI risk decomposition based on agency or powerseeking or adversarial optimization or something
Epistemic status: confused.
Some vague, closely related ways to decompose AI risk into two kinds of risk:
The central reason to worry about powerseeking/whatever AI, I think, is that sufficiently (relatively) capable goal-directed systems instrumentally converge to disempowering you.
The central reason to worry about non-powerseeking/whatever AI, I think, is failure to generalize correctly from training: distribution shift, Goodhart, "you get what you measure."
What's the relationship between the propositions "one AI lab [has / will have] a big lead" and "the alignment tax will be paid"? (Or: in a possible world where lead size is bigger/smaller, how does this affect whether the alignment tax is paid?)
It depends on the source of the lead, so "lead size" or "lead time" is probably not a good node for AI forecasting/strategy.
Miscellaneous observations:
(Note also the distinction between current lead and ultimate lead. Roughly, the former is what we can observe and the latter is what we care about.)
(If paying the alignment tax looks less like a thing that happens for one transformative model and more like something that occurs gradually in a slow takeoff to avert Paul-style doom, things are more complex and in particular there are endogeneities such that labs may have additional incentives to pursue capabilities.)
There should be no alignment tax because improved alignment should always pay for itself, right? But currently "aligned" seems to be defined as "tries to not do anything," institutionally. Why isn't Anthropic publicly competing on alignment with OpenAI? E.g., it looks like folks are about to publicly replicate ChatGPT.
List of uncertainties about the future of AI
This is an unordered list of uncertainties about the future of AI. It aims to be comprehensive, including everything reasonably decision-relevant and important/tractable.
This list is mostly written from a forecasting perspective. A useful complementary perspective to forecasting would be strategy or affordances or what actors can do and what they should do. This list is also written from a nontechnical perspective.
Maybe this list would be more useful if it had more pointers to relevant work?
Maybe this list would be more useful if it included stuff that's important that I don't feel uncertain about? But probably not much of that exists?
I like lists/trees/graphs. I like the ideas behind Clarifying some key hypotheses in AI alignment and Modelling Transformative AI Risks. Perhaps this list is part of the beginning of a tree/graph for AI forecasting not including alignment stuff.
Meta level. To carve nature at its joints, we must [use good nodes / identify the true nodes]. A node is [good insofar as / true if] its causes and effects are modular, or we can losslessly compress phenomena related to it into effects on it and effects from it.
"The cost of compute" is an example of a great node (in the context of the future of AI): it's affected by various things (choices made by Nvidia, innovation, etc.), and it affects various things (capability-level of systems made by OpenAI, relative importance of money vs talent at AI labs, etc.), and we lose nothing by thinking in terms of the cost of compute (relative to, e.g., the effects of the choices made by Nvidia on the capability-level of systems made by OpenAI).
"When Moore's law will end" is an example of something that is not a node (in the context of the future of AI), since you'd be much better off thinking in terms of the underlying causes and effects.
The relations relevant to nodes are analytical not causal. For example, "the cost of compute" is a node between "evidence about historical progress" and "timelines," not just between "stuff Nvidia does" and "stuff OpenAI does." (You could also make a causal model, but here I'm interested in analytical models.)
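To illustrate the modularity/compression property, here is a minimal sketch (my own illustration, not from the shortform; all functions and numbers are made-up placeholders):

```python
# A minimal sketch of what it means for "the cost of compute" to be a good node:
# downstream quantities can be computed from the node's value alone, so the
# upstream details are screened off. Placeholder functions and numbers only.

def cost_of_compute(nvidia_choices: float, innovation: float) -> float:
    """Upstream causes get compressed into a single number."""
    return nvidia_choices * innovation  # made-up aggregation

def timelines_estimate(compute_cost: float) -> float:
    """Downstream effects depend only on the node's value, not on how it arose."""
    return 2030 + 10 * compute_cost  # made-up mapping

# Two different upstream worlds that produce the same node value...
world_a = cost_of_compute(nvidia_choices=0.5, innovation=2.0)
world_b = cost_of_compute(nvidia_choices=1.0, innovation=1.0)

# ...yield identical downstream conclusions: the "lossless compression" property.
assert timelines_estimate(world_a) == timelines_estimate(world_b)
```

The point is that two different upstream worlds with the same node value support the same downstream conclusions; "when Moore's law will end" doesn't admit this kind of factoring.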
Object level. I'm not sure how good "timelines," "takeoff," "polarity," and "wakeup to capabilities" are as nodes. Most of the time it seems fine to talk about e.g. "effects on timelines" and "implications of timelines." But maybe this conceals confusion.
Biological bounds on requirements for human-level AI
Facts about biology bound requirements for human-level AI. In particular, here are two prima facie bounds:
(A human-level AI should use a similar amount of computation as a human per unit of subjective time. This assumption/observation is weird and perhaps shows that something weird is going on, but I don't know how to make that sharp.)
There are few sources of bounds on requirements for human-level AI. Perhaps fundamental limits or reasoning about blind search could give weak bounds, but biology is the only example of human-level cognitive abilities and so the only possible source of reasonable bounds.
Related: Ajeya Cotra's Forecasting TAI with biological anchors (most relevant section) and Eliezer Yudkowsky's Biology-Inspired AGI Timelines: The Trick That Never Works.
What do people (outside this community) think about AI? What will they think in the future?
Attitudes predictably affect relevant actors' actions, so this is a moderately important question. And it's rather neglected.
Groups whose attitudes are likely to be important include ML researchers, policymakers, and the public.
On attitudes among the public, surveys provide some information, but I suspect attitudes will change (in potentially predictable ways) as AI becomes more salient and some memes/framings get locked in. Perhaps some survey questions (maybe general sentiment on AI) are somewhat robust to changes in memes while others (maybe beliefs on how AI affects the economy or attitudes on regulation) may change a lot in the near future.
On attitudes among ML researchers, surveys (e.g.) provide some information. For some reason, most ML researchers say there's at least a 5% probability of doom (or 10%, depending on how you ask), but this doesn't seem to translate into their actions or culture. Perhaps interviews would reveal researchers' attitudes better than closed-ended surveys (note to self: talk to Vael Gates).
AI may become much more salient in the next few years, and memes/framings may get locked in.
> On attitudes among ML researchers, surveys (e.g.) provide some information, but for some reason most ML researchers say there's at least a 5% probability of doom (or 10%, depending on how you ask) but this doesn't seem to translate into their actions or culture. Perhaps interviews would reveal researchers' attitudes better than closed-ended surveys (note to self: talk to Vael Gates).
Critically, this is only necessary if we assume that researchers care about basically everyone in the present (to a loose approximation). If we instead model researchers as basically selfish by default, then the low chance of a technological singularity outweighs the high chance of death, especially for older folks.
Basically, this could be explained as a goal alignment problem: LW and AI Researchers have very different goals in mind.
I'm interested in the claim that important AI development (in the next few decades) will largely occur outside any of the states that currently look likely to lead AI development. I don't think this is likely, but I haven't seen discussion of this claim.[1] This would matter because it would greatly affect the environment in which AI is developed and affect which agents are empowered by powerful AI.
Epistemic status: brainstorm. May be developed into a full post if I learn or think more.
The big tech companies are in the US and China, and discussion often assumes that these two states have a large lead on AI development. So how could important development occur in another state? Perhaps other states' tech programs (private or governmental) will grow. But more likely, I think, an already-strong company leaves the US for a new location.
My legal knowledge is insufficient to say with any confidence how easily companies can leave their states. My impression is that large American companies largely can leave while large Chinese companies cannot.
Why might a big tech company or AI lab want to leave a state?[2]
The state in which powerful AI is developed has two important effects.
Eliezer recently tweeted that AI might be low-impact until superintelligence because of constraints on deployment. This seems partially right — for example, medicine and education seem like areas in which marginal improvements in our capabilities have only small effects due to civilizational inadequacy. Certainly some AI systems would require local regulatory approval to be useful; those might well be limited in the US. But a large fraction of AI systems won't be prohibited by plausible American regulation. For example, I would be quite surprised if the following kinds of systems were prohibited by regulation (disclaimer: I'm very non-expert on near-future AI):
This matters because these AI applications directly affect some places even if they couldn’t be developed in those places.
In the unlikely event that the US moves against not only the deployment but also the development of such systems, AI companies would be more likely to seek a way around regulation — such as relocating.
Epistemic status: rough ethical and empirical heuristic.
Assuming that value is roughly linear in resources available after we reach technological maturity,[1] my probability distribution of value is so bimodal that it is nearly binary. In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between -1% and 1% of the value of the optimal future), and little probability to anything else.[2] To the extent that almost all of the probability mass fits into two buckets, and everything within a bucket is almost as valuable as everything else in that bucket, the goal "maximize expected value" reduces to the goal "maximize the probability of the better bucket."
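A rough formalization of that reduction (my notation, not in the original): let $p$ be the probability of the near-optimal bucket and $V_{\text{opt}}$ the value of the optimal future, treating the other bucket as contributing roughly zero. Then

$$\mathbb{E}[V] \approx p \cdot V_{\text{opt}} + (1 - p) \cdot 0 = p \cdot V_{\text{opt}},$$

so, holding $V_{\text{opt}}$ fixed, maximizing expected value is approximately the same as maximizing $p$.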
So rather than thinking about how to maximize expected value, I generally think about maximizing the probability of a great (i.e., near-optimal) future. This goal is easier for me to think about, particularly since I believe that the paths to a great future are rather homogeneous — alike not just in value but in high-level structure. In the rest of this shortform, I explain my belief that the future is likely to be near-optimal or near-zero.
I have substantial credence that the future is at least 99% as good as the optimal future.[3] I do not claim much certainty about what the optimal future looks like — my baseline assumption is that it involves increasing and improving consciousness in the universe, but I have little idea whether that would look like many very small minds or a few very big minds. Or perhaps the optimal future involves astronomical-scale acausal trade. Or perhaps future advances in ethics, decision theory, or physics will have unforeseeable implications for how a technologically mature civilization can do good.
But what unites almost all of my probability mass for near-optimal futures is how we get there, at a high level: we create superintelligence, achieve technological maturity, solve ethics, and then optimize. Without knowing what this looks like in detail, I assign substantial probability to the proposition that humanity successfully completes this process. And I think almost all futures in which we do complete this process look very similar: they have nearly identical technology, reach the same conclusions on ethics, have nearly identical resources available to them (mostly depending on how long it took them to reach maturity), and so produce nearly identical value.
This claim is bolder, I think. Even if it seems reasonable to expect a substantial fraction of possible futures to converge to near-optimal, it may seem odd to expect almost all of the rest to be near-zero. But I find it difficult to imagine any other futures.
For a future to not be near-zero, it must involve using a nontrivial fraction of the resources available in the optimal future (by my assumption that value is roughly linear in resources). More significantly, the future must involve using resources at a nontrivial fraction of the efficiency of their use in the optimal future. This seems unlikely to happen by accident. In particular, I claim:
If a future does not involve optimizing for the good, value is almost certainly near-zero.
Roughly, this holds if all (nontrivially efficient) ways of promoting the good are not efficient ways of optimizing for anything else that we might optimize for. I strongly intuit that this is true; I expect that as technology improves, efficiently producing a unit of something will produce very little of almost all other things (where "thing" includes not just stuff but also minds, qualia, etc.).[4] If so, then value (or disvalue) is (in expectation) a negligible side effect of optimization for other things. And I cannot reasonably imagine a future optimized for disvalue, so I think almost all non-near-optimal futures are near-zero.
So I believe that either we optimize for value and get a near-optimal future, or we do anything else and get a near-zero future.
Intuitively, it seems possible to optimize for more than one value. I think such scenarios are unlikely. Even if our utility function has multiple linear terms, unless there is some surprisingly good way to achieve them simultaneously, we optimize by pursuing one of them near-exclusively.[5] Optimizing a utility function that looks more like min(x,y) may be a plausible result of a grand bargain, but such a scenario requires that, after we have mature technology, multiple agents have nontrivial bargaining power and different values. I find this unlikely; I expect singleton-like scenarios and that powerful agents will either all converge to the same preferences or all have near-zero-value preferences.
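A toy sketch of why linear terms get pursued near-exclusively (my own setup, relying on the assumption in footnote 5 that goods are substitutes in production): suppose we maximize $ax + by$ subject to a resource constraint,

$$\max_{x, y \ge 0} \; ax + by \quad \text{s.t.} \quad p_x x + p_y y \le R.$$

The optimum is a corner solution: all resources go to whichever term has the higher value per unit of resources ($a/p_x$ versus $b/p_y$), unless the two are exactly tied. A utility like $\min(x, y)$, by contrast, is optimized by growing both terms together.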
I mostly see "value is binary" as a heuristic for reframing problems. It also has implications for what we should do: to the extent that value is binary (and to the extent that doing so is feasible), we should focus on increasing the probability of great futures. If a "catastrophic" future is one in which we realize no more than a small fraction of our value, then a great future is simply one which is not catastrophic and we should focus on avoiding catastrophes. But of course, "value is binary" is an empirical approximation rather than an a priori truth. Even if value seems very nearly binary, we should not reject contrary proposed interventions[6] or possible futures out of hand.
I would appreciate suggestions on how to make these ideas more formal or precise (in addition to comments on what I got wrong or left out, of course). Also, this shortform relies on argument by "I struggle to imagine"; if you can imagine something I cannot, please explain your scenario and I will justify my skepticism or update.
You would reject this if you believed that astronomical-scale goods are not astronomically better than Earth-scale goods or if you believed that some plausible Earth-scale bad would be worse than astronomical-scale goods are good. ↩︎
"Optimal" value is roughly defined as the expected value of the future in which we act as well as possible, from our current limited knowledge about what "acting well" looks like. "Zero" is roughly defined as any future in which we fail to do anything astronomically significant. I consider value relative to the optimal future, ignoring uncertainty about how good the optimal future is — we should theoretically act as if we're in a universe with high variance in value between different possibilities, but I don't see how this affects what we should choose before reaching technological maturity.*
*Except roughly that we should act with unrealistically low probability that we are in a kind of simulation in which our choices matter very little or have very differently-valued consequences than otherwise. The prospect of such simulations might undermine my conclusions—value might still be binary, but for the wrong reason—so it is useful to be able to almost-ignore such possibilities. ↩︎
That is, at least 99% of the way from the zero-value future to the optimal future. ↩︎
If we particularly believe that value is fragile, we have an additional reason to expect this orthogonality. But I claim that different goals tend to be orthogonal at high levels of technology independent of value's fragility. ↩︎
This assumes that all goods are substitutes in production, which I expect to be nearly true with mature technology. ↩︎
That is, those that affect the probability of futures outside the binary or that affect how good the future is within the set of near-zero (or near-optimal) futures. ↩︎
After reading the first paragraph of your above comment only, I want to note that:
> In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between -1% and 1% of the value of the optimal future), and little probability to anything else.
I assign much lower probability to near-optimal futures than near-zero-value futures.
This is mainly because I imagine a lot of the "extremely good" possible worlds I imagine when reading Bostrom's Letter from Utopia are <1% of what is optimal.
I also think the amount of probability I assign to 1%-99% futures is (~10x?) larger than the amount I assign to >99% futures.
(I'd like to read the rest of your comment later (but not right now due to time constraints) to see if it changes my view.)
I agree that near-optimal is unlikely. But I would be quite surprised by 1%-99% futures because (in short) I think we do better if we optimize for good and do worse if we don’t. If our final use of our cosmic endowment isn’t near-optimal, I think we failed to optimize for good and would be surprised if it’s >1%.
Agreed with this given how many orders of magnitude potential values span.
Rescinding my previous statement:
> I also think the amount of probability I assign to 1%-99% futures is (~10x?) larger than the amount I assign to >99% futures.
I'd now say that probably the probability of 1%-99% optimal futures is <10% of the probability of >99% optimal futures.
This is because 1% optimal is very close to being optimal (only 2 orders of magnitude away out of dozens of orders of magnitude of very good futures).
Related idea, off the cuff, rough. Not really important or interesting, but might lead to interesting insights. Mostly intended for my future selves, but comments are welcome.
Suppose our probability distribution for alignment success is nearly binary. In particular, suppose that we have high credence that, by the time we can create an AI capable of triggering an intelligence explosion, we will have
(Whether this is actually true is irrelevant to my point.)
Why would this matter?
Stating the risk from an unaligned intelligence explosion is kind of awkward: it's that the alignment tax is greater than what the leading AI project is able/willing to pay. Equivalently, our goal is for the alignment tax to be less than what the leading AI project is able/willing to pay. This gives rise to two nice, clean desiderata:
But unfortunately, we can't similarly split the goal (or risk) into two goals (or risks). For example, a breakdown into the following two goals does not capture the risk from an unaligned intelligence explosion:
It would suffice to achieve both of these goals, but doing so is not necessary. If we fail to reduce the alignment tax this far, we can compensate by doing better on the willingness-to-pay front, and vice versa.
But if alignment success is binary, then we actually can decompose the goal (bolded above) into two necessary (and jointly sufficient) conditions:
(Where [reasonable value] depends on what exactly our binary-ish probability distribution for alignment success looks like.)
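A rough formalization of this decomposition (my notation): suppose "binary-ish" means the tax is either some reasonable value $T$ or effectively unbounded, and the leading project's willingness/ability to pay $W$ never reaches the unbounded level. Then

$$\text{tax} < W \iff (\text{tax} \le T) \;\wedge\; (W > T),$$

so "reduce the tax to $T$" and "get the leading AI project able and willing to pay more than $T$" are each necessary and jointly sufficient, unlike in the general case where a shortfall on one side can be compensated on the other.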
Breaking big goals down into smaller goals—in particular, into smaller necessary conditions—is valuable, analytically and pragmatically. Binaries help, when they exist. Sometimes weaker conditions on the probability distribution, those of the form "a certain important subset of possibilities has very low probability," can be useful in the same way.