[ Question ]

Poll: Which variables are most strategically relevant?

by Daniel Kokotajlo, Noa Nabeshima | 1 min read | 22nd Jan 2021 | 34 comments


AI Takeoff, AI Timelines, AI, World Optimization
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Which variables are most important for predicting and influencing how AI goes?

Here are some examples:

  • Timelines: “When will crazy AI stuff start to happen?”
  • Alignment tax: “How much more difficult will it be to create an aligned AI vs an unaligned AI when it becomes possible to create powerful AI?”
  • Homogeneity: "Will transformative AI systems be trained/created all in the same way?"
  • Unipolar / Multipolar: "Will transformative AI systems be controlled by one organization or many?"
  • Takeoff speeds: "Will takeoff be fast or slow (or hard or soft, etc.)?"

We made this question to crowd-source more entries for our list, along with operationalizations and judgments of relative importance. This is the first step of a larger project.


  1. Answers should be variables that are importantly different from the previous answers. It’s OK if there’s some overlap or correlation. If your variable is too similar to a previous answer, instead of making a new answer, comment on the previous answer with your preferred version. We leave it to your judgment to decide how similar is too similar.
  2. Good operationalizations are important. If you can give one or more as part of your answer, great! If you can’t, don’t let that stop you from answering anyway. If you have a good operationalization for someone’s variable, add your operationalization as a comment to that variable.
  3. Upvote variables that you think are important, and strong-upvote variables that you think are very important. You can also downvote variables that you think are unimportant or overrated.
  4. The relevant sense of importance is importance for predicting and influencing how AI goes. For example, “Will AIs in the long-term future be aligned?” is important in some sense, but not that helpful to think about, so shouldn’t score highly here.


28 Answers

Alignment tax: “How much more difficult will it be to create an aligned AI vs an unaligned AI when it becomes possible to create powerful AI?”

If the alignment tax is low, people have less incentive to build an unaligned AI as they'd prefer to build a system that's trying to do what they want. Then, to increase the probability that our AI trajectory goes well, one could focus on how to reduce the alignment tax.

Unipolar / Multipolar: "Will transformative AI systems be privately controlled by one organization or many?" Questions to consider when making this more precise are: What if all the relevant organizations are within one political bloc, like the USA, or many, like the USA + China + Russia + India? What if the humans are unified into a single faction, but the AIs are divided into multiple camps? What if it's the other way around? Also, for each of these kinds of unipolarity or multipolarity, perhaps the world will transition from one type to another at some point, so the question becomes whether the world is unipolar or multipolar "in the crucial period."

There are implications of the unipolar/multipolar variable for AI governance, but also for technical AI safety (e.g. it's more important to build AI that can do bargaining and game theory well, to the extent that the world will be multipolar).

Huh, I'm surprised this one got downvoted -- I had always thought of it as uncontroversially important. I'd be interested to hear why. Maybe the idea is that we should be reducing the alignment tax rather than organizing to pay it, and reducing the tax works the same way in unipolar and multipolar scenarios? EDIT: OK, now it's been strong-upvoted, lol. I guess my takeaway is that this is a controversial one.

To what extent is AGI bottlenecked on data, compute, insight, or other factors? 

Is there some tradeoff such that e.g. 10xing compute or 20xing data is worth 1 massive insight (scaling hypothesis)?  

Will these tradeoffs reach diminishing returns before or after we get to AGI?

This will affect not only timelines, but also the Wardley stage in which we'd expect it to happen, whether we expect it to come from a small or large firm, etc.

Risk Awareness: “In the critical period, will it be widely believed by most of the relevant people that AI is a serious existential risk?”

This is closely related to whether or not there are warning shots or fire alarms, but in principle it could happen without any of either.

I would add: "Will relevant people expect AI to have extreme benefits, such as a significant percentage point reduction in other existential risk or a technological solution to aging?"

Relative difficulty of making Tool-AI, Oracle-AI, Agent-like AI: In the critical period, is the technique that produces human-level competence explicitly optimizing a reward (as in Reinforcement Learning), or is it more like GPT-3, simply outputting the most likely sequence of characters and stopping there?

Further technological progress with a tool-AI still depends on human-AI collaboration, and hence this could lead to slower take-off. An agent-like AI won't necessarily stop to leave time for the humans to think.

Value-symmetry: "Will AI systems in the critical period be equally useful for different values?"

This could fail if, for example, we can build AI systems that are very good at optimizing for easy-to-measure values but significantly worse at optimizing for hard-to-measure values. It might be easy to build a sovereign AI to maximize the profit of a company, but hard to create one that cares about humans and what they want.

Evan Hubinger has some operationalizations of things like this here and here.

At which stage of Wardley Evolution will we reach AGI?

Right now we are in the "Custom Built" stage. During this stage, building something competitive takes an incredible investment in money and talent, so the playing field is small and it's easier to coordinate.

The "Product" stage, which comes next, is the most dangerous. There are no longer huge R&D costs, so many people can enter the game. This is also when the landscape is the most competitive and players are willing to do the most to get ahead, so they're more likely to "move fast and break things".

Then, as we move into the "Commodity" stage, things get a bit safer again. The market usually thins out as winners emerge, and since everyone basically has the same features, we wouldn't expect drastic shifts that create AGI. At this stage, a further question becomes:

Are the companies that win the commodity game safety conscious? This matters because they have a huge leg up in both influencing and monitoring the further developments of AI.

For any given technology, don't we go first to "custom built" and then "product" and then "commodity?" If so, isn't it guaranteed that we'll reach AGI at "Custom built" stage first?

2 Matt Goldenberg 1mo: When we say "new technology" in Wardley Mapping, we're referring to a fundamentally new idea upon which new things can be built. Only if AGI springs forth as soon as that new idea is created would it be in the custom built stage. It's equally possible that AGI could arise from iterating on the new idea or making it repeatable, practical, or cost-effective. An analogy would be FHT (Faster than Horse Technology). The exact moment we crossed the barrier of being faster than a horse might have been when a new technology was created, but it's equally possible that it would be between one model of car and another, with no fundamentally new technology, just iterating on the existing technology and making the speed go up through experimentation, better understanding, or being able to manufacture at higher scale.

Deceptive alignment: “In the critical period, will AIs be deceptive?”

Within the framework of Risks from Learned Optimization, this is when a mesa-optimizer has a different objective than the base objective, but instrumentally optimizes the base objective to deceive humans. It can refer more generally to any scenario where an AI system behaves instrumentally one way to deceive humans.

Overall effectiveness of bag-of-tricks safety methods: Under the assumption that the alignment problem will not be solved in a general principled way, players in the critical period who are at least partially worried about safety will likely resort to a bag of tricks of sorts to avoid obvious failure modes (like not giving access to the internet, having fast kill-switches, lots of terms in the value function, and other such patches). The overall effectiveness of this bag of tricks determines the maximum level of intelligence that could more-or-less safely be used. A negative effect is inducing a false sense of comfort in the team overseeing the AI; the team's culture around safety is very important here for avoiding negative outcomes from overconfidence in the bag of tricks.

Relative difficulty of Technical Alignment work and power-acquisition for an AI: Given a self-improving AI with some take-off speed, does it become able to solve the technical alignment problem before it becomes able to escape whatever confinement is in place to stop it?

A team with safety concerns in possession of an AI in the critical period might wish to use it to solve the alignment problem. If solving the problem requires a dumber AI than one that is capable of escaping confinement, then the team can try to turn off the self-improvement loop at the level required for alignment work and bootstrap a safe AI from there.

How important will scaling relatively simple algorithms be, compared to innovation on the algorithms?

Craziness: "Will the world be weird and crazy in the crucial period?" For example, are lots of important things happening fast, such that it's hard to keep up without AI assistance, is the strategic landscape importantly different from what we expected thanks to new technologies and/or other developments, does the landscape of effective strategies for AI risk reducers look importantly different than it does now...

Coordination Easier/Harder: In the crucial period, will the relevant kind of coordination be easier or harder? For example, perhaps the relevant kind of coordination is coordination to not build AGI until more safety work has been done. This is closely related to the question of whether collective epistemology will have improved or deteriorated.

Homogeneity: "Will transformative AI systems be trained/created all in the same way?"

From Evan’s post:

If there is only one AI, or many copies of the same AI, then you get a very homogenous takeoff, whereas if there are many different AIs trained via very different training regimes, then you get a heterogenous takeoff. Of particular importance is likely to be how homogenous the alignment of these systems is—that is, are deployed AI systems likely to all be equivalently aligned/misaligned, or some aligned and others misaligned?

Timelines: The intuitive definition is "When will crazy AI stuff start to happen?" The best analysis of timelines in the world as far as I know is Ajeya's, which uses the definition of Transformative AI given here: roughly, "When will the first AI be built that is capable of causing a change in the world comparable to the Industrial Revolution or greater?" My own preferred definition of timelines is "When is the first AI-induced potential point of no return?"

Takeoff speeds: "Will takeoff be fast or slow (or hard or soft, etc.)?"

This post gives an excellent overview of the various versions and operationalizations of this variable.

How dependent is the AGI on idiosyncratic hardware? While any algorithm can run on any hardware, in practice every algorithm will run faster and more energy-efficiently on hardware designed specifically for that algorithm. But there's a continuum from "runs perfectly fine on widely-available hardware, with maybe 10% speedup on a custom ASIC" to "runs a trillion times faster on a very specific type of room-sized quantum computer that only one company on earth has figured out how to make".

If your AGI algorithm requires a weird new chip / processor technology to run at a reasonable cost, it makes it less far-fetched (although still pretty far-fetched I think) to hope that governments or other groups could control who is running the AGI algorithm—at least for a couple years until that chip / processor technology is reinvented / stolen / reverse-engineered—even when everyone knows that this AGI algorithm exists and how the algorithm works.

I think this is an interesting and unique variable -- but it seems too predictable to me. In particular, I'd be surprised if custom hardware gives more than a 100x speedup to whatever the relevant transformative AI turns out to be, and in fact I'd be willing to bet the speedup would be less than 10x, compared to the hardware used by other major AI companies. (Obviously it'll be 1000x faster than, say, the CPUs on consumer laptops). Do you disagree? I'd be interested to hear your reasons!

4 steve2152 1mo: I don't really know. My vague impression is that weird hardware could plausibly make many-orders-of-magnitude difference in energy consumption, but probably less overwhelming of a difference in other respects. Unless there's an overwhelming quantum-computing speedup, but I consider that quite unlikely, like <5%. Again this is based on very little thought or research. Maybe I'd be less surprised by a 100x speedup from GPU/TPU to custom ASIC than a 100x speedup from custom ASIC to photonic / neuromorphic / quantum / whatever. Just on the theory that GPUs are highly parallel, but orders of magnitude less parallel than the brain is, and a custom ASIC could maybe capture a lot of that difference. Maybe, I dunno, I could be wrong. A custom ASIC would not be much of a technological barrier the way weirder processors would be, although it could still be good for a year or two I guess, especially if you have cooperation from all the state-of-the-art fabs in the world...

Conditional on reaching a constraint and the current brand of machine learning becoming commoditized before we reach AI, how safety conscious will the major commodity owners be?

As mentioned in another comment, they have a huge influence on the market.

Something like:

Can we get to AGI without running into constraints?


Do we need radically new concepts/tools to get to AGI?

There will be radically different market dynamics if the thing that eventually becomes AGI is built on commoditized components (which will happen if we need another breakthrough to get AGI). The players who build these commodities (possibly safety conscious companies like OpenAI and DeepMind) will have lots of influence on the market.

public sympathy vs dehumanization? ... Like, people could perceive AI algorithms as they do now (just algorithms), or they could perceive (some) AI algorithms as deserving of rights and sympathies like they and their human friends are. Or other possibilities, I suppose. I think it would depend strongly on the nature of the algorithm, as well as on superficial things like whether there are widely-available AI algorithms with cute faces and charismatic, human-like personalities, and whether the public even knows that the algorithm exists, as well as random things like how the issue gets politicized and whatnot. A related issue is whether the algorithms are actually conscious, capable of suffering, etc., which would presumably feed into public perceptions, as well as (presumably) mattering in its own right.

Short-term economic value: How lucrative will pre-AGI systems be, and how lucrative will investors expect they might be? What size investments do we expect?

Societal robustness: How robust is society to optimization pressure in general? In the absence of recursive improvement, how much value could a mildly superintelligent agent extract from society?

Activation energy for an Intelligence Explosion: What AI capabilities are needed specifically for meaningful recursive self-improvement? Are we likely to hit a single intelligence explosion once that barrier is reached, or will earlier AI systems also produce incomplete explosions, e.g. if very lopsided AI can recursively optimize some aspects of cognition, but not enough for generality?

Personability vs Abstractness: To what extent will the first powerful AI systems take on the traits of humans, and to what extent will they be idealized, unbiased reasoning algorithms?

If the missing pieces of intelligence come from scaling up ML models trained on human data, we might expect a bias towards humanlike cognition, whereas if the missing pieces of intelligence come from key algorithmic insights, we might expect fewer parallels.

Forewarning: Will things go seriously wrong before they go irreversibly wrong?

Lopsidedness: Does AI risk require solving all the pieces, or does it suffice to have an idiot savant that exceeds human capabilities on only some axes, while still underperforming on others?

Public understanding of value misalignment: Slightly before the critical period, are there high-profile cases of products failing by being misaligned (the proverbial crushed baby caused by wanting to fetch the coffee sooner)? If the major players are companies, a widespread public understanding of misalignment could motivate the players to carefully spend more on safety (people won't buy the product otherwise). Widespread public understanding can also cause cooperation between the players by lowering the fear of being scooped (everyone knows that you need to somewhat solve alignment to have a viable product, so players don't mind paying the alignment tax so much).

Open / Closed: "Will transformative AI systems in the critical period be publicly available?"

A world where everyone has access to transformative AI systems, for example by being able to rent them (like GPT-3's API once it's publicly available), might be very different from one where they are kept private by one or more private organizations.

For example, if strategy stealing doesn't hold, this could dramatically change the distribution of power, because the systems might be more helpful for some tasks and values than others.

This variable could also affect timeline estimates if publicly accessible TAI systems increase GDP growth, among other effects it could have on the world.