Eliezer Yudkowsky on Math AIs

Here are some interesting quotes from the alignment debate with Richard Ngo.

If it were possible to perform some pivotal act that saved the world with an AI that just made progress on proving mathematical theorems, without, eg, needing to explain those theorems to humans, I'd be extremely interested in that as a potential pivotal act. We wouldn't be out of the woods, and I wouldn't actually know how to build an AI like that without killing everybody, but it would immediately trump everything else as the obvious line of research to pursue.


If there were one theorem only mildly far out of human reach, like proving the ABC Conjecture (if you think it hasn't already been proven), and providing a machine-readable proof of this theorem would immediately save the world - say, aliens will give us an aligned superintelligence, as soon as we provide them with this machine-readable proof - then there would exist a plausible though not certain road to saving the world, which would be to try to build a shallow mind that proved the ABC Conjecture by memorizing tons of relatively shallow patterns for mathematical proofs learned through self-play; without that system ever abstracting math as deeply as humans do, but the sheer width of memory and sheer depth of search sufficing to do the job. I am not sure, to be clear, that this would work. But my model of intelligence does not rule it out.

Here is a caveat and further explanation:

If you knew about the things that humans are using to reuse their reasoning about chipped handaxes and other humans, to prove math theorems, you would see it as more plausible that proving math theorems would generalize to chipping handaxes and manipulating humans.


Human math is very much about goals. People want to prove subtheorems on the way to proving theorems. We might be able to make a different kind of mathematician that works more like GPT-3 in the dangerously inscrutable parts that are all noninspectable vectors of floating-point numbers, but even there you'd need some Alpha-Zero-like outer framework to supply the direction of search.

That outer framework might be able to be powerful enough without being reflective, though. So it would plausibly be much easier to build a mathematician that was capable of superhuman formal theorem-proving but not agentic. The reality of the world might tell us "lolnope" but my model of intelligence doesn't mandate that. That's why, if you gave me a pivotal act composed entirely of "output a machine-readable proof of this theorem and the world is saved", I would pivot there! It actually does seem like it would be a lot easier!

Should we pivot there?

In my previous AI-in-a-box success model, I argued that we may be able to get close to a pivotal act with a myopic Oracle AI.

Here is the backbone of my plan (described in more detail in that post):

  1. Use the Oracle AI to make a very large amount of money (e.g. $100B+).
  2. Use the money and also technology coming from the Oracle AI to power an international surveillance organization focused on dangerous AGI research.
  3. Convince those doing dangerous AGI research to stop, and support the enforcement of international bans on such research.

I called it a stupid plan to highlight that we can make the plan a lot better if we come together and think a lot harder about it.

However, the plan does seem to generalize somewhat to the case in which we have a Math AI (that generates machine-readable proofs) instead of a more general myopic Oracle.

To be clear, I expect that for this to work we'll need a Math AI that can also answer formally posed questions and that has strong capabilities in computer science domains, as long as the tasks are formally defined.
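To make "machine-readable proof" and "formally defined task" a bit more concrete, here is a minimal illustration in Lean 4. Lean is an assumption on my part, since the post does not name any proof system, and the theorem names and the `double` function are hypothetical examples. The point is only that both the statement and the proof are checked mechanically, so nobody needs to understand the proof for it to be trusted.

```lean
-- A purely mathematical statement: addition on natural numbers is commutative.
-- Lean's kernel checks the proof term; no human review of the proof is needed.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A formally defined computer-science-style task (hypothetical example):
-- define a function, then prove a property of its output for every input.
def double (n : Nat) : Nat := n + n

theorem double_is_even (n : Nat) : double n % 2 = 0 := by
  unfold double  -- goal becomes (n + n) % 2 = 0
  omega          -- decision procedure for linear arithmetic with constant moduli
```

A Math AI in this sense would receive statements like these and return proofs that the checker accepts. The harder, non-mathematical work is writing the formal statement so that it captures what we actually care about.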

Making money with a Math AI

There seem to be many possible ways of using such a Math AI to make large amounts of money, whether in computer science, finance, mechanism design, nuclear energy, theoretical chemistry, materials science, you name it!

Almost no one seems to be looking for these possible uses because:

  • doing so requires a deep understanding both of mathematics and of the area in which you want to apply it
  • there is little current practical gain to be had from these insights given that we do not have a Math AI available
  • those capable of making such connections often don't have easy access to resources and wouldn't even be able to immediately exploit the idea once a Math AI becomes available

However, we may be able to coordinate as a community so that we start thinking about this now, and so that, once a Math AI becomes available, the desired proofs and financial resources reach several members of our community, who can then exploit several different areas simultaneously.

In some areas (like finance) profits may come quicker and may help fund the other initiatives.

Intelligence gathering with a Math AI

I argued previously that creating a private or multinational intelligence agency focused on dangerous AI research may be easier than it looks.

Traditional intelligence agencies certainly are well-funded, but their budgets are not as large as some people expect. For instance, the NSA has an annual budget of about $10 billion, while the CIA has an annual budget of about $15 billion. A single billionaire near the top of the Forbes list could fund both organizations alone for many years.
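As a rough check on the "many years" claim, here is a back-of-the-envelope calculation in Python using only the figures above (about $10 billion per year for the NSA and $15 billion for the CIA) and the $100B+ target from step 1 of the plan. It deliberately ignores investment returns, inflation, and setup costs, so treat it as an order-of-magnitude sketch rather than a budget.

```python
def years_of_funding(endowment_usd: float, annual_budgets_usd: list[float]) -> float:
    """Return how many years an endowment covers the combined annual budgets.

    Simplifying assumption: the endowment is simply spent down, with no
    investment returns, inflation, or one-off setup costs.
    """
    total_per_year = sum(annual_budgets_usd)
    if total_per_year <= 0:
        raise ValueError("combined annual budget must be positive")
    return endowment_usd / total_per_year


NSA_BUDGET_USD = 10e9  # approximate annual NSA budget cited in the post
CIA_BUDGET_USD = 15e9  # approximate annual CIA budget cited in the post
ENDOWMENT_USD = 100e9  # the "$100B+" target from step 1 of the plan

years = years_of_funding(ENDOWMENT_USD, [NSA_BUDGET_USD, CIA_BUDGET_USD])
print(f"{years:.1f} years")  # 4.0 years
```

If the new agency ran on a smaller budget, the same endowment would last proportionally longer: at a hypothetical $5 billion per year, `years_of_funding(100e9, [5e9])` gives 20 years.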

So it seems possible to obtain funding for a large intelligence organization dedicated to identifying and preventing dangerous AGI research. And if we can succeed at that, we may be able to make it a lot more efficient than the equivalent government agencies.

Here are some problems government intelligence agencies face:

  • Entrenched bureaucracies.
  • Influence by political considerations, in ways that sometimes conflict with the organization's stated purposes.
  • Priority creep (issues don't return to lower priorities once they have become less urgent).
  • Ad hoc priorities associated with unexpected international events.
  • Intense lobbying efforts by contractors, resulting in ineffective and disconnected collection systems.
  • Leadership specialized in political rather than technical skills.

A new intelligence agency focused on AGI existential risk would have none of these problems, so I consider it possible for it to be much more effective than existing agencies, even with a smaller budget.

On top of these advantages, the new intelligence agency would also have a Math AI!

The first and most obvious use of the AI is to attempt an attack on the most widely used forms of cryptography.

With careful formal definitions, it may be possible to reduce some computer science problems to questions of logic, allowing the AI, for example, to also generate a few zero-day exploits.

Stopping dangerous AGI research

I'll freely admit that I spent much less time thinking about this last part. We most likely don't want to enforce a ban on dangerous research unilaterally.

We want to build legitimacy for a multinational or international ban on such research, so we should get international organizations on board.

Even better, we should become the international organization responsible for dealing with this problem, working hard to obtain the same recognition that such organizations have, even as we maintain a decision structure that is de facto not much influenced by any government.

Once the moral authority of the bans on dangerous AGI research is more widely accepted, there is going to be less and less pushback to enforcement. At this point, friendly enforcement agencies will be able to stop dangerous research, in some cases even outside their respective jurisdictions. We may be able to reward them politically and technologically.

At some point, most problematic developments will come either from projects run by major state-level entities or from growth in available processing power.

For the former, the entities that need to be managed are very few in number, so it is conceivable that they can be dissuaded by political or diplomatic means.

The latter is more serious. Unless we do something, lone geniuses will eventually be able to do dangerous AGI research in their garages.

Success here depends on being able to reach an agreement to maintain the status quo regarding computational resources. This will be a lot easier to do once a large multinational organization is already recognized as the major moral authority in this area.

It is also important to ensure that those who need computational resources for purposes other than dangerous AGI research can obtain them easily and securely. For instance, it may be possible to require certain advanced dual-use hardware to be distributed exclusively to cloud providers that are heavily monitored.


Comments

That does not look like one plan to me; it looks like two! One to make a lot of money and one to save the world with a lot of money. And a lot of smart people look for plans to make a lot of money; many of those same people have "throw an AI at it" as a hypothesis in their toolkit and are technically minded and competent in STEM, so it does not seem like an interesting direction to look at.

I do agree that it would be interesting to have a plan on how to effectively use a lot of money if we get it in whatever way but I would be quite surprised if "whatever way" ends up being "the community earns it" rather than "we convince some ultrarich like Elon Musk or some governments to give us a lot of money".

I think you are right! Maybe I should have actually written different posts about each of these two plans.

And yes, I agree with you that maybe the most likely way of doing what I propose is getting someone ultra rich to back it. That idea has the advantage that it can be done immediately, without waiting for a Math AI to be available.

To me it still seems important to think about what kind of strategic advantages we can obtain with a Math AI. Maybe it is possible to gain a lot more than money (I gave the example of zero-day exploits, but we can most likely get a lot of other valuable technology as well).

Be careful about someone else stealing/copying/plagiarizing your oracle.

If you're making 100B+, someone will try to do so, with very high probability. Likely many people. That's a level where I can easily see someone e.g. burning zero-days.

In my model the Oracle would stay securely held in something like a Faraday cage with no internet connection and so on.

So yes, some people might want to steal it, but if we have some security I think they would be unlikely to succeed, unless it is a state-level effort.