I have previously written that I don't think Formal Proof is a good roadmap for AI alignment.

However, the strongest version of that framing, that mathematical provers would not help at all, seems unreasonable.

So here is a list of 20 problems related to alignment that a godlike proof oracle would be useful for.

For the purpose of this exercise, assume that you have an oracle that takes as input a math problem stated with about the level of rigor of a Math Olympiad problem, converts it into a formal specification, creates a formal proof, and then generates a human readable summary of that proof.

Notice that if you have such an oracle, you can also ask it questions of the form "find X such that X has property Y".
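
To make that interface concrete, here is a minimal Python sketch of the oracle assumed throughout. Every name in it (ProofOracle, prove, find, OracleResult) is hypothetical; no such system exists.

```python
from dataclasses import dataclass

@dataclass
class OracleResult:
    formal_spec: str  # the oracle's formalization of the informal input
    proof: str        # a machine-checkable proof object (e.g. a Lean term)
    summary: str      # a human-readable explanation of the proof

class ProofOracle:
    """Godlike proof oracle, as assumed by this post. Not a real system."""

    def prove(self, informal_statement: str) -> OracleResult:
        """Prove (or refute) a statement posed at Olympiad-level rigor."""
        raise NotImplementedError

    def find(self, property_y: str) -> OracleResult:
        """Produce a witness X such that X has property Y, by proving
        the existential statement constructively."""
        raise NotImplementedError
```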

1. Formal specification and proof of a quantum-resistant encryption scheme

Being able to encrypt messages knowing that no AI in the future will be able to read them is useful, e.g. for avoiding Roko's Basilisk concerns.

2. Define programs whose effects are limited in space and time

Suppose you have a superhuman AI and you don't want to unbox it, but you do want to use it.  For example, you might ask it to build a nanobot that converts a certain amount of matter into diamond but not all matter.
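
As a toy illustration of what "limited in space and time" could mean, here is a hedged Python sketch over a trivial cell-grid world. A real specification would have to quantify over physics; every name here is made up for illustration.

```python
def effects_bounded(program, initial_world, region, horizon):
    """Check that `program` only changes cells inside `region` and stops
    changing anything after `horizon` simulation steps."""
    prev = dict(initial_world)
    world = dict(initial_world)
    for t in range(2 * horizon):      # keep watching well past the horizon
        world = program(world)        # one simulation step
        changed = {c for c in set(world) | set(prev)
                   if world.get(c) != prev.get(c)}
        if any(c not in region for c in changed):
            return False              # touched matter outside the region
        if t >= horizon and changed:
            return False              # still acting after the time limit
        prev = dict(world)
    return True
```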

3. Solve the alignment problem

Certain formulations of the alignment problem are computationally intractable to check directly, but easy to specify.

For example:

An AI is aligned if, after turning it on, a simulation of all living humans, given infinite time and resources to study the world created by the AI without interference, would agree that it is aligned.
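
A hedged Lean sketch of that definition, with every concept we cannot yet formalize hidden behind a placeholder axiom:

```lean
-- Every name here is a placeholder axiom, not a real formalization.
axiom AI : Type
axiom Human : Type
axiom World : Type
axiom worldCreatedBy : AI → World
-- "h, given unbounded time and resources to study w without
-- interference, would judge w to be aligned"
axiom judgesAligned : Human → World → Prop

def aligned (a : AI) : Prop :=
  ∀ h : Human, judgesAligned h (worldCreatedBy a)
```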

4. Acquire 6 million dollars and donate it to pro-alignment causes

Duh.

5. Determine whether the many-worlds hypothesis is true

...by finding the Grand Unified Theory of physics.

If Many Worlds is false (e.g. because ER=EPR or objective wavefunction collapse is true), then certain alignment strategies (e.g. bargaining with other timelines) are no longer feasible.

6. Prove the existence of free will (or not)

...by proving one of

  1. no-cloning implies that an agent's actions cannot be known in advance
  2. irreducible complexity implies an agent cannot be predicted without simulating it

7. Create a text-to-image model that provably does not reproduce any of the images in its training set

Seems useful to make people like this shut up.
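
For concreteness, here is the rough shape of the property as a toy Python predicate. The distance function `dist` and threshold `eps` are assumptions, and a real proof would have to quantify over every possible prompt and output rather than a finite sample.

```python
def reproduces_nothing(outputs, training_images, dist, eps):
    """True iff no generated image is within eps of any training image."""
    return all(dist(out, img) > eps
               for out in outputs
               for img in training_images)
```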

8. Human Whole Brain Emulation

Take a person, put them in a VR environment, record their actions and then solve the problem:

find a neural network with minimum complexity + error which predicts these actions
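
A hedged sketch of that objective in Python, using compressed size as a stand-in for Kolmogorov complexity (which is uncomputable); `predict` is an assumed function from (params, input) to an action.

```python
import pickle
import zlib

def mdl_score(params, predict, inputs, recorded_actions, lam=1.0):
    """Complexity + error: the quantity the oracle is asked to minimize."""
    complexity = len(zlib.compress(pickle.dumps(params)))
    errors = sum(predict(params, x) != a
                 for x, a in zip(inputs, recorded_actions))
    return complexity + lam * errors
```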

9. Prove (or disprove) the existence of God

Suppose a God of the Gaps exists.  Or that God answers prayer.  With a perfect model of physics this should be statistically detectable in terms of violations of natural law.

10. Prove that the Chinchilla Scaling Laws are correct (or not)

That is, determine whether or not Chinchilla gives the optimal text-prediction accuracy for a given amount of compute and data.
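
Concretely, the oracle would be asked whether the parametric form fitted by Hoffmann et al. (2022) is actually optimal. A Python sketch, with their approximate published constants (treat the exact values as assumptions):

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted training loss for a model of n_params trained on n_tokens."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta
```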

11. Prove a version of the Natural Abstraction Hypothesis is true (or false)

This hypothesis currently lies on my "most likely path" for AI alignment.

12. Create a chatbot that is provably "helpful, harmless and honest"

People seem to think this would be useful. Or maybe not.

13. Prove whether the universe is deterministic (or not)

Suppose the universe is deterministic (and started from a low-entropy initial state); then it should be possible to record a few hours of static anywhere in the universe and then find a low-Kolmogorov-complexity algorithm that outputs that static.
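
A toy Python illustration (not a proof) of why this is plausible: bytes that look like incompressible static to a generic compressor can still have a tiny generating program.

```python
import random
import zlib

def static_from_seed(seed: int, n: int) -> bytes:
    """Deterministic 'static': n pseudo-random bytes from a seeded PRNG."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

noise = static_from_seed(42, 10_000)
print(len(zlib.compress(noise)))  # close to 10,000: zlib finds no structure
# Yet `static_from_seed(42, 10_000)` is a few dozen bytes of program text
# that reproduces `noise` exactly, i.e. low Kolmogorov complexity.
```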

14. Prove the Collatz Conjecture

Per Erdős, "Mathematics may not be ready for such problems," which implies a proof would yield substantial new breakthroughs in math.  Some of these may turn out to be useful for Alignment.
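
For reference, the statement itself is easy to write down; here is a Lean 4 sketch (the `^[k]` iteration notation comes from Mathlib).

```lean
-- One Collatz step: halve even numbers, map odd n to 3n + 1.
def step (n : Nat) : Nat :=
  if n % 2 = 0 then n / 2 else 3 * n + 1

-- The conjecture the oracle would be handed.
theorem collatz_conjecture : ∀ n : Nat, 0 < n → ∃ k : Nat, step^[k] n = 1 :=
  sorry
```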

15. Execute the Long Reflection

Given an emulation of a human brain, determine how they would solve the Alignment Problem if given one million years to think about it.

16. Solve the Moral Mazes Problem

Determine what social structure optimally avoids the principal-agent problems arising in large organizations.

17. TPU Design

Given a formal design language for computer chips, find the TPU design which optimally achieves a goal (e.g. GPT-3 accuracy at next-character prediction).

Note: even if you don't plan to use such a TPU (because acceleration is bad), knowing what it is seems important. Is it 10x faster? 10,000x?
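
The query itself has a simple shape, sketched below in heavily hedged Python: `designs` would range over the formal design language, and `accuracy` and `cost` are assumed evaluators (e.g. next-character accuracy of a model run on the design, and chip area).

```python
def best_design(designs, accuracy, cost):
    """Oracle query: the design maximizing accuracy per unit cost."""
    return max(designs, key=lambda d: accuracy(d) / cost(d))
```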

18. Solve the Diamond Alignment Problem

Should be easier than the Alignment Problem to formally specify.

19. Stock Market Arbitrage

Find the stock-trading algorithm that statistically maximizes returns when backtested against financial data.
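
A minimal backtest scorer makes the query concrete. This toy is frictionless and single-asset, `strategy` is an assumed function mapping a price-history prefix to a position in [-1, 1], and it says nothing about out-of-sample performance.

```python
def backtest_return(strategy, prices):
    """Sum of per-step returns earned by `strategy` over a price series."""
    total = 0.0
    for t in range(1, len(prices)):
        position = strategy(prices[:t])
        total += position * (prices[t] - prices[t - 1]) / prices[t - 1]
    return total
```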

20. Generate more questions for this list

Generate the lowest-Kolmogorov-complexity program which outputs this list and ask it what question 21 would be.

Comments (13)

Many of these do not seem to be easily specified true/false statements in formal mathematical logic. I don't think I can prove a theorem which allows me to arbitrage the market without first knowing what theorem I'm trying to prove, unless there's some derivatives contract tied to the existence of a mathematical proof. Even then, finding such a theorem would be the hard part.

More centrally: many have been working on a formal specification of the diamond alignment problem for years. That is not to say that you couldn’t do it, but an AI that can accomplish such a task would not be a mere theorem prover.

Can't tell if this is a job offer or an argument of the form "I tried this and failed, therefore it is impossible".

It’s neither. I’m claiming you need more than just a theorem prover’s mental skills to accomplish the task. If you think you can formulate the specification in formal mathematical logic, so that you do only need a theorem prover to accomplish the task (and also a list of possible programs), then go ahead. I’d be excited if you succeeded.

Not bad as a list of problems to solve, but this isn't what people normally mean by a theorem prover. In the usual concept, the input to the theorem prover is already formalized. You need something like a higher-powered version of Wolfram's proposed coupling between ChatGPT and Wolfram Alpha.

I broadly agree with the gears to ascension's complaint. The title and first paragraph of the post were pretty appealing (I'd love to read a post exploring how a superhuman theorem prover could be usefully applied to alignment research), but to that end, the post was quite disappointing.

"For the purpose of this exercise, assume that you have an oracle that takes as input a math problem stated with about the level of rigor of a Math Olympiad problem, converts it into a formal specification, creates a formal proof, and then generates a human readable summary of that proof."

is a highly non-trivial assumption for most of the use cases you highlighted. Some of them I don't expect to be practically formalisable without a strongly superhuman general intelligence.

Some I'm not sure can be rigorously formalised by physically feasible systems.

This was not intended to be a practical list.  It was intended to refute the claim "mathematical provers... would not help at all".

I estimate only like 2-3 of these are "practical" with systems that will be built before the singularity.

But not 0 of them.

And if you have a boxed super-intelligent theorem-proving AGI I would guess most of them are within reach.

Perhaps I should write a 2nd "list of practical things I might do with a Proof Oracle".  But given that I'm at -9 karma for this post, not really sure it's worth it.

Solid try, but only #12 seems plausible by my (intuitive, partially trained) model of proof systems and AI, and to do it, you would have to formalize what you mean by helpful, harmless, and honest. You'd probably have to use formalizations that aren't as nice as you'd like; if you want to investigate it, it's the only one of these that even vaguely seems approachable as a real research plan. I've strong-downvoted because so many of these are misleadingly useless; Yudkowsky not knowing about it is probably, in my view, because the ideas that work are rare and he's stuck on trying to get the best ones and can't improve from what we've got today. Your #12 could use what we've got today; if you're interested, here are some possible starting places. You'll have to be much better than me at formal reasoning to actually improve SOTA; all I'm good at is finding stuff. I'm librarian character class and only mediocre at it.

https://www.lesswrong.com/posts/ozojWweCsa3o32RLZ/list-of-links-formal-methods-embedded-agency-3d-world-models#Formal_Training 

Interesting that you thought #12 is the easiest to formalize.  I would have guessed #14 personally.

"An AI is aligned if, after turning it on, a simulation of all living humans, given infinite time and resources to study the world created by the AI without interference, would agree that it is aligned."

I am not convinced that definition is adequate. See Siren worlds.

Perhaps you think the "infinite time and resources" bit overcomes that? Again, I'm not convinced. Humans have finite memory capacity. There are limits to how much they can learn, even with immortality and infinite time. If you artificially enhance their minds too much, the resulting creatures are no longer human, and may not share our values.

I think Siren worlds are fairly uncommon and only arise under intense optimization pressure, so any decent regularization scheme should make them irrelevant.

"Formal specification and proof of a quantum-resistant encryption scheme"

I've got one of those. Absolutely unbreakable even with infinite computing power. It's simple, really.

"Being able to encrypt messages knowing that no AI in the future will be able to read them is useful, e.g. for avoiding Roko's Basilisk concerns."

If the AI can upload your brain, why can't it simply read the memory of whoever you were talking to? Encryption doesn't seem to help here.

The whole point of the encryption would be to not leave behind enough information for an AI to upload your brain.

And I assume you're smart enough to know why a one-time pad is not a solution here.