Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

EDIT: This idea is unsafe unless you have X and only X optimization. If an early logical optimiser produces a paperclip maximiser, then the paperclip maximiser will play along giving good solutions to most logical optimisation problems, but hiding nasty surprises where we are likely to find them. In other words, you have risks of algorithms that give outputs that are pretty well logically optimized, but are also optimized to harm you somehow. Even if the first algorithm isn't like this, it could stumble upon an algorithm like this in the self improvement stage.


Epistemic status: I think the basic Idea is more likely than not sound. Probably some mistakes. Looking for sanity check.

Black box description

The following is a way to Foom an AI while leaving its utility function and decision theory as blank spaces. You could plug any uncomputable or computationally intractable behavior you might want in, and get an approximation out.

Suppose I was handed a hypercomputer and allowed to run code on it without worrying about mindcrime, then the hypercomputer is removed, allowing me to keep 1Gb of data from the computations. Then I am handed a magic human utility function, as code on a memory stick. This approach would allow me to use the situation to make a FAI.

Example algorithms

Suppose you have a finite set of logical formulas, each of which evaluate to some real number. A logical optimizer is an algorithm that takes those formulas and tries to maximize the value of the formula it outputs.

One such algorithm would be to run a proof search for theorems of the form where is one of the formulas representing a real number, and is an explicitly written rational. After running the proof search for a finite time, you will have finitely many formulas of this form. Pick the one with the largest in it, and return the that is provably bigger than it.

Another algorithm to pick a large is to run a logical inductor to estimate each and then pick the that maximized those estimates.

Suppose the formulas were

1) "3+4"

2) "10 if P=NP else 1"

3) "0 if P=NP else 11"

4) "2*6-3"

When run with a small amount of compute, these algorithms would pick option (4). They are in a state of logical uncertainty about whether P=NP, and act accordingly.

Given vast amounts of compute, they would pick either 2 or 3.

We might choose to implicitly represent a set of propositions in some manner instead of explicitly stating them. This would mean that a Logical Optimizer could optimize without needing to explicitly consider every possible expression. It could use an evolutionary algorithm. It could rule out swaths of propositions based on abstract reasoning.

Self Improvement

Now consider some formally specifiable prior over sets of propositions called . could be a straightforward simplicity based prior, or it could be tuned to focus on propositions of interest.

Suppose are a finite set of programs that take in a set of propositions and output one of them. If a program fails to choose a number from 1 to quickly enough then pick randomly, or 1 or something.

Let be the choice made by the program .

Let be the average value of the proposition chosen by the program , weighted by the prior over sets of propositions .

Now attempting to maximize over all short programs is something that Logical Optimizers are capable of doing. Logical Optimizers are capable of producing other, perhaps more efficient Logical Optimizers in finite time.

Odds and ends

Assuming that you can design a reasonably efficient Logical Optimizer to get things started, and that you can choose a sensible , you could get a FOOM towards a Logical Optimizer of almost maximal efficiency.

Note that Logical Optimizers aren't AI's. They have no concept of empirical uncertenty about an external world. They do not perform Baysian updates. They barely have a utility function. You can't put one in a prisoners dilemma. They only resolve a certain kind of logical uncertainty.

On the other hand, a Logical Optimizer can easily be converted into an AI by defining a prior, a notion of baysian updating, an action space and a utility function.

Just maximize over action in expressions of the form "Starting with Prior and updating it based on evidence , if you take action then your utility will be?"

I suspect that Logical Optimisers are safe, in the sense that you could get one to FOOM on real world hardware, without holomorphic encryption and without disaster.

Logical Optimizers are not Clever fool proof. A clever fool could easily turn one into a paper clip maximizer. Do not put a FOOMed one online.

I suspect that one sensible route to FAI is to FOOM a logical optimizer, and then plug in some uncomputable or otherwise unfeasible definition of friendliness.

New Comment
4 comments, sorted by Click to highlight new comments since:

Isn't there a "telomere shortening" effect, where an optimizer using a strong theory can't prove good behavior of successors using the same theory and will pick a successor with weaker theory? Using logical inductors could help with that, but you'd need to spell it out in detail.

One problem that could cause the searching process to be unsafe is if the prior contained a relatively large measure of malign agents. This could happen if you used the universal prior, per Paul's argument. Such agents could maximize across the propositions you test them on, but do something else once they think they're deployed.

If the prior is full of malign agents, then you are selecting your new logical optimizer based on its ability to correctly answer arbitrary questions (in a certain format) about malign agents. This doesn't seem to be that problematic. If the set of programs being logically optimized over is malign, then you have trouble.

Ah, sorry, I misread the terminology. I agree.