How does iterated amplification exceed human abilities?

May 03, 2020


Let's ignore computational cost for now and consider iterated amplification without distillation, where the initial agent is some particular human. Amplification is also going to be simpler -- it just means letting the agent think twice as long.

For example, $A^{0} (Rohin)$ is a question-answering system that just sends me the question, and returns the answer I give after thinking about it for a day. $A^{t} (Rohin)$ refers to the answers I'd give if I had $2^{t}$ days to think about it.

Rather than talk about "human-level", let's talk about "Issa-level" -- agents need to answer questions as well as you could given a day's time.

Then, $A^{0} (Rohin)$ is super-Issa-level on some tasks (e.g. questions about Berkeley culture) and sub-Issa-level on some tasks (e.g. questions about Wikipedia culture). Why is this? Well, for that example, we have different information. But presumably there are also differences in what we were good at learning, which would have led to differences in ability even if we had the same information. That's the answer to (2) in this context.

The answer to (3) is that with enough time and effort I could answer questions about Wikipedia culture; it would just take me a lot longer to do so relative to you.

The answer to (1) is "idk, but eventually it's possible". For my specific model, one might hope that $t = 13$ would be an upper bound -- at that point I'd get about as much time to answer the question as you have spent living.
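As a quick check on the $t = 13$ figure, here is the arithmetic under the simplified model above, where $A^{t}$ gets $2^{t}$ days to think. The function names and the 365.25-day year are illustrative assumptions, not anything taken from the IDA posts.

```python
DAYS_PER_YEAR = 365.25  # assumption: average length of a calendar year


def thinking_days(t: int) -> int:
    """Days available to A^t when A^0 gets one day and each step doubles it."""
    return 2 ** t


def thinking_years(t: int) -> float:
    """The same budget expressed in years."""
    return thinking_days(t) / DAYS_PER_YEAR


if __name__ == "__main__":
    for t in (0, 10, 13):
        print(t, thinking_days(t), round(thinking_years(t), 1))
```

So $2^{13} = 8192$ days is a bit over 22 years, which is the sense in which $t = 13$ is "about a lifetime so far" for a young adult.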

The case with iterated distillation and amplification is basically the same:

1. Idk, but eventually it'll happen. (This does rely on the Factored Cognition hypothesis.)

2. A neural net trained by distillation will probably not replicate our skill on tasks perfectly -- what it becomes good at depends on the architecture, training process, the training data it was given, etc. Perhaps humans are really good at social reasoning because it was strongly selected for by evolution, and we didn't give the neural net a correspondingly larger amount of training data for these social situations, and so it ends up subhuman at social reasoning.

3. With enough time / computational budget, the agent can (hopefully) replicate whatever (possibly expensive) explicit chunk of reasoning underlies human performance (even if it was powered by human intuition). This is the Factored Cognition hypothesis. The addition of the distillation step is an extra confounder, but we hope that it doesn't distort anything too much -- its purpose is to improve speed without affecting anything else (though in practice it will reduce capabilities somewhat).
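The alternation of amplification and distillation can be sketched as a toy model. Everything here is an assumption made for illustration: capability is collapsed into a single number, amplification doubles it (mirroring "think twice as long"), and distillation keeps a fixed fraction of it. Real IDA is nothing like this simple; the sketch only shows why alternating the two steps can still make progress even when each distillation loses something.

```python
def amplify(capability: float) -> float:
    # Assumption: amplification doubles capability, like thinking twice as long.
    return 2.0 * capability


def distill(capability: float, fidelity: float = 0.9) -> float:
    # Assumption: distillation is fast but lossy, keeping a fixed fraction.
    return fidelity * capability


def ida(initial: float, rounds: int, fidelity: float = 0.9) -> float:
    """Run `rounds` iterations of amplify-then-distill on a toy capability score."""
    capability = initial
    for _ in range(rounds):
        capability = distill(amplify(capability), fidelity)
    return capability
```

In this toy, each round multiplies capability by `2 * fidelity`, so the loop only improves the agent when distillation keeps more than half of what amplification produced.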

(I might recommend imagining that the first agent has perfect reasoning ability, except that it is very slow. This means that for any question, the first agent could answer it, given unlimited amounts of time. I wouldn't actually make this claim of IDA, but I think it is instructive for building intuitions.)

riceissa (6y)

> The addition of the distillation step is an extra confounder, but we hope that it doesn't distort anything too much -- its purpose is to improve speed without affecting anything else (though in practice it will reduce capabilities somewhat).

I think this is the crux of my confusion, so I would appreciate if you could elaborate on this. (Everything else in your answer makes sense to me.) In Evans et al., during the distillation step, the model $M$ learns to solve the difficult tasks directly by using example solutions from the amplification step. But if $M$ c

...

Rohin Shah (6y)

You could do this, but it's expensive. In practice, from the perspective of distillation, there's always a tradeoff between:

* Generating better ground truth data (which you can do by amplifying the agent that generates the ground truth data)
* Improving the accuracy of the distilled model (which you can do by increasing the amount of data that you train on, and other ML tricks)

You could get to an Issa-level model using just the second method for long enough, but it's going to be much more efficient to get to an Issa-level model by alternating the two methods.

riceissa (6y)

I'm confused about the tradeoff you're describing. Why is the first bullet point "Generating better ground truth data"? It would make more sense to me if it said instead something like "Generating large amounts of non-ground-truth data". In other words, the thing that amplification seems to be providing is access to more data (even if that data isn't the ground truth that is provided by the original human).

Also in the second bullet point, by "increasing the amount of data that you train on" I think you mean increasing the amount of data from the original human (rather than data coming from the amplified system), but I want to confirm.

Aside from that, I think my main confusion now is pedagogical (rather than technical). I don't understand why the IDA post and paper don't emphasize the efficiency of training. The post even says "Resource and time cost during training is a more open question; I haven’t explored the assumptions that would have to hold for the IDA training process to be practically feasible or resource-competitive with other AI projects" which makes it sound like the efficiency of training isn't important.

Rohin Shah (6y)

By "ground truth" I just mean "the data that the agent is trained on"; feel free to just ignore that part of the phrase. But it is important that it is better data. The point of amplification is that Amplify(M) is more competent than M, e.g. it is a better speech writer, it has a higher Elo rating for chess, etc. This is because Amplify(M) is supposed to approximate "M thinking for a longer time".

Yes, that's right.

Paul's posts often do talk about this, e.g. An unaligned benchmark, and the competitiveness desideratum in Directions and desiderata for AI alignment. I agree though that it's hard to realize this since the posts are quite scattered. I suspect Paul would say that it is plausibly competitive relative to training a system using RL with a fixed reward function (because the additional human-in-the-loop effort could be a small fraction of that, as long as we do semi-supervised RL well). However, if we end up training systems in some completely different way (e.g. GPT-2 style language models), it's very hard to predict right now how IDA would compare to that.

Donald Hobson

May 03, 2020


In answer to question (2):

Consider the task "Prove Fermat's Last Theorem". This is arguably a human-level task: humans managed to do it. However, it took some very smart humans a long time. Suppose you need 10,000 examples. You probably can't get 10,000 examples of humans solving problems like this, so you train the system on easier problems (maybe exam questions?). You now have a system that can solve exam-level questions in an instant, but can't prove Fermat's Last Theorem at all. You then train on the problems that can be decomposed into exam-level questions in an hour (i.e. the problems a reasonably smart human can answer in an hour, given access to this machine). Repeat a few more times.

If you have mind uploading and huge amounts of compute (and no ethical concerns), you could skip the imitation step. You would get an exponentially huge number of copies of some uploaded mind(s) arranged in a tree structure, with questions being passed down and answers being passed back up. No single mind in this structure experiences more than 1 subjective hour.
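A tree of bounded minds like this can be sketched with a toy task. This is only an illustration under loud assumptions: the "problem" is summing a list, each node stands in for one mind with a hard limit on how much it can handle by itself, and the function name and the limit are hypothetical. It shows questions going down the tree, answers coming back up, and how many minds the tree needs.

```python
from typing import List, Tuple


def solve(numbers: List[int], max_items_per_mind: int = 2) -> Tuple[int, int]:
    """Sum `numbers` using a tree of bounded workers.

    Returns (total, minds_used). Each mind either answers a small question
    directly or splits it in two, delegates both halves, and combines the
    two answers it gets back.
    """
    if len(numbers) <= max_items_per_mind:
        # Small enough for a single mind to answer within its budget.
        return sum(numbers), 1
    mid = len(numbers) // 2
    left_total, left_minds = solve(numbers[:mid], max_items_per_mind)
    right_total, right_minds = solve(numbers[mid:], max_items_per_mind)
    # This mind only adds two sub-answers; count it plus everything below it.
    return left_total + right_total, 1 + left_minds + right_minds
```

For example, `solve(list(range(8)))` uses 7 minds (1 root, 2 in the middle layer, 4 leaves) to return a total of 28. The number of minds roughly doubles with each extra layer of depth, which is the exponential blow-up mentioned above. Whether hard mathematical problems decompose this cleanly is exactly the open question.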

If you picked the median human by mathematical ability, and put them in this setup, I would be rather surprised if they produced a valid proof of Fermat's Last Theorem. (And if they did, I would expect it to be a surprisingly easy proof that everyone had somehow missed.)

There is no way that IDA can compete with unaligned AI while remaining aligned. The question is, what useful things can IDA do?

TurnTrout (6y)

> There is no way that IDA can compete with unaligned AI while remaining aligned

How do you know that? Do you mean to say, "I really don't think IDA can compete with unaligned AI while remaining aligned"?

Lukas Finnveden (6y)

> If you picked the median human by mathematical ability, and put them in this setup, I would be rather surprised if they produced a valid proof of Fermat's Last Theorem.

I would too. IDA/HCH doesn't have to work with the median human, though. It's ok to pick an excellent human, who has been trained for being in that situation. Paul has argued that it wouldn't be that surprising if some humans could be arbitrarily competent in an HCH-setup, even if some couldn't.

Donald Hobson (6y)

Epistemic status: Intuition dump and blatant speculation.

Suppose that instead of the median human, you used Euclid in the HCH. (Ancient Greek, invented basic geometry.) I would still be surprised if he could produce a proof of Fermat's Last Theorem (given a few hours for each H). I would suspect that there are large chunks of modern maths that he would be unable to do. Some areas of modern maths have layers of concepts built on concepts, and in some areas of maths, just reading all the definitions will take up all the time. Assuming that there are large and interesting branches of maths that haven't been explored yet, the same would hold true for modern mathematicians.

Of course, it depends how big you make the tree. You could brute force over all possible formal proofs, and then set a copy on checking the validity of each line. But at that point you have lost all alignment: someone will find that their proof is a convincing argument to pass the message up the tree.

I feel that it is unlikely that any kind of absolute threshold lies between the median human and an unusually smart human, given that the gap is small in an absolute sense.
