Training a tiny SupAmp model on easy tasks. The influence of failure rate on learning curves

by rmoehn
5th Feb 2020

Link: https://github.com/rmoehn/farlamp/blob/master/build/tiny-supfail.pdf

This is the first experiment report from my AI alignment research project. A reminder of what it is about:

I'm studying the impact of overseer failure on RL-based IDA,
    because I want to know under what conditions
    amplification and distillation increase or decrease the
    failure rate,
        in order to help my reader understand whether
        explicit reliability amplification is necessary for
        IDA to work in practice.

And an excerpt from the introduction of the report:

I ran experiments with a tiny SupAmp model on easy tasks. Small data, short training times, and local execution allowed for a tight feedback loop with few moving parts, so I could understand and adapt the code and fix mistakes quickly.

The drawback is that it doesn't yield many insights. One result […], however, was encouraging: X_PA does learn to be more reliable than its overseer Amplify^H′(X_PA). The rest of the results fall into two categories: results that confirm my understanding of the code and the learning behaviour, and results made useless by my own mistakes and misunderstandings. Although the latter sounds bad, it is useful to have exposed these mistakes before running more expensive experiments.
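As a rough intuition for how a distilled model can end up more reliable than a noisy overseer, here is a toy sketch. It is illustrative only and not the SupAmp setup from the report: the "overseer" answers each question correctly with probability 1 − ε, and the "distilled" model just takes the majority label it has seen per question. With enough noisy labels, its error rate drops below ε.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs = 10        # distinct toy "questions"
n_classes = 5        # possible answers
failure_rate = 0.2   # overseer answers wrongly with this probability (illustrative epsilon)
n_samples = 2000     # labels collected from the overseer

true_labels = rng.integers(n_classes, size=n_inputs)

def overseer(x):
    """Return the correct answer with prob 1 - failure_rate, otherwise a random wrong one."""
    if rng.random() < failure_rate:
        wrong = np.delete(np.arange(n_classes), true_labels[x])
        return rng.choice(wrong)
    return true_labels[x]

# Collect noisy training labels from the overseer.
xs = rng.integers(n_inputs, size=n_samples)
ys = np.array([overseer(x) for x in xs])

# "Distilled" model: majority vote over the labels seen for each input.
model = np.array([
    np.bincount(ys[xs == x], minlength=n_classes).argmax()
    for x in range(n_inputs)
])

model_error = (model != true_labels).mean()
print(f"overseer error ≈ {failure_rate:.2f}, distilled model error = {model_error:.2f}")
```

Because the overseer's mistakes are unbiased noise, averaging over many noisy labels washes them out; the real experiments of course involve a learned model rather than a per-input majority vote.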

I welcome feedback of any kind.