Training a tiny SupAmp model on easy tasks. The influence of failure rate on learning curves

by rmoehn, 5th Feb 2020




This is the first experiment report of my AI alignment research project. A reminder of what the project is about:

I'm studying the impact of overseer failure on RL-based IDA,
    because I want to know under what conditions
    amplification and distillation increase or decrease the
    failure rate,
        in order to help my reader understand whether
        explicit reliability amplification is necessary for
        IDA to work in practice.

And an excerpt from the introduction of the report:

I ran experiments with a tiny SupAmp model on easy tasks. Small data, short training times and local execution allow for a tight feedback loop with few moving parts. Thus I could understand and adapt the code, and fix mistakes quickly.

The drawback is that this setup doesn't yield many insights. One result […], however, was encouraging: the model does learn to be more reliable than its overseer. The rest of the results fall into two categories: results confirming my understanding of the code and learning behaviour, and results made useless by mistakes and misunderstandings of mine. Although the latter sounds bad, it is actually useful to have exposed these mistakes before running more expensive experiments.

I welcome feedback of any kind.