Announcing the Farlamp project
I'm studying the impact of overseer failure on RL-based iterated distillation and amplification (IDA), because I want to know under what conditions amplification increases or decreases the failure rate. This should help my readers understand whether we need to combine capability amplification with explicit reliability amplification in all cases.
In this project I will:
- Take the implementation of iterated distillation and amplification from Christiano et al.'s ‘Supervising strong learners by amplifying weak experts’ and adapt it to reinforcement learning. (It currently uses supervised learning.)
- Introduce overseer failures and see how they influence the overall failure rate.
- Write a paper about the results.
Overseer failures in SupAmp and ReAmp contains a more extensive introduction and explains the relevant terms and concepts.
The project repo contains all the public artifacts I have produced so far.
At the moment I'm expanding my ML skills using the book Hands-On Machine Learning with Scikit-Learn & TensorFlow. After this I will start working on the IDA code.
Paul Christiano has been funding this project, giving me a chance to try my hand at research (again). The next evaluation will be in December/January. Depending on my progress, the funding, and with it the project, may be discontinued.
I'm also looking for remote writing partners.