Summary: I’ve been working on a project aimed at finding goal representations in a small RL agent. I designed a setup in which I train an agent while rapidly alternating between two very similar objectives. I was able to consistently (across alternations and across different random seeds) get high average returns on...
Note: I wrote this document over a year ago and have decided to post it with minimal edits; it isn't entirely up to date with my current thinking on the subject. Imagine you enter a room that looks like this: Despite never having visited this room, you can make inferences...
Note: I wrote this document over a year ago, and recently I decided to post it with minimal edits. If I were to do research in the same area today, I'd probably have different framings, thoughts on what directions seem promising, etc. Some of the things I say here now...
Disclaimer: Nate gave me some life advice at EA Global; I thought it was pretty good, but it may or may not be useful for other people. If you think any of this would be actively harmful for you to apply, you probably shouldn't. Notice subtle things in yourself This...
Background (specific information will be sparse here; this is meant to give context for the Takeaways section of the post): Our group (Garrett, Chu, and Johannes) has worked with John Wentworth in the SERI MATS 2 Electric Boogaloo program for three weeks, meaning it's time for a Review & Takeaways...