Are there any plans to repeat this work using the larger models that now exist?
see the inputs of running that code
Should this be "outputs"?
Nice, exercises are a good idea, especially for bite-sized things like einsum. It could also give personalized feedback on your solutions to exercises from a textbook.
Randomized flashcards like you've described would be really, really cool. I'm just dipping my toes in the water with having it generate normal flashcards. It has promise, but I'm not sure of the best way to do it yet. One thing I've tried is prompting it with a list of principles the flashcards ought to adhere to, and then having it say, for each flashcard, which of the principles that card exhibits.
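For what it's worth, here is a rough sketch of how a principles-based prompt like that could be assembled. The principles, the function name `build_flashcard_prompt`, and the output format are all made up for illustration. It only builds the prompt string and doesn't call any model API.

```python
# Hypothetical principles for illustration; swap in whatever list you actually use.
PRINCIPLES = [
    "Tests exactly one idea",
    "Has a single unambiguous answer",
    "Asks for recall rather than recognition",
]


def build_flashcard_prompt(source_text: str, principles: list[str], n_cards: int = 5) -> str:
    """Return a prompt asking for flashcards plus, per card, the principles it exhibits."""
    # Number the principles so the model can refer to them compactly.
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(principles, start=1))
    return (
        f"Write {n_cards} flashcards based on the text below.\n\n"
        f"Each flashcard should adhere to these principles:\n{numbered}\n\n"
        "Format each card as:\n"
        "Q: <question>\n"
        "A: <answer>\n"
        "Principles: <numbers of the principles this card exhibits>\n\n"
        f"Text:\n{source_text}\n"
    )


prompt = build_flashcard_prompt(
    "Spaced repetition schedules reviews at increasing intervals.",
    PRINCIPLES,
    n_cards=3,
)
print(prompt)
```

Asking for the principle numbers on a separate `Principles:` line makes it easy to spot cards that claim to satisfy none of them, or all of them indiscriminately.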
And one more: I've been prompting it to teach me things via the Socratic method, so it asks questions and I have to answer them. Most usefully, it's not just a "yes man": it actually tells me when I'm wrong.
Can you point to a particular one? I've read Player of Games but I don't think it's relevant.
Worth the Candle by Alexander Wales
Permutation City by Greg Egan has humans living in a simulation.
Diaspora by Greg Egan features human-like beings living in a virtual world, similar to the digital people described here.
In the RSA-2048 example, why is it infeasible for the judge to verify every one of the honest player's arguments? (I see why it's infeasible for the judge to check every one of the dishonest player's arguments.)
I was trying to get a clearer picture of how training works in debate so I wrote out the following. It is my guess based on reading the paper, so parts of it could be incorrect (corrections are welcome!), but perhaps it could be helpful to others.
My question was: is the training process model-free or model-based? After looking into it more and writing this up, I'm convinced it's model-based, but I think maybe either could work? (I'd be interested if anyone has a take on that.)
In the model-free case, I think it would not be trained like AlphaGo Zero but with something like PPO. In the model-based case it would be more similar to AlphaGo Zero: training would use Monte Carlo tree search, and debate would serve as a policy improvement operator, making it a form of IDA. Or does it not matter? (N.B. I'm using "model-free" and "model-based" in the RL sense here, where "model" is not the ML model but a model of the game that lets the network simulate the game in its mind.)
More details on the approaches:
The relevant paper sections I found on this are:
Do you think it's possible we end up in a world where we're mostly building AIs by fine-tuning powerful base models that are already situationally aware? In this world we'd be skipping right to phase 2 of training (at least on the particular task), thereby losing whatever alignment benefits phase 1 would have provided on that task.
Concretely, suppose that GPT-N (N > 3) is situationally aware, and we are fine-tuning it to take actions that maximize nominal GDP. It knows from the get-go that printing loads of money is the best approach, but since it's situationally aware it also knows that we would modify it if it did that. Thus, it takes agreeable actions during training, but once deployed pivots to printing loads of money. (In this hypothetical the hazard of just printing money doesn't occur to us humans, but you could imagine replacing it with something that more plausibly wouldn't occur to us.)