Did anyone try and see if self-distillation suppresses eval-awareness?
Thanks for the feedback! working on refining the writeup.
though as Geoff Hinton has pointed out, 'confabulations' might be a better word
I think yann lecun was the first one to using this word https://twitter.com/ylecun/status/1667272618825723909
not much information is given regarding that so far, i was curious about that too
"Algorithm for Concept Extrapolation"
I don't see any recent publications for paul christiano related to this. So i guess the problem[s] is still open.
parameters before L is less than ,
should this be after?
AutoGPT was created by a non-coding VC
It looks like you are confusing autoGPT with babyagi which was created by yohei nakajima who is a VC. the creator of autoGPT (Toran Bruce Richards) is a game-developer with a decent programming (game-development) experience. Even the figure shown here is that from babyagi (https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/).
47 layers layer
47 layers later ?
By distilling the model on its own responses, the model's train and eval behavior should converge you're collapsing the (train/eval) conditional policy into uniform behavior everywhere.