I think I’m a little confused about the hypothesis space part. I agree it sounds implausible to run multiple learning algorithms in parallel within a transformer forward pass to find the best one, and that the search space is really large.
But if we just ask about the hypothesis space for a moment: is it really practically impossible for a transformer forward pass to simulate a deep-Q style learning algorithm? Even with, e.g., 3-5 OOMs more compute than GPT-4.5?
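(For concreteness, here's a minimal sketch, entirely my own illustration and not from the original post, of the per-step computation a deep-Q style learner performs: a forward pass through a value function, a TD target, and a gradient step. The tiny linear Q-function, dimensions, and hyperparameters are all hypothetical; the question is whether a forward pass could emulate this loop in-context.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny setting: 4-dim states, 2 actions, linear Q-function.
STATE_DIM, N_ACTIONS = 4, 2
W = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))  # Q(s, a) = W[a] @ s

GAMMA, LR = 0.99, 0.01  # discount factor and learning rate (illustrative)

def q_values(state):
    # One "forward pass" of the value function.
    return W @ state

def dqn_step(state, action, reward, next_state, done):
    # One temporal-difference update: the inner loop a simulator would
    # have to reproduce at each environment step.
    global W
    target = reward + (0.0 if done else GAMMA * np.max(q_values(next_state)))
    td_error = target - q_values(state)[action]
    W[action] += LR * td_error * state  # gradient step on squared TD error
    return td_error

# Usage: apply one update to a random transition.
s, s2 = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
print(dqn_step(s, action=0, reward=1.0, next_state=s2, done=False))
```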
I worry you could’ve made this same argument ten years ago for simulating human expert behavior over 8 h...