I think I’m a little confused about the hypothesis space part. I agree it sounds implausible to run multiple learning algorithms in parallel within a transformer forward pass to find the best one, and that the search space is really large.
But if we just ask about the hypothesis space for a moment: is it really practically impossible for a transformer forward pass to simulate a deep-Q style learning algorithm? Even with, e.g., 3-5 OOMs more compute than GPT-4.5?
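(For concreteness, here's a minimal sketch, entirely my own illustration and not from the original post, of the per-step computation a deep-Q style learner performs: a forward pass through a value function, a TD target, and a gradient step. The tiny linear Q-function, dimensions, and hyperparameters are all hypothetical; the question is whether a forward pass could emulate this loop in-context.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny setting: 4-dim states, 2 actions, linear Q-function.
STATE_DIM, N_ACTIONS = 4, 2
W = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))  # Q(s, a) = W[a] @ s

GAMMA, LR = 0.99, 0.01  # discount factor and learning rate (illustrative)

def q_values(state):
    # One "forward pass" of the value function.
    return W @ state

def dqn_step(state, action, reward, next_state, done):
    # One temporal-difference update: the inner loop a simulator would
    # have to reproduce at each environment step.
    global W
    target = reward + (0.0 if done else GAMMA * np.max(q_values(next_state)))
    td_error = target - q_values(state)[action]
    W[action] += LR * td_error * state  # gradient step on squared TD error
    return td_error

# Usage: apply one update to a random transition.
s, s2 = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
print(dqn_step(s, action=0, reward=1.0, next_state=s2, done=False))
```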
I worry you could’ve made this same argument ten years ago for simulating human expert behavior over 8 h...