Ronak_Mehta — LessWrong

ML PhD, working on automating alignment research. Trying to be better about "just sending it".

I think this is roughly right. I think of it more as each single layer being a permutation, with the composition of these permutations giving you the complex behaviors (which then break down in these nice ways). As a starting point, setting the hidden/model dimension equal to the input and output dimension would allow a "reasonable" first interpretation: the model uses convex combinations of its discrete vocabulary to compose behaviors and arrive at a prediction for the output. Intermediate layers could then map directly to vocab space (though this won't be true by default; you'd still need some sort of diagonalized prior or similar so that each basis direction corresponds to an input vocab token).
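To make the picture concrete, here is a toy sketch of what I mean, not a claim about how any real model works. Everything in it is an assumption made up for illustration: the four-token `vocab`, the helper `permutation_layer`, and the two specific permutations. The hidden dimension equals the vocab size, the hidden state is a convex combination of one-hot tokens, and each "layer" is a permutation matrix.

```python
import numpy as np

# Hypothetical toy vocabulary; hidden dimension == vocab size by construction.
vocab = ["a", "b", "c", "d"]
V = len(vocab)

def permutation_layer(perm):
    """Build a V x V matrix P with P @ one_hot(i) == one_hot(perm[i])."""
    P = np.zeros((V, V))
    for src, dst in enumerate(perm):
        P[dst, src] = 1.0
    return P

# Two illustrative "layers" (assumed, not learned).
layer1 = permutation_layer([1, 2, 3, 0])  # cyclic shift: a->b, b->c, c->d, d->a
layer2 = permutation_layer([1, 0, 3, 2])  # pairwise swap: a<->b, c<->d

# Hidden state as a convex combination of vocab tokens (weights sum to 1).
h = np.array([0.7, 0.3, 0.0, 0.0])

# Applying the layers in sequence is the same as applying their composition.
out = layer2 @ (layer1 @ h)
composed = layer2 @ layer1

# Read the prediction directly off the vocab-aligned basis.
prediction = vocab[int(np.argmax(out))]
```

Because permutations preserve the simplex, every intermediate state stays a convex combination of tokens and can be read off in vocab space. A trained model would not have this basis alignment for free, which is where the diagonalized prior would come in.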

Do you have a good estimate of what is and what will be possible with massively scaled up inference-time compute over the next 3 months? 6 months? Are you thinking about how this will affect others' priorities? Resource allocation? Governance and policy?

IMO having good answers to these questions feels super important for prioritizing where you spend your time.
