Yeah, whenever a result is sensational and comes from a less-than-absolutely-huge name, my prior is that the result is due to mistakes (something like 60-95%, depending on the degree of surprisingness), and de facto this means I just don't update on papers like this one anymore until significant follow-up work is done.
I wonder if you're referring to the "spurious rewards" paper. If so, I wonder if you're aware of [this critique](https://safe-lip-9a8.notion.site/Incorrect-Baseline-Evaluations-Call-into-Question-Recent-LLM-RL-Claims-2012f1fbf0ee8094ab8ded1953c15a37) of its methodology, which might be enough to void the result.
I think the critique generalizes if it's made a little more focused: if a huge number of papers appeared that just demonstrated EM arising in a bunch of superficially varied settings, without a clear theory of why, this post would be a good critique of that phenomenon.
How do you feel about mutual combat laws in Washington and Texas, where you can fight by agreement (edit: you can't grievously injure each other, apparently)?
I find it absurd on priors to think that soccer of any demographic could result in more concussions than any of those five full-contact sports, particularly the three where part of the objective is explicitly to hit your opponent in the head very hard if you can. (Even factoring in the fact that you do a bunch of headers in soccer.) (Maybe if you do some trickery like selecting certain subpopulations of the practitioners of these sports, but...)
I don't disagree in general with the claim that words can be useful for coordinating about natural ideas. What's missing for me is an understanding of what particular natural idea here isn't already captured by "mech interp lacks good paradigms".
Is anything which lacks a good+relevant paradigm by default "pre-good-relevant-paradigm", or is there more subtlety to the idea?
> So Parameter Decomposition in theory suggests solutions to the anomalies of Second-Wave Mech Interp. But a theory doesn’t make a paradigm.
Nitpicking a little bit here: I think this is a different use of the word "theory" than in the phrase "scientific theory". One could read the second usage here as the latter, but it seems like the claim you're actually making is more like "these things could make progress explaining some of these anomalies, if the experiments go well".
> The requirement that the parameter components sum to the original parameters also means that there can be no ‘missing mechanisms’. At worst, there can only be ‘parameter components which aren’t optimally minimal or simple’.
Echoing a part of Adam Shai's comment, I don't see how this is different from the feature-based case. Won't there be a problem if you extract a bunch of parameter components you "can explain", and then you're left with a big one you "can't explain", which "isn't optimally minimal or simple"?
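To make the worry concrete, here's a toy numpy sketch (my construction, not the paper's method; the "explained" components are just random stand-ins): the sum constraint is satisfiable by construction, so by itself it doesn't stop the unexplained part from ending up as one big leftover component.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(64, 64))          # stand-in for one layer's weights

# Pretend the method recovers a few small components it can explain...
explained = [rng.normal(scale=0.05, size=theta.shape) for _ in range(4)]

# ...then the sum constraint is satisfied by construction: whatever is
# left over just becomes one more component.
residual = theta - sum(explained)
components = explained + [residual]

assert np.allclose(sum(components), theta)   # constraint holds exactly

# But most of the weight (and presumably behaviour) sits in the residual
# component, i.e. the "can't explain" lump hasn't gone anywhere.
print(np.linalg.norm(residual) / np.linalg.norm(theta))   # ~1.0 here
```

Which looks structurally the same as being left with a pile of unexplained variance in the feature-based case.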
> Another attractive property of Parameter Decomposition is that it identifies Minimum Description Length as the optimization criterion for our explanations of neural networks
Why is this an attractive property? (Serious question.)
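To make the question concrete, here's roughly what I picture "MDL as the optimization criterion" cashing out to; the code-length proxy below is a crude stand-in of my own, not the paper's actual loss.

```python
import numpy as np

def code_length_bits(x, precision=1e-3):
    # Crude proxy for code length: number of entries that need encoding at a
    # fixed precision. A real MDL treatment would use an actual prefix code;
    # this is only to make the shape of the objective concrete.
    return int(np.sum(np.abs(np.asarray(x)) > precision))

def description_length(components, target):
    # Two-part code: bits to specify the parameter components themselves
    # (simpler / sparser components are cheaper) ...
    component_bits = sum(code_length_bits(c) for c in components)
    # ... plus bits to encode whatever the components fail to reconstruct.
    error_bits = code_length_bits(target - sum(components))
    return component_bits + error_bits

rng = np.random.default_rng(0)
theta = rng.normal(size=(32, 32))
components = [np.where(np.abs(theta) > 1.0, theta, 0.0)]  # one sparse component

# "Quality of the explanation" collapses into a single scalar, where both
# overly complex components and a large unexplained residual add bits.
print(description_length(components, theta))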
What's the distinction between what you're pointing at and the statement that mech interp lacks good paradigms? I think the latter statement is true and descriptive, but I presume you want to say something else.
Sorry, yeah, it was badly worded.
Re: self-help books, I mostly share your position in thinking that ~80% of such books could be a paragraph to a page, ~18% of them could be blog posts of varying length, and only the remaining ~2% have something substantial to say from a pure informational standpoint. (Worse, in many cases, padding the length of a self-help book actively makes it worse/less coherent.) Moreover, I agree that among the good-ish 20%, there is a lot of overlap in the prescriptions given, implied or otherwise. I think that even when a book of this type is done "well", the purpose of most of the text isn't to be maximally informative in distinguishing world models, but to give a bunch of perspectives on a small set of ideas, in the hopes that one of them sticks particularly well or that the cumulative exposure makes the idea stick with you better. Spaced repetition or other ritualistic behaviors might achieve the same thing, but they require more active agency on your part.
I happen to like *The Inner Game of Tennis* in particular, and feel that its overlap in useful advice with other books in the genre is relatively low, though I might have a hard time defending my taste explicitly.
at the end of the somewhat famous recent blogpost about llm nondeterminism (https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/), they assert that the determinism is enough to make an rlvr run more stable, even without importance sampling.
is there something i'm missing here? my strong impression is that the scale of the nondeterminism in the result is quite small, and random in direction, so it isn't likely to affect something aggregate-scale like the qualitative effect of an entire gradient update. (i can imagine that the accumulation of many random errors biases the policy towards being generally less stable, which would be qualitatively worse, yes...)
absent something that addresses the point above, my prior is instead that the graph is cherry-picked, intentionally or not, to increase the perceived importance of llm determinism.
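for concreteness, here's the mental model behind my skepticism as a toy numpy sketch. this is my construction, not from the post, and the 1e-3 noise scale is made up for illustration: if the trainer and the sampler disagree only at numerics scale, the importance ratios you'd use to correct the off-policyness are already ~1, so dropping the correction shouldn't change much.

```python
import numpy as np

rng = np.random.default_rng(0)

# log-probs the inference engine assigned to the sampled tokens ...
sampler_logp = rng.uniform(-4.0, -0.1, size=2048)
# ... and the trainer's log-probs for the same tokens, offset by
# numerics-scale noise (toy magnitude, not measured from any real stack).
trainer_logp = sampler_logp + rng.normal(scale=1e-3, size=2048)

# per-token importance ratio pi_trainer / pi_sampler that an off-policy
# correction would multiply into the policy-gradient terms
ratios = np.exp(trainer_logp - sampler_logp)
print(ratios.mean(), ratios.std())   # ~1.000 with tiny spread at this noise scale
```

(if the actual trainer/sampler logprob gap is much larger than numerics scale, that would change my view, but that's the quantity i'd want to see.)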