LESSWRONG
shawnghu
shawnghu's Shortform
shawnghu (5d)

At the end of the recent, somewhat famous blog post about LLM nondeterminism (https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/), they assert that determinism is enough to make an RLVR run more stable without importance sampling.

Is there something I'm missing here? My strong impression is that the nondeterminism in the results is quite small in scale and random in direction, so it isn't likely to affect an aggregate-scale thing like the qualitative effect of an entire gradient update. (I can imagine that the accumulation of many random errors does bias the policy towards being generally less stable, which implies qualitatively worse, yes...)

Without something that mitigates the statement above, my prior is instead that the graph is cherry-picked, intentionally or not, to increase the perceived importance of LLM determinism.
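
A toy sketch of the averaging intuition above, not a model of any real RLVR setup. It assumes the per-sample errors are independent and zero-mean, which is exactly the assumption in question; the function name, sample count, and noise scale are all made up for illustration. It does not model sampler/trainer mismatch or errors compounding across many updates.

```python
import random

def aggregate_shift(n_samples, noise_scale, seed=0):
    """Compare a mean over per-sample terms with and without small random noise.

    Hypothetical stand-in: each "per-sample gradient term" is a scalar, and the
    "nondeterminism" is independent zero-mean Gaussian noise (an assumption).
    """
    rng = random.Random(seed)
    clean = [1.0 + rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    noise = [rng.gauss(0.0, noise_scale) for _ in range(n_samples)]
    noisy = [g + e for g, e in zip(clean, noise)]

    mean_clean = sum(clean) / n_samples
    mean_noisy = sum(noisy) / n_samples
    worst_single_error = max(abs(e) for e in noise)
    return abs(mean_noisy - mean_clean), worst_single_error

shift, worst = aggregate_shift(n_samples=10_000, noise_scale=1e-3)
print(f"shift in aggregate: {shift:.2e}, largest single-sample error: {worst:.2e}")
```

Under these assumptions the aggregate moves far less than the worst single-sample error, since independent errors shrink roughly like 1/sqrt(n) when averaged. If the errors were correlated or biased in a consistent direction, this argument would not apply.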

Will Any Crap Cause Emergent Misalignment?
shawnghu (14d)

Yeah, whenever a result is sensational and comes from a less-than-absolutely-huge name, my prior is that the result is due to mistakes (like 60-95%, depending on the degree of surprisingness), and de facto this means I just don't update on papers like this one anymore until significant follow-up work is done.

Will Any Crap Cause Emergent Misalignment?
shawnghu (18d)

I wonder if you're referring to the "spurious rewards" paper. If so, I wonder if you're aware of [this critique](https://safe-lip-9a8.notion.site/Incorrect-Baseline-Evaluations-Call-into-Question-Recent-LLM-RL-Claims-2012f1fbf0ee8094ab8ded1953c15a37) of its methodology, which might be enough to void the result.

Will Any Crap Cause Emergent Misalignment?
shawnghu (19d)

I think the critique generalizes if it's a little more focused. If a huge number of papers appeared that just demonstrated EM arising in a bunch of superficially varied settings, without a clear theory of why, this post would be a good critique of that phenomenon.

Bring back the Colosseums
shawnghu (2mo)

How do you feel about mutual combat laws in Washington and Texas, where you can fight by agreement (edit: you can't grievously injure each other, apparently)?

Bring back the Colosseums
shawnghu (2mo)

I find it absurd on priors to think that soccer of any demographic could result in more concussions than any of those five full-contact sports, particularly the three where part of the objective is explicitly to hit your opponent in the head very hard if you can. (Even factoring in the fact that you do a bunch of headers in soccer.) (Maybe if you do some trickery like selecting certain subpopulations of the practitioners of these sports, but...)

Mech interp is not pre-paradigmatic
shawnghu (3mo)

I don't disagree in general with the claim that words can be useful for coordinating about natural ideas. What I'm missing is an understanding of what particular natural idea here isn't captured by "mech interp lacks good paradigms".

Is anything which lacks a good+relevant paradigm by default "pre-good-relevant-paradigm", or is there more subtlety to the idea?

Mech interp is not pre-paradigmatic
shawnghu (3mo)

> So Parameter Decomposition in theory suggests solutions to the anomalies of Second-Wave Mech Interp. But a theory doesn’t make a paradigm.

Nitpicking a little bit here, I think this is a different use of the word "theory" than the one in the phrase "scientific theory". One could think you mean the latter in the word's second usage here, but it seems like you're making a claim more like "these methods could make progress explaining some of these anomalies, if the experiments go well".


> The requirement that the parameter components sum to the original parameters also means that there can be no ‘missing mechanisms’. At worst, there can only be ‘parameter components which aren’t optimally minimal or simple’.

Echoing a part of Adam Shai's comment, I don't see how this is different from the feature-based case. Won't there be a problem if you extract a bunch of parameter components you "can explain", and then you're left with a big one you "can't explain", which "isn't optimally minimal or simple"?

> Another attractive property of Parameter Decomposition is that it identifies Minimum Description Length as the optimization criterion for our explanations of neural networks

Why is this an attractive property? (Serious question.)

Mech interp is not pre-paradigmatic
shawnghu (3mo)

What's the distinction between what you're pointing at and the statement that mech interp lacks good paradigms? I think the latter statement is true and descriptive, but I presume you want to say something else.

When should you read a biography?
shawnghu (3mo)

Sorry, yeah, it was badly worded.

  • Being able to discern what makes someone an expert at X is a skill, Y.
  • People who are good at X aren't necessarily good at Y; Y is a separate skill. (Skill in Y generalizes somewhat across different values of X.)
  • One needs to look for authors that somehow are good at Y; I didn't specify how you could do this, and maybe there's not a very good way in general. (But I do like the Caro biographies. But also, maybe I like them for their entertainment value.)

Re: self-help books, I mostly share your position in thinking that ~80% of such books could be a paragraph to a page, ~18% of them could be blog posts of varying length, and only the remaining ~2% have something substantial to say from a pure informational standpoint. (In many cases, padding the length of a self-help book actively makes it worse or less coherent.) Moreover, I agree that of the good-ish 20%, there is a lot of overlap in the prescriptions given, implied or otherwise. I think that even when a book of this type is done "well", the purpose of most of the text isn't to be of maximum entropy or something in distinguishing world models, but to give a bunch of perspectives on a small set of ideas in the hopes that one of them sticks particularly well, or that the cumulative exposure makes the idea stick with you better. Spaced repetition or other ritualistic behaviors might achieve the same thing, but require more active agency on your part.

I happen to like The Inner Game of Tennis in particular, and feel that its overlap in useful advice with other books in the genre is relatively low, though I might have a hard time defending my taste explicitly.

Posts:
  • shawnghu's Shortform (7mo)
  • Disentangling Perspectives On Strategy-Stealing in AI Safety (4y)