dirk

See also my EA Forum profile at https://forum.effectivealtruism.org/users/dirk

Comments

dirk20

Thanks for sharing! I signed up and IDK if I won anything (winners haven't been announced yet) but it was fun trying to jailbreak the models :)

dirk10

I don't think you can cure other people's shyness, insecurity, and social anxiety by being less nice to them.

dirk20

Certainly more skilled writers are more clear, but if you routinely dismiss unclear texts as meaningless nonsense, you haven't gotten good at reading but rather goodharted your internal metrics.

dirk20

To the skilled reader, human-authored texts are approximately never foggy.

dirk141

I'm confused about SAE feature descriptions. In both Anthropic's and Google's demos, there are a lot of descriptions that seem not to match a naked-eye reading of the top activations. (E.g., "Slurs targeting sexual orientation" also has a number of racial slurs in its top activations; the top activations for "Korean text, Chinese name yunfan, Unicode characters" are almost all the word "fused" in a metal-related context; etc.) I'm not sure if these short names are the automated Claude descriptions or if there are longer, more accurate real descriptions somewhere; and if these are the automated descriptions, I'm not sure if there's some reason to think they're more accurate than they look, or if it doesn't matter if they're slightly off, or some third thing?

dirk80

Bioshok3 said in a later tweet that they were in any case mistaken about it being 10k H100s, and that it was actually 100k H100s: https://x.com/bioshok3/status/1831016098462081256

dirk42

I don't think having a negative emotion about something is strong evidence someone's opinions weren't drawn from an external source. (For one thing, most people naturally have negative reactions to the breaking of social norms!)

Also, I don't see anywhere in jimrandomh's comment that he made any claims about the thing you're talking about? He was exclusively discussing word choice among people who had negative reactions.

dirk10

I'm interested! I'd probably mostly be comparing it to unaugmented Claude for things like explaining ML topics and turning my post ideas into drafts (I don't expect it to be great at the latter, but I'm curious whether having some relevant posts in the context window will elicit higher-quality output). I also think the low-friction integration might make it useful for clarifying math- or programming-heavy posts, though I'm not sure I'll want this often.

dirk21

Arguably Habitica, as a gamified task manager, is an attempt to do #2 here (by way of directly giving in-game rewards for IRL positive habits).

dirk3-3

I don't think the placement of fault is causally related to whether communication is difficult for him, really. To refer back to the original claim being made, Adam Scholl said that:

My guess is that this seems so stressful mostly because Anthropic’s plan is in fact so hard to defend... [I]t seems unsurprising (and good) that people might sometimes strongly object; if Anthropic had more reassuring things to say, I’m guessing it would feel less stressful to try to reassure them.

I think the amount of stress incurred when doing public communication is nearly orthogonal to these factors, and in particular is, when trying to be as careful about anything as Zac is trying to be about confidentiality, quite high at baseline. I don't think Adam Scholl's assessment arose from a usefully-predictive model, nor one which was likely to reflect the inside view.
