LESSWRONG
Quiche Eater
5040

Comments

Sorted by
Newest
No posts to display.
No wikitag contributions to display.
A Rocket–Interpretability Analogy
Quiche Eater 11mo 10

Why are you confident it's not the other way around? People who decide to pursue alignment research may have prior interest or experience in ML engineering that drives them towards mech-interp.

You can remove GPT2’s LayerNorm by fine-tuning for an hour
Quiche Eater 1y 30

You should also set model.cfg.normalization_type = None afterwards. It's mostly a formality since you're doing it after initialization. ActivationCache.apply_ln_to_stack() is the only function I found which behaves incorrectly if you don't change this.
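A minimal sketch of that step, assuming a TransformerLens-style model object that exposes its config as model.cfg. The helper name mark_layernorm_removed is hypothetical, and the SimpleNamespace stand-in is used only so the snippet runs without downloading GPT-2; with a real model you would pass the loaded model instead.

```python
from types import SimpleNamespace


def mark_layernorm_removed(model):
    """Record in the config that the model no longer applies LayerNorm.

    Returns the previous setting so the change can be logged or reverted.
    """
    previous = model.cfg.normalization_type
    model.cfg.normalization_type = None
    return previous


# Stand-in for a loaded model (assumption: the real object has the same
# cfg.normalization_type attribute, set to "LN" or "LNPre" before the change).
stand_in = SimpleNamespace(cfg=SimpleNamespace(normalization_type="LN"))

previous = mark_layernorm_removed(stand_in)
print(previous, "->", stand_in.cfg.normalization_type)
```

This only updates the config flag. It does not touch the weights, so it belongs after the fine-tuning that actually removes LayerNorm.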

You don't know how bad most things are nor precisely how they're bad.
Quiche Eater 1y 41

In my opinion, this is connected with Sturgeon's Law. I'd guess that to expert pianists and piano tuners, 90% of pianos sound out of tune. Among hardcore software engineers, a common lament is that almost all software sucks. Windows is almost unbearable to me, but I'm sure most desktop users are happy with it, because most desktop users are not programmers.

90% of all things may be crap to the discerning eye, but the world remains ok with that because each person has only a handful of places where they care to discern.

Comfort Zone Exploration
Quiche Eater 3y 10

it’s clear that more exploration is the way to go

I think there is nuance here. https://mindingourway.com/dive-in-2/ offers a good alternative perspective.
