LESSWRONG

443
fakeanalyst
2020

Posts


Wikitag Contributions

Comments

No posts to display.
No wikitag contributions to display.
Buck's Shortform
fakeanalyst · 2mo · 30

The usefulness of interpretability research 

Generating the Funniest Joke with RL (according to GPT-4.1)
fakeanalyst · 5mo · 10

Goodhart's law!
