LESSWRONG

niki.h

Comments

Saying the quiet part out loud: trading off x-risk for personal immortality
niki.h · 2y · 52

Based on personal experience, you are definitely not the only one thinking about that Statement.

Wireheading and misalignment by composition on NetHack
niki.h · 2y · 10

Based on my understanding from talking with the author, it is the former. The language model is simply used to provide a shaping reward based on the text outputs the game shows after some actions; it's the RL optimization that learns the weird hallucination strategy, and it is able to do so because the shaping reward improves its capabilities in general.
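For concreteness, the setup described above can be sketched roughly like this (all names here are hypothetical illustrations, not the post's actual code): the language model only scores the game's text output, and that score is added to the environment reward that the RL optimizer sees.

```python
# Rough sketch of LM-based reward shaping (hypothetical names throughout).
# The LM never chooses actions; it only rates the text the game prints,
# and the RL optimizer trains against the combined reward.
def shaped_step(env, policy, lm_score, state, shaping_weight=0.1):
    action = policy(state)
    # `message` stands for the game's text output after the action.
    next_state, env_reward, message = env.step(action)
    shaping_reward = shaping_weight * lm_score(message)
    total_reward = env_reward + shaping_reward
    return next_state, total_reward
```

Any strategy that reliably produces highly scored messages, including exploiting hallucinated or misleading text, raises the shaped reward, which is what makes the observed behavior possible.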

Posts

14 · Does robustness improve with scale? · Ω · 1y · 0