LESSWRONG

SciHamster
3120

Posts


Comments

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
SciHamster 6mo 32

fwiw, the fact that somebody can just finetune the model is already indicative of a serious problem
 

Building a Bugs List prompts
SciHamster 3y 10

Thanks! This post was really thought-provoking.

Ideas for other prompts that may be helpful:
- Say out loud or to yourself: "In my life everything is okay". Notice the emotions, objections and memories that come to mind.
- How do I see myself if I continue living like I do now? What do I love about the picture? What things would I want to add?

1 An argument on animal consciousness (soliciting criticism)
3y
2