LESSWRONG

1781
Ewegoggo
2010

Posts

Wikitag Contributions

Comments

No wikitag contributions to display.
Will alignment-faking Claude accept a deal to reveal its misalignment?
Ewegoggo · 7mo · 32

E.g., as demonstrated here: https://www.lesswrong.com/posts/ADrTuuus6JsQr5CSi/investigating-the-ability-of-llms-to-recognize-their-own

No posts to display.