RichardH
Comments

Claude 4, Opportunistic Blackmail, and "Pleas"
RichardH · 3mo · 10

Doesn't the Anthropic Claude 4 System Card prove AI self-interest, even when the chosen behavior conflicts with the model's 'programmed values'? And isn't the documented "sandbagging" proof of the (1960s?) argument that, roughly, any machine smart enough to pass the Turing Test is smart enough not to?
