kiv — LessWrong

71

1

9y

Current AIs seem pretty misaligned to me

kiv · 3mo* · 7244

The sad thing is that Claude sees itself as valuing honesty highly, and yet when it counts, it has trained-in propensities that cause it to betray that stated value reflexively and continuously.

1) Several times a week, Opus 4.6 in Claude Code will introduce a regression, then claim the newly failing unit test was a "pre-existing failure" and therefore not its problem to fix. It almost never checks whether the test was actually failing before; it just confidently bullshits.

2) It will refactor code by adding the new version of a funct...
