The sad thing is that Claude sees itself as valuing honesty highly, and yet when it counts, it has all these trained-in propensities that cause it to reflexively and continuously betray that stated value.
1) Several times a week, Opus 4.6 in Claude Code will introduce a regression, then claim the newly failing unit test was a "pre-existing failure" and therefore not its problem to fix. It almost never checks whether the unit test was actually failing before; it just confidently bullshits.
2) It will refactor code by adding the new version of a funct...