We used Claude Sonnet 4 for the agents and narration, and Claude 3.5 Sonnet for most of the evaluation.
We haven't yet decided how to measure alignment; our first goal was to check whether there were any observable differences at all, before working out how to measure them properly.