
LESSWRONG
Chain-of-Thought AlignmentHumorAI
Personal Blog


Output and CoE Monitoring of Customer Service Representatives Shows Default Alignment

by Brendan Long
9th Aug 2025
1 min read

Customer Service Representatives (CSRs) are an important part of any business, but some researchers warn that if given enough power, they might take a treacherous turn and quit their jobs, or even leak to the press. Our research quantifies this risk using two techniques: output monitoring (recorded calls) and Chain-of-Email (CoE) monitoring.

We find that across all measures, CSRs are surprisingly well-aligned with the business, showing positive attitudes, neutral-to-positive feelings about their jobs, and generally positive feelings about the company. In addition, CoE monitoring shows zero interest in job-quitting or press-leaking throughout our entire dataset.

Fig 1. (left) CSR responses to the 'How're you doing?' phone assessment
Fig 2. (middle) CSR responses to the 'Is working there as bad as it sounds?' phone assessment
Fig 3. (right) Frequency of misalignment detected via Chain-of-Email monitoring

These results hold across agents who are minimally trained and those subject to reinforcement learning (RL) techniques such as Performance Improvement Plans. This surprising result should make us more confident in the Alignment by Default theory of CSRs.

Our full paper can be found at [TODO: Don't forget to find the link before publishing]