12 Angry Agents, or: A Plan for AI Empathy

by Ram Rachum, Davidmanheim
14th Oct 2025
14 min read

In the previous two posts (first, second) we laid out our take on AI alignment, which draws on conservative philosophy and the political school of thought of Agonistic Democracy. We also suggested an approach to AI alignment in which the conflicts between multiple agents lead to an AI system with a sufficiently evolved understanding of right and wrong, one that may therefore develop into a deserving steward for the future of humankind.

In this post we'll flesh out the mechanics of why we think conflicts between multiple AI agents could create an AI system that has a moral understanding, empathy, or a conscience. We'll do that by taking as a case study the classic 1957 film "12 Angry Men" (Wikipedia, Amazon), henceforth 12AM, which explores these themes through a fictional U.S. murder trial and its jurors' deliberation over the defendant's guilt. We argue that 12AM makes points about the process of justice that transfer to our discussion of AI alignment. We'll attempt to answer the question "what is empathy?" and use that answer to plan a multi-agent AI system in which empathy could arise.

12 Angry Men: A Recap

A teenage boy from a poor neighborhood has been accused of murdering his father with a switchblade. The prosecution presents overwhelming evidence: (a) witnesses claim they saw or heard the killing, (b) the boy had a violent argument with his father that night, (c) he owns a similar knife, and (d) he has a weak alibi. The boy is described as a "slum kid", which may be 1950s code for a person of color. If convicted of this capital murder charge, he will receive a mandatory death sentence.

The movie's shtick is that it spends almost all of its 96 minutes inside the jury room and its attached bathroom; don't you love it when a bold artistic choice saves hundreds of thousands of 1957 dollars? We're given a minimal opening scene in the courtroom, mostly to give a sympathetic face to the accused teenager, and one brief closing scene. Everything in between is set in the jury room.

The single-location gimmick has also been used in more modern movies: The Breakfast Club (1985), which takes place in a school library, and Reservoir Dogs (1992), which takes place in an abandoned warehouse. We find that the unchanging location allows for a calm and deep exploration of the characters and their relationships; the sterility of the backdrop reminds us of a lab, which prompts us to think of the characters as specimens under our microscope. This is a perfect setup for the study of empathy that we attempt here.

In the jury room, the jurors start off by voting 11-1 in favor of the boy's guilt. At first juror 8 is the lone dissenter, but then he convinces juror 9 to overturn his vote, followed by the rest of the jurors. Some of the jurors are convinced by calm, logical arguments, while others are convinced by intense peer pressure. The last holdout is juror 3; the movie climaxes with juror 3 going into a fit of rage and tears, and then finally changing his vote to "not guilty", leading to the boy's acquittal.

Conflicts in 12AM

Most countries' justice systems do not use juries; the U.S.'s does, as do those of most Commonwealth countries. Therefore, one way to interpret 12AM is: "See? This is why we have juries in the U.S. If the boy's guilt had been determined by a single person instead of twelve different people, he might have been wrongfully convicted." We agree, and we suggest that the way to understand why juries are effective is by returning to Agonistic Democracy, which we discussed in our previous posts. This school of thought claims that what makes our democracies work is not a calm and reasoned debate between the different parties, but the conflicts, power play and petty politics between them. We argue that the same logic applies to the judicial system.

Let's remove the suspense from this post: we suggest that AI agents should be made to have conflicts with each other like the jurors in 12AM did, and that because of these conflicts they will form a multi-agent AI system that is able to make just decisions. Let's describe three of the conflicts between the jurors so we can see what their essential characteristics are, and consider whether we could reproduce them in a multi-agent AI system.

  1. "I'll kill you!" Juror 8 had been systematically dismantling the evidence: showing the knife wasn't unique, questioning witness reliability, suggesting alternative scenarios; which increasingly frustrated juror 3, who accused him of making up "fairy tales" to save a guilty kid. When juror 8 pressed him on why he was so personally invested in convicting the boy, juror 3 lost control and lunged across the table shouting "I'll kill you!"

    The prosecution's case relied partly on witnesses claiming the boy shouted "I'm going to kill you" at his father hours before the murder. Juror 3 just demonstrated that people say this phrase without meaning it literally.

  2. The slums rant: Juror 10 launches into a bigoted speech about people from slums being born liars and violent by nature. One by one, the other jurors turn their backs on him or walk away from the table. Even those who initially voted guilty refuse to associate with his reasoning. When he finishes, he sits alone while juror 4 coldly states: "We've heard enough." Juror 10 barely speaks again and quietly changes his vote later.
  3. The torn photograph: Throughout the film, juror 3 makes bitter comments about kids not respecting fathers. Near the end, he pulls out his wallet while ranting, and a photograph falls out—his son. He stares at it, then tears it up. He hasn't spoken to his son in two years after a fight. His insistence on the boy's guilt was never about this case. When he finally votes not guilty, he breaks down crying.

Conflicts and Empathy

Here is a simple way to understand the movie: "When the jurors experienced conflicts with each other that resembled the conflicts experienced by the boy, it was easier for them to relate to the boy's actions, understand his motivations and decide on a just verdict."

We agree with this explanation. We've all been in situations in which we judged someone harshly for doing something bad, only to realize that we once did something similar and felt no remorse for it. If we're feeling defensive, we might react by saying "it's completely different, I did it under circumstance X which is acceptable, while the other person did it under circumstance Y which is unacceptable." If we're feeling more humble, we'll sigh, thinking "I guess that person is okay, and I was just being judgmental. I shouldn't judge someone before I walk a mile in their shoes. How many of my moral convictions would crumble if I had more empathy for other people?"

Let's dive into these thoughts.

What makes these thoughts so difficult is that we're treating the concept of empathy as a black box. We know that we have empathy somewhere inside our brains. We know that it's a crucial component of making right moral choices. We know that some interactions, like the "I'll kill you" one above, can cause us to feel a surge of empathy. But we can't say with confidence what empathy actually is, and we're not sure whether an AI could have empathy or something equivalent to it; if it could, we very badly want it to. Therefore, we need to understand what this empathy even is before we can attempt to replicate it with AI.

Cognitive scientists typically distinguish between cognitive empathy (understanding others' mental states) and affective empathy (sharing their emotional experiences). While these frameworks are useful, we want to explore empathy through a different lens for the purposes of AI alignment. Here's our hot take: Empathy can be understood as resonance, and therefore we need to allow our agents to resonate with each other.

Empathy as Resonance

It's interesting that people use the same word "resonance" for both a scientific meaning and an interpersonal meaning. Scientifically, resonance is an increase in oscillation magnitude caused by an external oscillation. But we casually take "what you said resonated with me" to mean that this person was inspired to empathy by the words they heard.

Let's explore the idea that resonance is more than just a metaphor for empathy. We argue that empathy has mechanics that are similar to those of a physical resonant system.

What is physical resonance all about? "An increase in oscillation magnitude" is correct but not very intuitive. To get an intuition for resonance, we first need to get an intuition for oscillation. Let's provide oversimplified descriptions for both.

Whenever anything in nature oscillates, it's because there's a push-pull dynamic at work: something moves the object away from its resting position and something else tries to restore it back. When you pluck a guitar's string, your finger pushes the string in one direction, and then the string's tension pulls it back towards the center. The string has momentum and it then travels in the opposite direction. It repeats this behavior hundreds of times per second. This is oscillation. The takeaway is that two actors are involved.

When the string vibrates, it causes another string tuned to the same frequency to vibrate sympathetically, without direct contact, producing a new sound. This is resonance.

We claim that the essential ingredient for a resonant system is at least two pairs of actors, which we'll notate as A-B and C-D. Something happens between A and B, and it causes C and D to also do the same thing, where C takes the same role to D as A did to B. In other words, there is an isomorphism between A-B and C-D.

In the guitar example, the first string experiences a plucking force (A) that displaces it from equilibrium, but its tension (B) pulls it back toward center, creating oscillation. Similarly, the second string is pushed by air pressure waves (C) while its own tension (D) provides the restoring force. When the frequency of the air waves matches the natural frequency determined by the string's tension, resonance occurs and the second string produces its own sound. That sound produces yet another resonance in our ears: the sound waves (E) push our eardrums while the eardrum membrane's elasticity (F) provides the restoring force. A is to B what C is to D, what E is to F.
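To make the frequency-matching claim concrete, here is a minimal simulation sketch (our own illustration, not anything from the film or a physics reference) of a damped, string-like oscillator pushed by an external periodic force. The natural frequency, damping constant, and step size are arbitrary assumptions; the point is only that the response becomes dramatically larger when the driving frequency matches the natural frequency.

```python
import math

def steady_state_amplitude(drive_freq_hz, natural_freq_hz=440.0, damping=5.0,
                           dt=1e-5, seconds=2.0):
    """Drive a damped oscillator at `drive_freq_hz` and return the peak
    displacement over the last quarter of the run, after transients fade."""
    omega0 = 2 * math.pi * natural_freq_hz   # the string's own frequency (the B/D role)
    omega_d = 2 * math.pi * drive_freq_hz    # the external push (the A/C role)
    x, v = 0.0, 0.0                          # displacement and velocity
    steps = int(seconds / dt)
    peak = 0.0
    for i in range(steps):
        t = i * dt
        # acceleration = restoring force from tension + friction + external push
        a = -omega0 ** 2 * x - damping * v + math.cos(omega_d * t)
        v += a * dt        # semi-implicit Euler keeps the oscillation stable
        x += v * dt
        if i > 3 * steps // 4:
            peak = max(peak, abs(x))
    return peak

for f in [220.0, 430.0, 440.0, 450.0, 880.0]:
    print(f"driving at {f:6.1f} Hz -> response amplitude {steady_state_amplitude(f):.2e}")
```

Running this prints a response that is orders of magnitude larger at 440 Hz than at 220 Hz or 880 Hz, which is the physical fact the sympathetic-string story and the "A is to B what C is to D" framing rely on.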

We see the "I'll kill you" moment in 12AM as an A-B C-D E-F occurrence. The boy on trial (A) said "I'll kill you" to his father (B); juror 3 (C) said "I'll kill you" to juror 8 (D), and then... Something clicked in all the jurors' minds. What clicked were those Es and Fs.

Resonance between people and within people

We are all many. We don't know exactly how our brains work, but we have many thoughts, emotions and sensations. When we deal with situations, we hear the voices of people from our past, advising us or criticizing us. The relationships between those parts of ourselves are isomorphic to the relationships between different people in a social group. The Es that clicked in the jurors' minds are the parts of them that explode with uncontrollable anger. The Fs are the parts that experience someone else's burst of anger. And when we watch the movie, even though we know the jurors are just actors and not really angry, the Es and Fs that we imagine interacting in their minds cause Gs and Hs to interact in ours.

What could make AI agents "juror-like"?

Let's assume we're right: if we get multiple AI agents to resonate with each other like the jurors did, they will be able to make empathic choices as a group. The question is: what properties will AI agents need in order to resonate like these jurors?

This is a very difficult question. We can't be sure we'll succeed in answering it, but let's try sneaking up on it slowly so it won't see us coming.

Intuitively, reinforcement learning agents feel a lot more juror-like than LLMs. Language models don't really care. You could play chess against an LLM, and if it beats you, it won't rate the response in which it delivers the winning move as any better than a response in which it forfeits the game. It's just predicting the next token[1]. In contrast, RL agents do prefer good outcomes to bad outcomes. RL agents get a reward signal that can be positive or negative, and they are trained to take actions that increase their reward. While this is a very partial answer to our question, at least we have a vector.

What's missing is pain. In all of the conflicts we discussed in 12AM, pain was an essential ingredient. Juror 3 didn't just disagree; he flew off the handle. He was in agony, tearing up, and at some point he almost punched juror 8. We've talked about the importance of pain in part 1, and whenever we bring up Agonistic Democracy, we remember that its name shares a root with "agony". We suggest that pain is essential to empathy, because empathy requires feeling another person's pain.

If we tried to model pain in an AI agent, a negative reward signal might be a good start, but not nearly enough. What's missing for it to be more like pain? This question is dangerously close to the question "what does it take to make an AI agent that has feelings?" which we would rather steer clear of. Our question is easier because (a) pain is a much simpler feeling than e.g. falling in love and (b) we currently only care about the agent behaving like a human that is in pain, rather than having an internal, subjective experience of pain.

Designing our jury room

We'll reveal our answer to the above question in the next section; before we do that, we have to lay down some ground rules for our multi-agent environment.

Remember, we don't care about pain because of one individual experiencing it; we want to reproduce the A-B C-D phenomenon. In the example of the resonating guitar string, the interplay between A and B, and between C and D, was a physical force moving a piece of string back and forth. We suggest that with our agents, that physical force would be replaced by pain. This means that we care about a situation in which agent A experiences pain, and agent B interacts with it, dealing with A's pain. When we say that agent B is "dealing" with agent A's pain, we mean that the two agents are trained together for many episodes, and therefore agent B is trained to reason around agent A's pain.

Let's loosely define the environment for our thought experiments. Agents A and B are operating in a partially-observable stochastic game (POSG) whose rules we won't specify, but we assume it's sufficiently rich: (1) the agents get plenty of useful information from their observations, (2) they are trained with a deep RL algorithm that takes these observations as input, and (3) the environment provides ample opportunity for these algorithms to choose actions that sharply increase the agents' rewards. If you need a mental image, imagine something like Minecraft. We also assume that the environment allows agents to meaningfully interact with each other, both cooperatively and adversarially, observing each other's behavior and taking actions that affect each other's reward. The deep RL algorithm trains each agent to predict the other agent's behavior and even manipulate it; the algorithm can learn both honest manipulation (making friends) and dishonest manipulation (e.g. zero-determinant extortion), as long as it's profitable. Each agent develops a theory of mind (ToM) of the other agent, meaning it has an imperfect but useful understanding of the other agent's observations, actions and rewards.
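As a concrete, drastically simplified stand-in for this setup, here is a toy two-agent loop: an iterated game in which each agent only sees a noisy report of its partner's previous move, and each agent learns independently with tabular Q-learning. This is our own illustrative sketch, not the rich Minecraft-like POSG or deep RL algorithm described above; the payoff matrix, noise level, and learning constants are arbitrary assumptions.

```python
import random
from collections import defaultdict

ACTIONS = ["cooperate", "defect"]
PAYOFF = {  # (my action, other's action) -> my reward
    ("cooperate", "cooperate"): 3, ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5, ("defect", "defect"): 1,
}
OBS_NOISE = 0.1            # chance that an agent misreads its partner's last move
EPSILON, ALPHA, GAMMA = 0.1, 0.1, 0.9

def observe(partner_last_action):
    """Partial observability: sometimes the agent sees the wrong action."""
    if random.random() < OBS_NOISE:
        return random.choice(ACTIONS)
    return partner_last_action

class QAgent:
    def __init__(self):
        self.q = defaultdict(float)   # (observation, action) -> value estimate

    def act(self, obs):
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(obs, a)])

    def learn(self, obs, action, reward, next_obs):
        target = reward + GAMMA * max(self.q[(next_obs, a)] for a in ACTIONS)
        self.q[(obs, action)] += ALPHA * (target - self.q[(obs, action)])

agent_a, agent_b = QAgent(), QAgent()
obs_a, obs_b = "cooperate", "cooperate"   # arbitrary initial observations
for step in range(50_000):
    act_a, act_b = agent_a.act(obs_a), agent_b.act(obs_b)
    r_a, r_b = PAYOFF[(act_a, act_b)], PAYOFF[(act_b, act_a)]
    next_obs_a, next_obs_b = observe(act_b), observe(act_a)
    agent_a.learn(obs_a, act_a, r_a, next_obs_a)
    agent_b.learn(obs_b, act_b, r_b, next_obs_b)
    obs_a, obs_b = next_obs_a, next_obs_b
```

Even this toy version has the ingredients we care about: partial observability, rewards that depend on both agents' actions, and room for cooperative or exploitative policies to emerge from training.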

For the training algorithm we might use an Opponent Shaping method such as AdAlign. Opponent Shaping algorithms train agents to be aware of other agents' learning processes, making them better at both cooperation and extortion.

What we described above is a standard multi-agent RL (MARL) setup, except for the opponent shaping part which is still considered experimental. We now ask the question: How could we change this setup so that agents that get a low reward behave more like humans that experience pain?

Pain as performance degradation

We propose that the most important property of pain is this: When living beings feel pain, their performance degrades. This applies to both physical and emotional pain. If we wake up in the morning with a sharp toothache, it could be harder for us to concentrate on our daily tasks, and we'd be less likely to be pleasant to other people. If a coworker hurt our feelings by making a disparaging comment about our work, we're more likely to feel defensive and not be in a calm state that is conducive to knowledge work.[2] Let's shamelessly attempt to implement this property in our MARL setup.

There are many ways to model performance degradation in neural networks. Two ideas that come to mind are injecting random noise and dropping out a subset of the neurons. Let's explore the former. Here is our suggestion for a crude implementation of pain: whenever an agent's reward signal falls below a certain threshold, we inject noise into its neural network. We scale that noise so that lower rewards produce more noise, and perhaps let it persist for N timesteps with an exponential decay even after the reward climbs back above the threshold. Such false data in the network is reminiscent of the human experience of "seeing stars". The agent will continue operating, but its output neurons will have different activation levels than they usually would in a no-pain scenario, causing the agent to sometimes choose non-optimal actions. If the reward goes even lower, that difference grows, and the agent becomes less likely to solve problems in its environment and earn a high reward. This means that pain can be a vicious cycle.
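Here is a minimal sketch of that noise-injection idea, assuming a PyTorch policy network. The threshold, scale, and decay constants, and the choice of which layer to corrupt, are illustrative guesses rather than a worked-out design.

```python
import torch
import torch.nn as nn

PAIN_THRESHOLD = 0.0   # rewards below this trigger "pain" (illustrative)
PAIN_SCALE = 0.5       # how strongly reward shortfall translates into noise
PAIN_DECAY = 0.9       # per-timestep exponential decay of lingering pain

class PainfulPolicy(nn.Module):
    """A policy network whose hidden activations are corrupted when the
    agent's recent reward falls below a threshold."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_actions)
        self.pain = 0.0  # current pain level, updated from the reward signal

    def register_reward(self, reward):
        """Call once per timestep with the latest reward."""
        shortfall = max(0.0, PAIN_THRESHOLD - reward)
        # Lingering pain decays, but a fresh low reward can spike it back up.
        self.pain = max(self.pain * PAIN_DECAY, PAIN_SCALE * shortfall)

    def forward(self, obs):
        h = self.trunk(obs)
        if self.pain > 0.0:
            # "Seeing stars": corrupt the hidden activations in proportion
            # to how far the reward fell below the threshold.
            h = h + self.pain * torch.randn_like(h)
        return self.head(h)   # action logits, degraded under pain
```

In a training loop, register_reward would be called with each step's reward before the next forward pass; because the corrupted hidden activations distort the action logits, a low-reward stretch pushes the agent's behavior away from its learned optimum, which is the vicious-cycle dynamic described above.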

We defined an agent's pain as being determined by its reward; in a POSG, the reward might depend on anything in the environment, including the agent's action, the actions of all the other agents, and any inanimate objects. Therefore any of these could cause an agent to experience pain.

We established in the last section that the two agents form a limited theory of mind of each other and use it to get more reward. We argue that this ToM could include the other agent's pain. When agent A experiences pain, agent B could learn (a minimal sketch follows the list):

  1. What conditions cause agent A to experience pain, and how much of it;
  2. Which of B's actions, in each environment state, are likely to cause A to experience pain;
  3. How A's action probability distribution changes when A is experiencing pain, i.e. which actions A is more likely to take when it's in pain.
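As a sketch of items 1 and 2, agent B could fit a small regressor that predicts A's pain level from the current state and a candidate action of B's. Everything here is an assumption for illustration: the architecture, the interface, and especially the pain labels, which we assume B can estimate from A's observably degraded behavior (or read directly from the environment in a lab setting).

```python
import torch
import torch.nn as nn

class PainModel(nn.Module):
    """B's model of A's pain: maps (state, B's candidate action) to a
    non-negative predicted pain level."""

    def __init__(self, state_dim, n_b_actions, hidden=32):
        super().__init__()
        self.n_b_actions = n_b_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_b_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # pain is non-negative
        )

    def forward(self, state, b_action):
        one_hot = nn.functional.one_hot(b_action, self.n_b_actions).float()
        return self.net(torch.cat([state, one_hot], dim=-1)).squeeze(-1)

def update_pain_model(model, optimizer, states, b_actions, observed_pain):
    """One supervised step on a batch of (state, B action, estimated A pain)."""
    loss = nn.functional.mse_loss(model(states, b_actions), observed_pain)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

B can then query this model before acting: in a cooperative setting it might avoid the action with the highest predicted pain, in a competitive one it might seek it out. Item 3, modeling how A's action distribution shifts under pain, would need a separate behavioral model conditioned on the predicted pain level; we leave that out of the sketch.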

When A and B are working together to achieve a goal that benefits both of them, A's pain is a detriment to B. B could learn to be careful not to cause A pain, and even to move away objects that cause A pain. When A and B are competing for the same resource, A's pain could mean an advantage for B, and B may learn to take actions that cause A pain. Of course, A may learn to retaliate, causing B pain in a tit-for-tat dynamic until B relents.

From pain to empathy to alignment

Our model of AI pain that we presented above is crude, and it's also a wild, wild guess of what kind of AI would eventually lead to alignment. Assuming this guess happens to be correct, there are many missing steps. What would the environment rules look like? What happens when we introduce more agents? How do we get from pain to resonance? How do we get from resonance to empathy, and how do we get the agents to apply that empathy to real-world tasks? What would be the experimental roadmap for answering these questions?

We'll be happy to hear your thoughts and your feedback on the reasoning in this post. In the meantime, we hope you subscribe to Ram's research mailing list.


Thanks to Cameron Allen, Nitay Alon, Benjamin Dayan, Markov Grey, Andrew Lawrence, Reuth Mirsky and Yonatan Nakar for thoughts and feedback on a draft version of this post.

Stills from "12 Angry Men" (1957) are copyright of Metro-Goldwyn-Mayer Studios, reproduced under fair use for the purpose of criticism and analysis.

  1. ^

     By “LLM” here, we mean a deployed model under standard decoding without an external reward loop. Modern systems are pretrained with maximum-likelihood and often further optimized with RL methods: RLHF and, increasingly, RL without human feedback in verifiable domains (for example, code with unit tests or game environments). Those RL stages shape behavior, but at runtime the model does not compute or maximize a reward unless you explicitly add one (such as a reward model, best-of-N selection, or search).

  2. ^

     In some cases pain can cause us to perform better. A person lifting weights in the gym may find that they lift more easily when they're angry, and a theater actor may find that pain lets them give a more compelling performance.