lukasgebhard — LessWrong

lukasgebhard · 15 · 9mo

An Empirical Review of the Animal Harm Benchmark

Summary: The Animal Harm Benchmark (AHB) is one of only two publicly available benchmarks for measuring LLM bias against non-human animals. This work examines whether AHB 2.0 is well-calibrated, asking three questions: (Q1) Does a score of 0 correspond to maximum risk and 1 to minimum risk? (Q2) Do higher...