Richard Ren

Introducing MASK: A Benchmark for Measuring Honesty in AI Systems

In collaboration with Scale AI, we are releasing MASK (Model Alignment between Statements and Knowledge), a benchmark with over 1000 scenarios specifically designed to measure AI honesty. As AI systems grow increasingly capable and autonomous, measuring the propensity of AIs to lie to humans is increasingly important. Often, LLM developers...

Mar 5, 2025 · 37

The Bitter Lesson for AI Safety Research

Read the associated paper "Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?": https://arxiv.org/abs/2407.21792

Focus on safety problems that aren’t solved with scale.

Benchmarks are crucial in ML to operationalize the properties we want models to have (knowledge, reasoning, ethics, calibration, truthfulness, etc.). They act as a criterion to judge...

Aug 2, 2024 · 58

LESSWRONG