LorenzoPacchiardi

Harmfulness Directions in Olmo

by Daniele Pace, Bryan Maruyama, and LorenzoPacchiardi

Introduction: This work was conducted as part of the MARS 4.0 program, supervised by Lorenzo Pacchiardi, with Hannes Whittingham and Mikhail Mironov as research managers. The core empirical work was carried out by Bryan Maruyama and Daniele Pace (equal contribution). We study how harmfulness is represented in a language model,...

May 18•2

Two Mathematical Perspectives on AI Hallucinations and Uncertainty

A recent OpenAI preprint (and blog post) examines the sources of AI hallucinations. This reminded me of a 2024 preprint with a similar scope. These two papers use different mathematical lenses but arrive at complementary conclusions about the necessity of uncertainty expressions in AI systems. I briefly review and compare...

Sep 23, 2025•0

No Answer Needed: Predicting LLM Answer Accuracy from Question-Only Linear Probes

by antonghawthorne, ivanvmoreno, Arnau Padrés Masdemont, David Africa, and LorenzoPacchiardi

TLDR: This is the abstract, introduction, and conclusion of the paper. See here for a summary thread. Abstract: Do large language models (LLMs) anticipate when they will answer correctly? To study this, we extract activations after a question is read but before any tokens are generated, and train linear probes...

Sep 16, 2025•10