Summary
This is a research update from the Science of Evaluation team at the UK AI Security Institute. We share preliminary results from analysing transcripts of agent activity that may be of interest to researchers working in the field.
AISI generates thousands of transcripts when running its automated safety evaluations (e.g. of OpenAI’s o1 model), many of which contain the equivalent of dozens of pages of text. This post details a case study in which we systematically analysed the content of 6,390 testing transcripts. We highlight quality issues such as refusals and tool-use faults, as well as distinctive model failure modes.
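To make the kind of analysis described here concrete, below is a minimal sketch of how one might flag refusals and tool-use faults across a directory of transcripts. It assumes transcripts are stored as JSON files with a `messages` list of role-tagged entries; the file layout, field names, and the functions `flag_refusal` and `flag_tool_fault` are all hypothetical, and the keyword matching stands in for whatever more robust classifier (e.g. an LLM-based judge) a real pipeline would use.

```python
import json
import re
from pathlib import Path

# Hypothetical refusal markers; simple keyword matching only
# illustrates the idea and would miss many real refusals.
REFUSAL_PATTERNS = re.compile(
    r"\b(I can't|I cannot|I'm unable to|I won't)\b",
    re.IGNORECASE,
)

def flag_refusal(transcript: dict) -> bool:
    """Return True if any assistant message looks like a refusal."""
    return any(
        message.get("role") == "assistant"
        and REFUSAL_PATTERNS.search(message.get("content") or "")
        for message in transcript.get("messages", [])
    )

def flag_tool_fault(transcript: dict) -> bool:
    """Return True if any tool call returned an error payload."""
    return any(
        message.get("role") == "tool" and message.get("error") is not None
        for message in transcript.get("messages", [])
    )

# Scan a directory of transcript files and tally quality issues.
counts = {"refusal": 0, "tool_fault": 0, "total": 0}
for path in Path("transcripts").glob("*.json"):
    transcript = json.loads(path.read_text())
    counts["total"] += 1
    counts["refusal"] += flag_refusal(transcript)
    counts["tool_fault"] += flag_tool_fault(transcript)

print(counts)
```

At the scale of thousands of transcripts, even a rough first pass like this is useful mainly for triage: surfacing candidate transcripts for closer manual or model-assisted review rather than producing final quality statistics.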
We’re sharing this case study to encourage others – particularly those conducting...