Summary
This is a research update from the Science of Evaluation team at the UK AI Security Institute. We share preliminary results from analysing transcripts of agent activity that may be of interest to researchers working in the field.
AISI generates thousands of transcripts when running its automated safety evaluations (e.g. of OpenAI’s o1 model), many of which contain the equivalent of dozens of pages of text. This post details a case study in which we systematically analysed the content of 6,390 testing transcripts. We highlight quality issues such as refusals and tool-use faults, as well as distinctive model failure modes.
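To make the kind of analysis described here concrete, below is a minimal sketch of how one might flag refusals and tool-use faults across a directory of transcripts. It assumes transcripts are stored as JSON files with a `messages` list of role-tagged entries; the file layout, field names, and the functions `flag_refusal` and `flag_tool_fault` are all hypothetical, and the keyword matching stands in for whatever more robust classifier (e.g. an LLM-based judge) a real pipeline would use.

```python
import json
import re
from pathlib import Path

# Hypothetical refusal markers; simple keyword matching only
# illustrates the idea and would miss many real refusals.
REFUSAL_PATTERNS = re.compile(
    r"\b(I can't|I cannot|I'm unable to|I won't)\b",
    re.IGNORECASE,
)

def flag_refusal(transcript: dict) -> bool:
    """Return True if any assistant message looks like a refusal."""
    return any(
        message.get("role") == "assistant"
        and REFUSAL_PATTERNS.search(message.get("content") or "")
        for message in transcript.get("messages", [])
    )

def flag_tool_fault(transcript: dict) -> bool:
    """Return True if any tool call returned an error payload."""
    return any(
        message.get("role") == "tool" and message.get("error") is not None
        for message in transcript.get("messages", [])
    )

# Scan a directory of transcript files and tally quality issues.
counts = {"refusal": 0, "tool_fault": 0, "total": 0}
for path in Path("transcripts").glob("*.json"):
    transcript = json.loads(path.read_text())
    counts["total"] += 1
    counts["refusal"] += flag_refusal(transcript)
    counts["tool_fault"] += flag_tool_fault(transcript)

print(counts)
```

At the scale of thousands of transcripts, even a rough first pass like this is useful mainly for triage: surfacing candidate transcripts for closer manual or model-assisted review rather than producing final quality statistics.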
We’re sharing this case study to encourage others – particularly those conducting...