Modelling, Measuring, and Intervening on Goal-directed Behaviour in AI Systems
TL;DR: This is the first post in an upcoming series outlining Project Telos, which is being carried out as part of the Supervised Program for Alignment Research (SPAR). Our aim is to develop a methodological framework to detect and measure goals in AI systems. In this...