JasonB

Developmental Cognitive Interpretability: A Research Agenda for Modelling Generalisation and Predicting Agent Behaviour

Summary: Safe deployment of an AI system requires that we can make confident claims about its behaviour on out-of-distribution deployment inputs on the basis of pre-deployment evaluations alone. One approach to making such claims is to take a cognitive perspective, in which we interpret the AI's behaviour in terms of...

May 2941


Developmental Cognitive Interpretability: A Research Agenda for Modelling Generalisation and Predicting Agent Behaviour

AI Safety Research Futarchy: Using Prediction Markets to Choose Research Projects for MARS

TAMing The Alignment Problem

Quantifying General Intelligence