Fernando Rosas

Message

Interested in AI alignment and safety, abstractions, generalisation, and multi-agent systems.

https://profiles.sussex.ac.uk/p555273-fernando-rosas, https://profiles.imperial.ac.uk/f.rosas/about

AI in a vat: Fundamental limits of efficient world modelling for safe agent sandboxing

Drawing inspiration from the ‘brain in a vat’ thought experiment, this blogpost investigates methods to simplify world models that remain agnostic to the agent under evaluation. This work was done together with Alec Boyd and Manuel Baltieri, with support from the UK ARIA Safeguarded AI programme and the PIBBSS Affiliateship...

Aug 1, 202534

LESSWRONG
LW

LESSWRONG
LW

Fernando Rosas

Fernando Rosas

AI in a vat: Fundamental limits of efficient world modelling for safe agent sandboxing

Fernando Rosas

Fernando Rosas

AI in a vat: Fundamental limits of efficient world modelling for safe agent sandboxing

Introduction