Summary: This post argues that Artificial General Intelligence (AGI) threatens both liberal democracy and rule-based international order through a parallel mechanism. Domestically, if AGI makes human labor economically unnecessary, it removes the structural incentive for inclusive democratic institutions—workers lose leverage when their contribution is no longer essential. Internationally, if AGI...
Summary: I am helping set up a new skilling-up academic program centred on AI evaluations and their intersection with AI safety. Our goal is to train the people who will determine whether AI is safe and beneficial. This should include the various types of methodologies and tools available,...
Summary: This post outlines the methodological innovations presented in the paper General Scales Unlock AI Evaluation with Explanatory and Predictive Power. The paper introduces a set of general (universal) cognitive abilities that allow us to predict and explain AI system behaviour out of distribution. I outline what I believe are...
The EU Commission has opened a call for expressions of interest from researchers who would like to advise the EU AI Office on the implementation of the AI Act, concerning, among other topics, the safety of general-purpose AI systems. There’s a requirement to either hold a...
Summary: This is a submission for the goal misgeneralization contest organized by AI Alignment Awards, as well as the third iteration of some slowly improving AI Safety research agenda that I aim to pursue at some point in the future. Thanks to Jaime Sevilla for the comments that helped sharpen...
Summary: This is a distillation post intended to summarize the article How RL Agents Behave When Their Actions Are Modified by Eric Langlois and Tom Everitt, published at AAAI-21. The article describes Modified Action MDPs, where the environment or another agent such as a human may override the action of...