TL;DR: This is the second post of two. The first post, An Ontology of Representations, argued that the convergence observed in neural network representations reflects shared training distributions and inductive biases rather than the discovery of objective, mind-independent structure. This post surveys the rapidly growing literature on using machine learning...
BLUF: The Platonic Representation Hypothesis, the Natural Abstraction Hypothesis, and the Universality Hypothesis all claim that sufficiently capable AI systems will converge on a shared, objective model of reality, and that this convergence makes alignment more tractable. I argue that this conclusion does not follow, for four reasons: 1. Reduction...
This is an update to Agent Economics: a BOTEC on feasibility. Toby Ord pointed me to Gus Hamilton's Weibull reanalysis of the METR data. Hamilton finds that a declining hazard rate (a Weibull distribution with shape parameter κ ≈ 0.6–0.9 for SOTA models) may fit the data as well as Ord's constant hazard rate,...
Edit (7th of February): I made an updated version of this after Toby Ord's comment. You can find it here. Summary: I built a simple back-of-the-envelope model of AI agent economics that combines Ord's half-life analysis of agent reliability with real inference costs. The core idea is that if agent...
[Crosspost from EA forum] Apply to ARBOx2: a programme to rapidly upskill in ML safety. Join us (OAISI) in Oxford this July for our second iteration of ARBOx (Alignment Research Bootcamp Oxford), a 2-week intensive designed to rapidly build skills in ML safety. We’ll run a compressed version of the...