David Lorell

The Plan - 2025 Update

What’s “The Plan”? For several years now, around the end of the year, I (John) write a post on our plan for AI alignment. That plan hasn’t changed too much over the past few years, so both this year’s post and last year’s are written as updates to The Plan...

Dec 31, 2025

Conditional On Long-Range Signal, Ising Still Factors Locally

by johnswentworth and David Lorell

Background: The Ising Model. The Ising Model is a classic toy model of magnets. We imagine a big 2D or 3D grid, representing a crystal lattice. At each grid vertex i, there’s a little magnetic atom with state σ_i, which can point either up (σ_i = +1) or down (σ_i = −1). When two...

Dec 12, 2025

An Analogue Of Set Relationships For Distributions

by johnswentworth and David Lorell

Here’s a conceptual problem David and I have been lightly tossing around the past couple days. “A is a subset of B” we might visualize like this: If we want a fuzzy/probabilistic version of the same diagram, we might draw something like this: And we can easily come up with...

Nov 18, 2025

Toward Statistical Mechanics Of Interfaces Under Selection Pressure

by johnswentworth and David Lorell

Imagine using an ML-like training process to design two simple electronic components, in series. The parameters θ1 control the function performed by the first component, and the parameters θ2 control the function performed by the second component. The whole thing is trained so that the end-to-end behavior is that of...

Nov 6, 2025

The Zen Of Maxent As A Generalization Of Bayes Updates

by johnswentworth and David Lorell

Jaynes’ Widget Problem[1]: How Do We Update On An Expected Value? Mr A manages a widget factory. The factory produces widgets of three colors - red, yellow, green - and part of Mr A’s job is to decide how many widgets to paint each color. He wants to match today’s...

Nov 4, 2025

Resampling Conserves Redundancy & Mediation (Approximately) Under the Jensen-Shannon Divergence

Around two months ago, John and I published Resampling Conserves Redundancy (Approximately). Fortunately, about two weeks ago, Jeremy Gillen and Alfred Harwood showed us that we were wrong. This proof achieves, using the Jensen-Shannon divergence ("JS"), what the previous one failed to show using KL divergence ("DKL"). In fact, while...

Oct 31, 2025

Natural Latents: Latent Variables Stable Across Ontologies

by johnswentworth and David Lorell

Background on where this post/paper came from: About a year ago, we wrote up a paper on natural latents for the ILLIAD proceedings. It was mediocre. The main shortcoming stemmed from using stochastic rather than deterministic natural latents, which give much less conceptually satisfying ontological stability guarantees; there was this...

Sep 4, 2025

David Lorell

Lessons On How To Get Things Right On The First Try

Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

Why Not Subagents?
