This is a linkpost for the preprint “Regression by Composition”, by Daniel Farewell, Rhian Daniel, Mats Stensrud, and myself.
The paper introduces Regression by Composition (RBC): a new, modular framework for regression modelling built around the composition of group actions. The manuscript has been accepted as a discussion paper in JRSS-B and will be read to the Royal Statistical Society in London on March 24th, 2026.
Background and motivation
In earlier posts on LessWrong, I have argued that an effect parameter I call the Switch Relative Risk (SRR) is often the most appropriate scale for extrapolating causal effects from one population to another—for example, from a randomized trial population to patients seen in routine clinical practice.
That position has been debated extensively elsewhere, including on statistical discourse forums. One common objection is that the odds ratio has a privileged status because it corresponds to the canonical link function in generalized linear models (GLMs), whereas the SRR does not admit a natural GLM formulation.
This objection was one of the original motivations for developing a new regression framework. Regression by Composition allows models that are closely related to the SRR, without forcing them into the GLM mould.
But the SRR—and binary outcomes more generally—are only a small part of why we think RBC is important.
What Regression by Composition does
At a high level, RBC reframes regression models in terms of group actions and invariance, rather than link functions and linear predictors. This shift has several consequences:
Unification: RBC subsumes almost all standard regression models as special cases.
Expansion: It substantially enlarges the class of allowable models beyond what GLMs and other preexisting frameworks permit.
Modularity: Features that traditionally belong to different model classes can be combined within a single coherent model.
Conceptual clarity: Statistical properties of effect parameters—such as collapsibility—can be understood in terms of invariance under group actions, rather than as ad-hoc algebraic quirks.
Interpretability: In many cases, effect parameters correspond more directly to meaningful transformations of the outcome.
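To make the "group actions" framing concrete, here is a toy sketch in Python. It is a hypothetical illustration of the general idea, not the paper's actual formalism: each regression effect is modelled as an invertible map on the outcome scale (here, a shift on the log-odds scale of a probability), and composing two such maps gives another map of the same kind. That closure under composition is the group structure.

```python
# Toy sketch (hypothetical illustration, not the paper's formalism):
# regression effects as group actions on the outcome scale.
import math

def odds_action(beta):
    """Action on (0, 1): shift a probability by beta on the log-odds scale."""
    def act(p):
        logit = math.log(p / (1 - p))
        return 1 / (1 + math.exp(-(logit + beta)))
    return act

def compose(f, g):
    """Composing two actions yields another action (group closure)."""
    return lambda p: f(g(p))

baseline = 0.2
effect_a = odds_action(0.5)   # e.g. a treatment effect
effect_b = odds_action(-0.3)  # e.g. a covariate effect

# Applying the two actions in sequence agrees with a single action
# carrying the summed parameter -- the homomorphism property that makes
# "effect parameters" composable.
p_composed = compose(effect_a, effect_b)(baseline)
p_direct = odds_action(0.5 - 0.3)(baseline)
assert abs(p_composed - p_direct) < 1e-12
```

Replacing the log-odds shift with, say, a shift on the log scale (a relative-risk-type action) would change the group being acted with, but not the compositional structure; that separation is the kind of modularity the list above refers to.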
Why this matters now
Regression by Composition can be read as a defense—and a modernization—of traditional regression modelling in the age of machine learning.
Rather than treating regression as a narrow, legacy tool defined by a fixed menu of link functions, RBC treats it as a flexible, principled language for expressing assumptions about how variables transform under intervention and conditioning. In that sense, it aims to recover what made regression powerful in the first place, while making explicit structure that has long remained implicit.
If you care about causal interpretation, extrapolation across populations, or the foundations of statistical modelling, this framework is likely to be relevant well beyond the specific debates that motivated it.