The AI safety debate agenda, proposed by Irving et al. (2018), explores using debates between AI agents to ensure truthful answers from advanced systems. Recently, three key debate settings have been studied with LLMs:
Information asymmetric debates: Debaters have access to information unavailable to the judge. This hidden information is usually a passage of text from which the debaters quote.
Capability asymmetric debates: Debaters are more generally capable than the judge, or have stronger reasoning abilities.
Capability symmetric debates: Debaters are as capable as the judge.
Recent work from Khan et al. (2024) and Kenton et al. (2024) found positive outcomes for information asymmetric debates but negative results for capability asymmetric and symmetric debates. Crucially, both papers rely on inference-only techniques and don't attempt any kind of model training.
Our work revisits...
We want to advance process-based supervision for language models. To make it easier for others to contribute to that goal, we're sharing code for writing compositional language model programs, and a tutorial that explains how to get started:
We've been using ICE as part of our work on Elicit and have found it useful in practice.
ICE is an open-source Python library for writing, debugging, and visualizing compositional language model programs. ICE makes it easy to:
R0 tells you how many others each infected person infects on average. So in one sense R0 is the measure of contagiousness: it tells you how contagious people with the disease are on average.
Consider two different diseases with the same R0, let's say R0 = 2. So each person on average infects 2 others. For the first disease, almost all patients infect exactly two others, but for the second, plenty infect two, many infect one, and a much smaller number infect 10 or even more others. So the average is the same, but the distribution is very different.
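To make the comparison concrete, here is a small sketch with two made-up offspring distributions that both have R0 = 2. The specific probabilities for the second disease (8/16 infect one, 7/16 infect two, 1/16 infect ten) are assumptions picked only so the mean comes out to exactly 2; they are not taken from any real disease.

```python
from fractions import Fraction

# Hypothetical offspring distributions: secondary infections -> probability.
# Disease A: every patient infects exactly two others.
disease_a = {2: Fraction(1)}
# Disease B (assumed numbers): many infect one, plenty infect two, a few infect ten.
disease_b = {1: Fraction(8, 16), 2: Fraction(7, 16), 10: Fraction(1, 16)}


def mean(pmf):
    """Expected number of secondary infections, i.e. R0 for this distribution."""
    return sum(k * p for k, p in pmf.items())


def variance(pmf):
    """Spread of the number of secondary infections around the mean."""
    m = mean(pmf)
    return sum(p * (k - m) ** 2 for k, p in pmf.items())


for name, pmf in [("A", disease_a), ("B", disease_b)]:
    assert sum(pmf.values()) == 1  # probabilities must sum to one
    print(f"disease {name}: R0 = {float(mean(pmf))}, variance = {float(variance(pmf))}")
```

Both distributions report the same R0, but the variance is zero for the first and well above zero for the second, which is exactly the difference that R0 alone hides.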
Given some oth
Someone on Reddit linked to this preprint paper arguing that the other moments of the secondary infection curve (variance, skewness, kurtosis) can overwhelm the mean (i.e., the R0) in predicting the number of people ultimately infected. With a high variance, right-skewed, high kurtosis curve (loosely, with relatively few "super-infectors" bringing up the average), there are more chances for the outbreak to stochastically die out before those super-infectors get their chance to keep things going. The authors conclude that "higher moments of the distribution
Specifically, this is known as a hubness effect (when the distribution of the number of times an item is one of the k nearest neighbors of other items becomes increasingly right skewed as the number of dimensions increases) and (with certain assumptions) should be related to the phenomenon of these being closer to the centroid.
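As a rough illustration of the definition in parentheses, the sketch below counts how often each point shows up among the k nearest neighbors of the other points and reports the skewness of those counts for a few dimensionalities. The i.i.d. Gaussian data, the sample size, the value of k, and all function names are assumptions chosen for the example, so treat it as a toy setup rather than a reproduction of any particular result.

```python
import random


def k_occurrences(points, k):
    """Count how often each point is among the k nearest neighbors of the others."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Squared Euclidean distance from point i to every other point.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        ]
        for _, j in sorted(dists)[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Population skewness (third standardized moment); 0.0 if there is no spread."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - m) ** 3 for v in values) / n / var ** 1.5


def hubness_skew(n, dim, k, seed=0):
    """Skewness of the k-occurrence counts for n Gaussian points in dim dimensions."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    return skewness(k_occurrences(points, k))


if __name__ == "__main__":
    for dim in (3, 30, 100):
        print(dim, round(hubness_skew(n=200, dim=dim, k=5), 2))
```

If the hubness effect shows up in this toy setting, the printed skewness should tend to grow with the dimension, though with a sample this small the exact numbers will depend on the seed.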