This post is part of the work done at Conjecture.
Epistemic Status: Palimpsest
Better epistemology should make you stronger.
Which is why we at Conjecture's epistemology team are so adamant about improving our models of knowledge-production: this feels like the key to improving alignment research across the board, given the epistemological difficulties of the field.
Yet we have failed until now to make our theory of impact legible, both to ourselves and to well-meaning external reviewers. The most sorely missing piece is the link between better models of knowledge-production and quick improvements of alignment research, in the shorter timelines that we expect at Conjecture.
Following interviews that we conducted with a handful of alignment researchers (John Wentworth, Vanessa Kosoy, Evan Hubinger, Abram Demski, Steve Byrnes, Conjecture researchers, and Refine participants), we want to present our current best guess for framing how our epistemology research can make alignment research stronger: revealing, analyzing, and expanding or replacing what we call "Cached Methodologies" — patterns that encode how research is supposed to proceed in a given context, for example the idea that we need to prove a statement to learn if it's true or not. Given that this involves bringing to light and questioning cached thoughts about methodology, we dub this approach methodological therapy.
Note that we definitely also want to leverage better models of knowledge-production to help field-builders and newcomers; our current focus on explicit applications for established researchers comes from two key factors: we're currently only three people in the epistemology team, which means we have to prioritize; and we expect that models and tools useful to established researchers will prove instrumental in building ones for field-builders and newcomers.
On credit: this idea emerged from discussions within the epistemology team (composed of Adam Shimi, Lucas Teixeira, and Daniel Clothiaux); Adam contributed the overall perspective, the unification, and the therapy metaphor (from Bachelard); Lucas contributed the focus on actual cached patterns in researchers' minds and the interviews; Daniel contributed the expansion of the idea to encompass more alternatives (for example more ways of doing experiments) as well as many insights from the literature. This post is written by Adam; "we" means that the opinion is that of the whole team, and "I" means it's only Adam's opinion.
Thanks to Clem and Andrea Motti for feedback on drafts.
Our starting point was conducting interviews with researchers, in order to improve the product-market fit of our epistemological research by better understanding and targeting the needs of researchers (and, in the future, of field-builders and newcomers).
We ended up asking explicitly about the kinds of bottlenecks researchers dealt with, but also more generally probing how they were spending their time. Some interviewees articulated bottlenecks by themselves, while others didn't see explicit bottlenecks but identified particularly costly activities in their research.
In particular, four research activities were often highlighted as difficult and costly (here in order of decreasing frequency of mention):
- Running experiments
- Formalizing intuitions
- Unifying disparate insights into a coherent frame
- Proving theorems
I don't know what your first reaction to this list is, but for us, it was something like: "Oh, none of these activities seems strictly speaking necessary in knowledge-production." Indeed, a quick look at history presents us with cases where each of those activities was bypassed:
- Einstein figured out special and general relativity without new experiments by leveraging higher order encoding of previous data (known laws of physics).
- Faraday figured out the key principles of electromagnetism without formalization by careful experiments and geometric visualizations of lines of force (Post on Faraday's insights and Maxwell's take on them soon to come).
- The International Temperature Scale grounded thermometers without unification through careful interpolations between 14 different scales and ranges through multiple 15 degree polynomials (Post on the history of thermometry soon to come).
- Complexity theorists gathered evidence for P≠NP without a proof by connecting it to many different problems, notably the breakdown of approximation algorithms (Post on the ways complexity theorists generate evidence soon to come).
What these examples highlight is the classical failure when searching for the needs of customers: to anchor too much on what people ask for explicitly, instead of what they actually need. The same applies when getting feedback on writing, as Neil Gaiman pointed out so well:
(Writing Advice, Neil Gaiman, 2012)
Remember: when people tell you something’s wrong or doesn’t work for them, they are almost always right. When they tell you exactly what they think is wrong and how to fix it, they are almost always wrong.
So it's incredibly relevant that researchers highlight such activities as particularly costly: those are clearly the places where they spend much of their time or "hard thinking". Yet it might be a mistake to conclude that the best way of helping their research is to accelerate these particular activities. After all, what if there exist cheaper alternatives? Or what if the way the activity is conceptualized in the researcher's mind misses other ways of going about the same activity — like how believing that experiments are only for falsification might hide the value of exploratory experiments?
All this requires a deeper analysis of the methodologies (implicit or explicit) used by researchers to go about producing knowledge and solving problems.
Obviously, methodologies (patterns about what research should look like and when to do a certain activity like running an experiment) can prove incredibly productive: someone with a model of how proof or experiments work is able to do more than someone completely lacking this knowledge. These methodologies amplify our agency by giving us more powerful tools to optimize for knowledge-production and problem-solving.
And yet there is a clear failure mode, which almost always happens: the methodology becomes cached. Where before it was an explicit model, an acknowledged part of the map, it now feels integral to the territory. And thus, unless it fails in a spectacular fashion, it ossifies.
This story reappears again and again in the history of science and technology, and in the philosophy of science too. For example, almost every science that took form after the early successes of physics tried to model itself on physics; it inherited the cached methodologies of physicists in contexts where they didn't work anymore. Or look at the so-called scientific method, which actually hides away both the generation of good hypotheses and the full role experiments can play (including in hypothesis generation). Both of these are valuable maps (in some contexts) that became confused with the territory — cached methodologies.
Gaston Bachelard, the criminally underread (in the anglophone world) French philosopher of science from the early 20th century, believed that progress in science depended on the bringing to light and the reshaping of such cached assumptions and methodologies in order to make them fit for the current needs of scientific exploration. He called such episodes epistemological ruptures and he even coined the expression "Philosophy of Non" to describe this approach to science that focuses on the correction of cached patterns in general.
An example of such progress through epistemological rupture is Einstein's relativity, which required unearthing multiple cached patterns (notably about simultaneity) before becoming possible. Similarly, Faraday's and Maxwell's work on electromagnetism required them to reveal and correct the cached pattern of action-at-a-distance that had been integral to mathematical physics since Newton. Both of these also leveraged methodologies different from those other physicists were using at the time.
Bachelard's favorite metaphor for this process is psychoanalysis, where the cached pattern (Bachelard would say epistemological obstacle) is brought to the fore, and then overcome and reshaped to better fit the needs of the moment.
We follow his trail and frame our approach as methodological therapy: unearthing the cached methodologies that govern how actual researchers go about their work, and then working with researchers, using the larger perspective we get from the history and philosophy of science, to improve these methodologies and adapt them to the context at hand (here the difficult epistemic circumstances of alignment).
Still, before moving forward, let's remember Eliezer's warning against thinking that such improvements of cached methodologies are easy. (Here he's talking about Einstein figuring out General Relativity without much focus on raw experimental data.)
(Einstein's Speed, Eliezer Yudkowsky, 2008)
I will not say, "Don't try this at home." I will say, "Don't think this is easy." We are not discussing, here, the victory of casual opinions over professional scientists. We are discussing the sometime historical victories of one kind of professional effort over another. Never forget all the famous historical cases where attempted armchair reasoning lost.
The metaphor of therapy highlights a key insight: you need to stay in conversation with actual people. Without that, the therapy becomes an academic exercise of sharing models that might be vaguely related to the problem at hand. This direct interaction is guided by two framing questions:
On the other hand, a therapist who merely listens and understands, but cannot take a wider view, will only be minimally helpful. We thus still need to build better models of knowledge-production, but geared towards the actual needs of researchers in the field. This vindicates some of our intuitions about the irrelevance of some traditional questions in philosophy of science, like the realism/anti-realism debate. And it prompts the following framing questions:
Last but not least, therapy can easily fuck people up if it's done badly. We thus want to be careful about replacing a methodology with another without taking into account the value and usefulness of the initial one; otherwise we might slow down research rather than accelerate it.
We're excited about this framing because it structures and grounds the interaction between our abstract research into models of knowledge-production and the concrete needs of researchers. These concrete epistemic aims and cached methodologies turn our vague wonderings about the different ways of revealing and producing evidence, into applications at the heart of alignment research.
Even if you buy into this line of research, we are still missing one key ingredient for our short-timelines theory of change: how much therapy is needed before being concretely useful to alignment researchers? Or phrased differently, is the value of this research robust to scaling down? After all, if, to accelerate alignment research, we need to wait until we fully solve all the questions above, we're probably not going to help a lot in a 5-to-15-year time frame.
We are not too worried about this, because answering even one or two of our framing questions will probably prove useful to researchers. If you're agentic enough, better understanding your aims and the cached methodologies you use already expands your range and your ability to adapt to different situations. It would obviously be more useful to have better methodologies to use, detailed models of when to use each alternative, or even a full algorithm of how to extract evidence in the relevant ways — but these massive accomplishments are not necessary for the epistemology to make you stronger.
We also expect partial progress on better and better methodologies to give positive returns quickly, and to keep doing so until we reach the eventual optimal ones; this is based on observing that big progress in science often emerges from relatively small changes in methodologies.
This post mainly takes the angle of "How epistemology can make alignment research stronger", but the opposite is also definitely part of the picture here: methodological therapy with actual researchers is bound to improve our epistemology. Already the interviews we conducted and the reflections that went into this post gave much needed structure to our many threads of exploration.
Lastly, you might wonder why alignment was not mentioned much in the theory of change above. Sure, we anchored our framing in the needs of alignment researchers, but apart from that, all of our arguments seem to transfer to the rest of science (and Bachelard's model was about science in general).
We certainly believe that this work should also make us stronger at science in general. Yet there are two non-trivial ways in which it is heavily biased towards alignment:
We believe that progress in epistemology should make you stronger — better models of knowledge-production should make you a better and faster researcher. Following interviews with researchers and Gaston Bachelard's work, we frame this improvement through methodological therapy: the revealing and analysis of epistemic aims and cached methodologies that can then be confronted with historical examples, expanded, and reshaped to better serve the needs of the researchers.
We then articulated framing questions guiding the process of epistemological therapy.
In light of this perspective, we are currently focusing on working with the interpretability team at Conjecture and the Refine participants to iterate on this process of methodological therapy, while keeping in mind the framing questions related to risks and downsides. After this trial run, we expect to expand our efforts towards more researchers and also field-builders and newcomers.
If that sounds similar to Kuhn's paradigm shifts and scientific revolutions, it definitely captures some of the same points. Now you might see why French philosophers of science were not as excited or interested by Kuhn's book when it came out: the notion had been in their university courses for over 50 years at that point.
It has been translated as "Philosophy of No" in English, but in my opinion that misses the double meaning Bachelard aimed for: in French, the word for "no" and the word used to indicate negation in "non-Euclidean" are the same one, "non".
Glad to see you're working on this. It seems even more clearly correct (the goal, at least :)) for not-so-short timelines. Less clear how best to go about it, but I suppose that's rather the point!
A few thoughts:
Thanks for the kind words and useful devil's advocate! (I'm expecting nothing less from you ;p)
I expect it's unusual that [replace methodology-1 with methodology-2] will be a pareto improvement: other aspects of a researcher's work will tend to have adapted to fit methodology-1. So I don't think the creation of some initial friction is a bad sign. (This also mirrors therapy - there's usually a [take things apart and better understand them] phase before any [put things back together in a more adaptive pattern] phase.)

It might be useful to predict this kind of thing ahead of time, to develop a sense of when to expect specific side-effects (and/or predictably unpredictable side-effects).
I agree that pure replacement of a methodology is a massive step that is probably premature before we have a really deep understanding both of the researcher's approach and of the underlying algorithm for knowledge-production. Which is why in my model this comes quite late; instead, the first steps are more about revealing the cached methodology to the researcher, and showing alternatives from the History of Science (and Technology) to make more options and approaches credible for them.

Also, looking at the "sins of the fathers" for philosophy of science (how methodologies have fucked up people across history) is part of our last set of framing questions. ;)
I do think it's worth interviewing at least a few carefully selected non-alignment researchers. I basically agree with your alignment-is-harder case. However, it also seems most important to be aware of things the field is just completely missing.

In particular, this may be useful where some combination of cached methodologies is a local maximum for some context. Knowing something about other hills seems useful here. I don't expect it'd work to import full sets of methodologies from other fields, but I do expect there are useful bits-of-information to be had.

Similarly, if thinking about some methodology x that most alignment researchers currently use, it might be useful to find and interview other researchers that don't use x. Are they achieving [things-x-produces] in other ways? What other aspects of their methodology are missing/different? This might hint both at how a methodology change may impact alignment researchers, and how any negative impact might be mitigated.
Two reactions here:
Worth considering that there's less of a risk in experimenting (kindly, that is) on relative newcomers than on experienced researchers. It's a good idea to get a clear understanding of the existing process of experienced researchers. However, once we're in [try this and see what happens] mode there's much less downside with new people - even abject failure is likely to be informative, and the downside in counterfactual object-level research lost is much smaller in expectation.
I see what you're pointing out. A couple related thoughts:
In particular, four research activities were often highlighted as difficult and costly (here in order of decreasing frequency of mention):
- Running experiments
- Formalizing intuitions
- Unifying disparate insights into a coherent frame
- Proving theorems

I don't know what your first reaction to this list is, but for us, it was something like: "Oh, none of these activities seems strictly speaking necessary in knowledge-production." Indeed, a quick look at history presents us with cases where each of those activities was bypassed:
- Einstein figured out special and general relativity without new experiments by leveraging higher order encoding of previous data (known laws of physics).
- Faraday figured out the key principles of electromagnetism without formalization by careful experiments and geometric visualizations of lines of force (Post on Faraday's insights and Maxwell's take on them soon to come).
- The International Temperature Scale grounded thermometers without unification through careful interpolations between 14 different scales and ranges through multiple 15 degree polynomials (Post on the history of thermometry soon to come).
- Complexity theorists gathered evidence for P≠NP without a proof by connecting it to many different problems, notably the breakdown of approximation algorithms (Post on the ways complexity theorists generate evidence soon to come).

What these examples highlight is the classical failure when searching for the needs of customers: to anchor too much on what people ask for explicitly, instead of what they actually need.
What these examples highlight is the classical failure when searching for the needs of customers: to anchor too much on what people ask for explicitly, instead of what they actually need.
I disagree that this conclusion follows from the examples. Every example you list uses at least one of the methods in your list. So, this might as well be used as evidence for why this list of methods are important.
In addition, several of the listed examples benefited from division of labour. This is a common practice in Physics. Not everyone does experiments. Some people instead specialise in the other steps of science, such as
- Formalizing intuitions
- Unifying disparate insights into a coherent frame
- Proving theorems
This is very different from concluding that experiments are not necessary.
Thanks for your comment!
Actually, I don't think we really disagree. I might have just not made my position very clear in the original post.
The point of the post is not to say that these activities are not often valuable, but instead to point out that they can easily turn into "To do science, I need to always do [activity]". And what I'm getting from the examples is that in some cases, you actually don't need to do [activity]. There's a shortcut, or maybe just you're in a different phase of the problem.
Do you think there is still a disagreement after this clarification?
I think we are in agreement. I think the confusion is because it is not clear from that section of the post if you are saying 1) "you don't need to do all of these things" or 2) "you don't need to do any of these things".
Because I think 1 goes without saying, I assumed you were saying 2. Also 2 probably is true in rare cases, but this is not backed up by your examples.
But if 1 doesn't go without saying, then this means that a lot of "doing science" is cargo-culting? Which is sort of what you are saying when you talk about cached methodologies.
So why would smart, curious, truth-seeking individuals use cached methodologies? Do I do this?
Some self-reflection: I did some of this as a PhD student, because I was new, and it was a way to hit the ground running. So, I did some science using the method my supervisor told me to use, while simultaneously working to understand the reason behind this method. I did spend less time than I would have wanted on understanding all the assumptions of the sub-sub-field of physics I was working in, because of the pressure to keep publishing and because I got carried away by various fun math I could do if I just accepted these assumptions. After my PhD I felt that if I was going to stay in physics, I wanted to take a year or two for just learning, to actually understand Loop Quantum Gravity and all the other competing theories, but that's not how academia works unfortunately, which is one of the reasons I left.

I think that the foundation of good epistemics is to not have competing incentives.
This reminded me of Brian Cantwell Smith's work: