Argument, intuition, and recursion

paulfchristiano

Mathematicians answer clean questions that can be settled with formal argument. Scientists answer empirical questions that can be settled with experimentation.

Collective epistemology is hard in domains where it's hard to settle disputes with either formal argument or experimentation (or a combination), like policy or futurism.

I think that's where rationalists could add value, but first we have to grapple with a basic question: if you can't settle the question with logic, and you can't check your intuitions against reality to see how accurate they are, then what are you even doing?

In this post I'll explain how I think about that question. For those who are paying close attention, it's similar to one or two of my previous posts (e.g. 1 2 3 4 5...).

I. An example

An economist might answer a simple question ("what is the expected employment effect of a steel tariff?") by setting up an econ 101 model and calculating equilibria.

After setting up enough simple models, they can develop intuitions and heuristics that roughly predict the outcome without actually doing the calculation.

These intuitions won't be as accurate as intuitions trained against the real world---if our economist could observe the impact of thousands of real economic interventions, they should do that instead (and in the case of economics, you often can). But the intuition isn't vacuous either: it's a fast approximation of econ 101 models.

Once our economist has built up econ 101 intuitions, they can consider more nuanced arguments that leverage those fast intuitive judgments. For example, they could consider possible modifications to their simple model of steel tariffs (like labor market frictions), use their intuition to quickly evaluate each modification, and see which modifications actually affect the simple model's conclusion.

After going through enough nuanced arguments, they can develop intuitions and heuristics that predict these outcomes. For example, they can learn to predict which assumptions are most important to a simple model's conclusions.

Equipped with these stronger intuitions, our economist can use them to get better answers: they can construct more robust models, explore the most important assumptions, design more effective experiments, and so on.

(Eventually our economist will improve their intuitions further by predicting these better answers; they can use the new intuitions to answer more complex questions....)

Any question that can be answered by this procedure could eventually be answered using econ 101 directly. But with every iteration of intuition-building, the complexity of the underlying econ 101 explanation increases geometrically. This process won't reveal any truths beyond those implicit in the econ 101 assumptions, but it can do a good job of efficiently exploring the logical consequences of those assumptions.
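To make the "increases geometrically" claim concrete, here is a minimal sketch. It assumes, purely for illustration, that each intuitive judgment at one level stands in for a fixed number of judgments at the level below. The function name and the branching factor of 10 are hypothetical and are not part of the economist example.

```python
def unrolled_cost(levels: int, branching: int) -> int:
    """Count the direct econ 101 calculations needed to reproduce one
    judgment made with intuition built up over `levels` rounds.

    Assumption (illustrative only): every judgment at one level
    summarizes `branching` judgments at the level below.
    """
    if levels == 0:
        return 1  # a single direct econ 101 calculation
    return branching * unrolled_cost(levels - 1, branching)


print([unrolled_cost(k, 10) for k in range(4)])  # [1, 10, 100, 1000]
```

Under that assumption the fully unrolled explanation grows as branching ** levels, while the intuitive judgment itself stays roughly constant in cost.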

(In practice, an economist's intuitions should incorporate both theoretical argument and relevant data, but that doesn't change the basic picture.)

II. The process

The same recursive process is responsible for most of my intuitions about futurism. I don't get to test my intuition by actually peeking at the world in 20 years. But I can consider explicit arguments and use them to refine my intuitions---even if evaluating arguments requires using my current intuitions.

For example, when I think about takeoff speeds I'm faced with questions like "how much should we infer from the difference between chimps and humans?" It's not tractable to answer all of these subquestions in detail, so for a first pass I use my intuition to answer each subquestion.

Eventually it's worthwhile to explore some of those subquestions in more depth, e.g. I might choose to explore the analogy between chimps and humans in more depth. In the process I run into sub-sub-questions, like "to what extent is evolution optimizing for the characteristics that changed discontinuously between chimps and humans?" I initially answer those subquestions with intuition but might sometimes expand them in the same way, turning up sub-sub-sub-questions...

When I examine the arguments for a question Q, I use my current intuition to answer the subquestions that I encounter. Once I get an answer for Q, I do two things:

I update my cached belief about Q, to reflect the new things I've learned.
If my new belief differs from my original intuition, I update my intuition. My intuitions generalize across cases, so this will affect my view on lots of other questions.

A naive description of reasoning only talks about the first kind of update. But I think that the second kind is where 99% of the important stuff happens.

(There isn't any bright line between these two cases. A "cached answer" is just a very specific kind of intuition, and in practice the extreme case of seeing the exact question multiple times is mostly irrelevant. For example, it's not helpful to have a cached answer to "how fast will AI takeoff be?"; instead I have a cluster of intuitions that generate answers to a hundred different variants of that question.)
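The two kinds of update can be sketched in code. This is a toy model under loud assumptions: intuitions are stored as one number per hypothetical "kind" of question, the considered answer is just the average of the intuitive answers to the subquestions, and the learning rate is arbitrary. None of these names or choices come from the post; they only show where each update happens.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    name: str
    kind: str  # hypothetical label: intuitions generalize within a kind
    subquestions: list["Question"] = field(default_factory=list)


class Reasoner:
    def __init__(self, learning_rate: float = 0.5):
        self.intuition: dict[str, float] = {}  # kind -> snap estimate
        self.cache: dict[str, float] = {}      # question name -> considered answer
        self.learning_rate = learning_rate

    def intuit(self, q: Question) -> float:
        # Default to 0.5 when there is no intuition for this kind yet.
        return self.intuition.get(q.kind, 0.5)

    def examine(self, q: Question) -> float:
        if not q.subquestions:
            return self.intuit(q)
        # Answer each subquestion with current intuition (a one-step backup).
        # Averaging is a placeholder for whatever the argument actually does.
        answers = [self.intuit(s) for s in q.subquestions]
        considered = sum(answers) / len(answers)
        # First kind of update: the cached belief about this exact question.
        self.cache[q.name] = considered
        # Second kind of update: move the intuition toward the considered
        # answer, which changes future snap answers to similar questions.
        old = self.intuit(q)
        self.intuition[q.kind] = old + self.learning_rate * (considered - old)
        return considered
```

After examining one question of a given kind, `intuit` returns a different answer for every other question of that kind, which is the sense in which the second update affects lots of other questions.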

The second kind of update can come in lots of flavors. Some examples:

When I make an intuitive judgment I have to weigh lots of different factors: my own snap judgment, others' views, various heuristic arguments, various analogies, etc. I set these weights partly based on empirical predictions but largely based on predicting the result of arguments. For example, in many contexts I'd lean heavily on Carl or Holden's views, based on them systematically predicting the views that I'd hold after exploring arguments in more detail.
I have many explicit heuristics or high-level principles of reasoning that have been refined to predict the results of more detailed arguments. For example, I often use a cluster of "anti-fanaticism" heuristics, against assigning unbounded ratios between the importance of different considerations. This is not actually a simple general principle to state, and it's not supported by a general argument; instead I have an intuitive sense of when the heuristic applies.
My unconscious judgments are significantly optimized to predict the result of longer arguments. This is most obvious in cases like mathematics---for example, I have well-developed intuitions about duality and the Fourier transform that let me answer hard questions, and that were refined almost entirely by practice. Intuitions are harder to see (and less reliable) in cases like economics of foom or robustness of RL to function approximators, but something basically similar is going on.

Note that none of these have independent evidential value: they would be screened off by exploring the arguments in enough detail. But in practice it's pretty hard to do that, and in many cases it might be computationally infeasible.

Like the economist in the example, I would do better by updating my intuitions against the real world. But in many domains there just isn't that much data---we only get to see one year of the future per year, and policy experiments can be very expensive---and this approach allows us to stretch the data we have by incorporating an increasing range of logical consequences.

III. Disclaimer

The last section is partly a positive description of how I actually reason and partly a normative description of how I believe people should reason. In the next section I'll try to turn it into a collective epistemology.

I've found this framework useful for clarifying my own thinking about thinking. Unfortunately, I can't give you much empirical evidence that it works well.

Even if this approach were the best thing since sliced bread, I think that empirically demonstrating that it helps would still be a massive scientific project. So I hope I can be forgiven for a lack of empirical rigor. But you should still take everything with a grain of salt.

And I want to stress: I don't mean to devalue diving deeply into arguments and fleshing them out as much as possible. I think it's usually impossible to get all the way to a mathematical argument, but you can take a pretty giant step from your initial intuitions. Though I talk about "one step backups" in the above examples for simplicity, I think that updating on really big steps is often a better idea. Moreover, if we want to have the best view we can on a particular question, it's clearly worth unpacking the arguments as much as we can. (In fact the argument in this post should make you unpack arguments more, since in addition to the object-level benefit you also benefit from building stronger transferrable intuitions.)

IV. Disagreement

Suppose Alice and Bob disagree about a complicated question---say AI timelines---and they'd like to learn from each other.

A common (implicit) hope is to exhaustively explore the tree of arguments and counterarguments, following a trail of higher-level disagreements to each low-level disagreement. If Alice and Bob mostly have similar intuitions, but they've considered different arguments or have different empirical evidence, then this process can highlight the difference and they can sometimes reach agreement.

Often this doesn't work because Alice and Bob have wildly different intuitions about a whole bunch of different questions. I think that in a complicated argument, the number of subquestions about which Alice and Bob disagree can be astronomically large, and there is zero hope for resolving any significant fraction of them. What to do then?

Here's one possible strategy. Let's suppose for simplicity that Alice and Bob disagree, and that an outside observer Judy is interested in learning about the truth of the matter (the identical procedure works if Judy is actually one of Alice and Bob). Then:

Alice explains her view on the top level question, in terms of her answers to simpler subquestions. Bob likely disagrees with some of these steps. If there is disagreement, Alice and Bob talk until they "agree to disagree": they make sure that they are using the subquestion to mean the same thing, and that they've updated on each other's beliefs (and whatever cursory arguments each of them is willing to make about the claim). Then Alice and Bob find their most significant disagreement and recursively apply the same process to that disagreement.

They repeat this process until they reach a state where they don't have any significant disagreements about subclaims (potentially because there are none, and the claim is so simple that Judy feels confident she can assess its truth directly).

Hopefully at this point Alice and Bob can reach agreement, or else identify some implicit subquestion about which they disagree. But if not, that's OK too. Ultimately Judy is the arbiter of truth. Every time Alice and Bob have been disagreeing, they have been making a claim about what Judy will ultimately believe.

The reason we were exploring this claim was that Alice and Bob disagreed significantly before we unpacked the details. Now at least one of Alice and Bob learns that they were wrong, and both of them can update their intuitions (including their intuitions for how much to respect each other's opinions in different kinds of cases).

Alice and Bob then start the process over with their new intuitions. The new process might involve pursuing a nearly-identical set of disagreements (which they can do extremely quickly), but at some point it will take a different turn.

If you run this process enough times, eventually (at least one of) Alice or Bob will change their opinion about the root question---or more precisely, about what Judy will eventually come to believe about the root question---because they've absorbed something about the other's intuition.
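One round of the procedure above can be sketched as follows. This is a simplified illustration, not a faithful implementation: beliefs are probabilities keyed by claim name, "significant" means a gap of at least a hypothetical threshold, Judy is any function from a claim to a verdict, and the update touches only the disputed claim rather than spreading to related intuitions as the post describes. All names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    name: str
    subclaims: list["Claim"] = field(default_factory=list)


def find_crux(claim, alice, bob, threshold=0.2):
    """Follow the largest disagreement downward until no subclaim of the
    current claim is significantly disputed."""
    while True:
        disputed = [s for s in claim.subclaims
                    if abs(alice[s.name] - bob[s.name]) >= threshold]
        if not disputed:
            return claim
        claim = max(disputed, key=lambda s: abs(alice[s.name] - bob[s.name]))


def one_round(root, alice, bob, judy, learning_rate=0.5):
    """Find the crux, let Judy assess it, and move both predictions of
    Judy's belief toward her verdict. Mutates `alice` and `bob`."""
    crux = find_crux(root, alice, bob)
    verdict = judy(crux)
    for beliefs in (alice, bob):
        beliefs[crux.name] += learning_rate * (verdict - beliefs[crux.name])
    return crux.name, verdict
```

Calling `one_round` repeatedly corresponds to starting the process over with new intuitions: early rounds revisit the same crux, and once the gap there falls below the threshold the descent takes a different turn.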

There are two qualitatively different ways that agreement can occur:

Convergence. Eventually, Alice will have absorbed Bob's intuitions and vice versa. This might take a while—potentially, as long as it took Alice or Bob to originally develop their intuitions. (But it can still be exponentially smaller than the size of the tree.)
Mutual respect. If Alice and Bob keep disagreeing significantly, then the simple algorithm "take the average of Alice and Bob's view" will outperform at least one of them (and often both of them). So two Bayesians can't disagree significantly too many times, even if they totally distrust one another.
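The "mutual respect" claim can be checked with a short derivation. It assumes squared error as the scoring rule, which the post does not specify; the symbols a and b (Alice's and Bob's predictions), y (what Judy ends up believing), n, and δ are introduced here for illustration.

```latex
\[
\left(\frac{a+b}{2} - y\right)^2
  = \frac{(a-y)^2 + (b-y)^2}{2} - \frac{(a-b)^2}{4}
\]
% The average's loss is below the mean of the two losses by (a-b)^2/4,
% so it is strictly below the larger of the two whenever a \neq b.
% Summing over n questions on which |a_i - b_i| \ge \delta:
\[
\sum_{i=1}^{n} \left[ \frac{(a_i-y_i)^2 + (b_i-y_i)^2}{2}
  - \left(\frac{a_i+b_i}{2} - y_i\right)^2 \right]
  \ge \frac{n\,\delta^2}{4}
\]
```

So after n significant disagreements, at least one of Alice and Bob has accumulated at least nδ²/4 more squared error than the averaging rule. Under this assumption, that is one way to see why two reasoners who track their own scores cannot keep disagreeing significantly forever.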

If Alice and Bob are poor Bayesians (or motivated reasoners) and continue to disagree, then Judy can easily take the matter into her own hands by deciding how to weigh Alice and Bob's opinions. For example, Judy might decide that Alice is right most of the time and Bob is being silly by not deferring more---or Judy might decide that both of them are silly and that the midpoint between their views is even better.

The key thing that makes this work---and the reason it requires no common knowledge of rationality or other strong assumptions---is that Alice and Bob can cash out their disagreements as a prediction about what Judy will ultimately believe.

Although it introduces significant additional complications, I think this entire scheme would sometimes work better with betting, as in this proposal. Rather than trusting Alice and Bob to be reasonable Bayesians and eventually stop disagreeing significantly, Judy can instead perform an explicit arbitrage between their views. This only works if Alice and Bob both care about Judy's view and are willing to pay to influence it.

V. Assorted details

After convergence Alice and Bob agree only approximately about each claim (such that they won't update much from resolving the disagreement). Hopefully that lets them agree approximately about the top-level claim. If subtle disagreements about lemmas can blow up to giant disagreements about downstream claims, then this process won't generally converge. If Alice and Bob are careful probabilistic reasoners, then a "slight" disagreement involves each of them acknowledging the plausibility of the other's view, which seems to rule out most kinds of cascading disagreement.

This is not necessarily an effective tool for Alice to bludgeon Judy into adopting her view; it's only helpful if Judy is actually trying to learn something. If you are trying to bludgeon people with arguments, you are probably doing it wrong. (Though gosh there are a lot of examples of this amongst the rationalists.)

By the construction of the procedure, Alice and Bob are having disagreements about what Judy will believe after examining arguments. This procedure is (at best) going to extract the logical consequences of Judy's beliefs and standards of evidence.

Alice and Bob don't have to operationalize claims enough that they can bet on them. But they do want to reach agreement about the meaning of each subquestion, and in particular understand what meaning Judy assigns to each subquestion. ("Meaning" captures both what you infer from an answer to that subquestion, and how you answer it.) If Alice and Bob don't know how Judy uses language, then they can learn that over the course of this process, but hopefully we have more cost-effective ways to agree on the use of language (or communicate ontologies) than going through an elaborate argument procedure.

One way that Alice and Bob can get stuck is by not trusting each others' empirical evidence. For example, Bob might explain his beliefs by saying that he's seen evidence X, and Alice might not trust him or might believe that he is reporting evidence selectively. This procedure isn't going to resolve that kind of disagreement. Ultimately it just punts the question to what Judy is willing to believe based on all of the available arguments.

Alice and Bob's argument can have loops, if e.g. Alice believes X because of Y, which she believes because of X. We can unwind these loops by tagging answers explicitly with the "depth" of reasoning supporting that answer, decrementing the depth at each step, and defaulting to Judy's intuition when the depth reaches 0. This mirrors the iterative process of intuition-formation which evolves over time, starting from t=0 when we use our initial intuitions. I think that in practice this is usually not needed in arguments, because everyone knows why Alice is trying to argue for X---if Alice is trying to prove X as a step towards proving Y, then invoking Y as a lemma for proving X looks weak.
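The depth-tagging idea can be sketched as a recursion that is guaranteed to terminate even when the argument is circular. The data layout here is an assumption made for illustration: `supports` maps each claim to the claims it is argued from, `judy_intuition` holds Judy's unexamined answers, and averaging stands in for however the supporting answers are actually combined.

```python
def answer(claim, depth, supports, judy_intuition):
    """Return the depth-`depth` answer to `claim`.

    At depth 0, or for claims with no supporting argument, fall back to
    Judy's intuition. Otherwise combine the supporting claims' answers at
    depth - 1. Averaging is a placeholder combination rule.
    """
    if depth == 0 or claim not in supports:
        return judy_intuition[claim]
    parts = [answer(s, depth - 1, supports, judy_intuition)
             for s in supports[claim]]
    return sum(parts) / len(parts)


# X is argued from Y and Y is argued from X: a loop.
supports = {"X": ["Y"], "Y": ["X"]}
judy_intuition = {"X": 0.2, "Y": 0.9}
print(answer("X", 0, supports, judy_intuition))  # 0.2
print(answer("X", 1, supports, judy_intuition))  # 0.9
print(answer("X", 2, supports, judy_intuition))  # 0.2
```

Because every recursive call lowers the depth by one, the loop between X and Y is unrolled a fixed number of times instead of running forever, and the answer at each depth is always grounded in Judy's depth-0 intuition.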

My futurism examples differ from my economist example in that I'm starting from big questions, and breaking them down to figure out what low-level questions are important, rather than starting from a set of techniques and composing them to see what bigger-picture questions I can answer. In practice I think that both techniques are appropriate and a combination usually makes the most sense. In the context of argument in particular, I think that breaking down is a particularly valuable strategy. But even in arguments it's still often faster to go on an intuition-building digression where we consider subquestions that haven't appeared explicitly in the argument.

I like the economist example as a good illustration of the process, but it also makes me slightly reduce my (already-low) confidence in our ability to make sensible predictions about things like take-off speeds, given that economists say things like this:

As my colleague Jeffrey Friedman argues, expert predictions about the likely effects of changing a single policy tend to be pretty bad. I’ll use myself as an example. I’ve followed the academic literature about the minimum wage for almost twenty years, and I’m an experienced, professional policy analyst, so I’ve got a weak claim to expertise in the subject. What do I have to show for that? Not much, really. I’ve got strong intuitions about the likely effects of raising minimum wages in various contexts. But all I really know is that the context matters a great deal, that a lot of interrelated factors affect the dynamics of low-wage labor markets, and that I can’t say in advance which margin will adjust when the wage floor is raised. Indeed, whether we should expect increases in the minimum wage to hurt or help low-wage workers is a question Nobel Prize-winning economists disagree about. Labor markets are complicated!

Which in my mind could be summarized as "after 20 years of studying the theory and practical studies of this topic, I've got strong intuitions, but in practice they aren't enough for me to make strong predictions". And it seems that the question of minimum wage is one for which there is much more direct evidence than there is for something like take-off speeds, suggesting that we should be even less able to make good predictions about that.

The usual implicit approach is to explore the tree of arguments and counterarguments, using disagreement as a heuristic to prioritize which points to explore in more detail... Often this doesn't work because Alice and Bob have wildly different intuitions about a whole bunch of different questions... What to do then?

Apologies for being dense, but I would also describe the procedure that you lay out following this bit as a "tree of arguments and counterarguments, using disagreement as a heuristic to prioritize which points to explore".

What's the crucial difference between the procedure you describe and the usual approach? Is it that whenever you hit on any subquestion simple enough to resolve, everyone updates their intuitions and you start again from the top?

("They repeat this process until they reach a state where they don't have any significant disagreements about subclaims... Alice and Bob then start the process over with their new intuitions.")

Changed to: "A common (implicit) hope is to exhaustively explore the tree of arguments and counterarguments, following a trail of higher-level disagreements to each low-level disagreement."

I'm distinguishing between hoping to change your view about the root question by getting to each disagreement in turn and propagating the logical consequences of resolving it, or viewing each disagreement as an observation that can help refine intuition / arbitrate between conflicting intuitions.

I curated this post for the following reasons:

The question of how to improve our intuitions around topics with few and slow feedback loops is a central question of rationality, and this post added a lot of helpful explicit models to this problem that I've not seen put quite this way anywhere else.
The core ideas seem not only valuable epistemically, but also to underlie some promising alignment strategies I've seen (that I believe inherit from you, though I'm uncertain about this point).

The biggest hesitation I had with curating this post:

Each section is very detailed, and it took me a surprising amount of work to understand the structure of and successfully chunk both the post overall and its subsections. Especially so given its length.

Overall I am excited to sit with these models for a while and integrate them with my current epistemic practices; thank you for writing this post, I hope you write more like it.

philosophers answer clean questions that can be settled with formal argument

ಠ_ಠ

I think the example of philosophy might actually be a good case for exploring problems with this model of disagreement. Particularly when people start arguing from high-level intuitions that they nonetheless have difficulty breaking down, or when at least one of the people arguing is in fact deeply confused and is going to say some things that turn out to be nonsense, or when breaking down the problem in a particular way implicitly endorses certain background assumptions. Productively making use of disagreements under these conditions seems, to me, to require being willing to try out lots of toy models, attempting to "quarantine" the intuitions under dispute and temporarily ignore their fruits, and trying to find ways to dissolve the disagreement or render it moot.

I switched back to "mathematicians." I think philosophers do this sometimes, but they also do other stuff, so it's not a good example.

I think philosophy is a fine use case for this approach; it's messy, but everything ends up being messy.

This is a nice, simple model for thinking. But I notice that both logic and empiricism sometimes have "shortcuts" — non-obvious ways to shorten, or otherwise substantially robustify, the chain of (logic/evidence). It's reasonable to imagine that intuition/rationality would also have various shortcuts; some that would correspond to logical/empirical shortcuts, and some that would be different. Communication is more difficult when two people are using chains of reasoning that differ substantially in what shortcuts they use. You could get two valid arguments on a question, and be able to recognize the validity of each, but be almost completely at a loss when trying to combine those two into an overall judgement.

Oops, I guess that was more of a comment than a review. At review-level, what I meant to say was: nice foundation, but it's clear this doesn't exhaust the question. Which is good.

As I said at the time:

The question of how to improve our intuitions around topics with few and slow feedback loops is a central question of rationality, and this post added a lot of helpful explicit models to this problem that I've not seen put quite this way anywhere else.

I continue to think about some of the ideas in this post regularly.

Any question that can be answered by this procedure could eventually be answered using econ 101 directly.

What do you mean by this? Is the idea that you could create a super complex (and accurate) model, composed of only econ 101 parts?

I guess this is true if econ 101 is universal in some sense, so that all 201, etc. intuitions are implicit in econ 101 intuitions. Is that what you have in mind?

EDIT: I should have just kept reading:

This process won't reveal any truths beyond those implicit in the econ 101 assumptions, but it can do a good job of efficiently exploring the logical consequences of those assumptions.

I've gotten a lot of value out of posts in the reference class of "attempts at somewhat complete models of what good reasoning looks like", and this one has been one of them.

I don't think I fully agree with the model outlined here, but I think the post did succeed at adding it to my toolbox.

Can you recommend some other posts in that reference class?

Alice and Bob's argument can have loops, if e.g. Alice believes X because of Y, which she believes because of X. We can unwind these loops by tagging answers explicitly with the "depth" of reasoning supporting that answer

A situation I've come across is that people often can't remember all the evidence they used to arrive at conclusion X. They remember that they spent hours researching the question, that they did their best to get balanced evidence, and that they are happy the conclusion they drew at the time was a fair reflection of the evidence they found. But they can't remember the details of the actual research, nor are they confident that they could re-create the process in such a way as to rediscover the exact same subset of evidence their search found at that time.

This makes asking them to provide a complete list of Ys upon which their X depends problematic, and understandably they feel it is unfair to ask them to abandon X, without compensating them for the time to recreate an evidential basis equal in size to their initial research, or demand an equivalent effort from those opposing them.

(Note: I'm talking here about what they feel in that situation, not what is necessarily rational or fair for them to demand.)

If, instead of asking the question "How do we know what we know?", we ask instead "How reliable is knowledge that's derived according to a particular process?" then it might be something that could be objectively tested, despite there being an element of self-referentiality (or bootstrapping) in the assumption that this sort of testing process is something that can lead to a net increase of what we reliably know.

However, doing so depends upon us being able to define the knowledge derivation processes being examined precisely enough that evidence of how they fare in one situation is applicable to their use in other situations, and upon the concept of there being a fair way to obtain a random sample of all possible situations to which they might be applied, despite other constraints upon the example selection (such as having a body of prior knowledge against which the test result can be compared in order to rate the reliability of the particular knowledge derivation process being tested).

Despite that, if we are looking at two approaches to the question "how much should we infer from the difference between chimps and humans?", we could do worse than specify each approach in a well defined way that is also general enough to apply to some other situations, and then have a third party (that's ignorant of the specific approaches to be tested) come up with several test cases with known outcomes, that the two approaches could both be applied to, to see which of them comes up with the more accurate predictions for a majority of the test cases.
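The comparison described above could look something like this in code. It is only a sketch: the two approaches are assumed to be functions from a test case to a numeric prediction, the third party's test cases are (input, known outcome) pairs, and "more accurate" is taken to mean smaller absolute error. All of these are illustrative choices rather than anything specified in the comment.

```python
def compare_approaches(approach_a, approach_b, cases):
    """Return "A", "B", or "tie" according to which approach is closer to
    the known outcome on more of the third party's test cases."""
    wins_a = wins_b = 0
    for case_input, known_outcome in cases:
        error_a = abs(approach_a(case_input) - known_outcome)
        error_b = abs(approach_b(case_input) - known_outcome)
        if error_a < error_b:
            wins_a += 1
        elif error_b < error_a:
            wins_b += 1
    if wins_a == wins_b:
        return "tie"
    return "A" if wins_a > wins_b else "B"
```

Counting wins per case follows the "majority of the test cases" wording. Summing the errors instead would reward an approach that is rarely but badly wrong differently, so the choice of scoring rule is itself something the two sides would need to agree on in advance.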
