I. Levels of intervention

Imagine that you have a classroom of children that you want to behave well. Maybe there are 30 children in the classroom. You could think of the children as individual nodes, each of which operates in some way internally and then interacts with the others.

Different levels of intervention are possible.

Level 1 (thought regulation) — You could intervene at the level of the child’s individual thoughts. Police each thought, make sure it is a well-behaved-child thought. “I will sit here and listen and learn” is acceptable, “I’m going to punch Judy” is not. The idea is that if each child has only well-behaved-child thoughts, then the child will be well-behaved, and then the child will interact with the other children in a well-behaved way, and the children in total will behave well.

Level 2 (train of thought regulation) — You could allow some bad behavior at the level of individual thoughts, but instead regulate things at the level of trains of thought. Don’t police each thought, allow some unruly-child thoughts, but make sure that unruly-child thoughts occur only in the context of trains of thought that end with well-behaved conclusions. “I will sit here and listen and learn” is fine. “I’m going to punch Judy” is fine if it is followed by reflection which ends with “Okay, I’m not going to punch Judy”. If we individuate “trains of thought” in a way that makes it so people only act after trains of thought, then the idea would be that if each child has only permissible trains of thought, then the child will be well-behaved, and then the child will interact with the other children properly, and the children will in total behave well.

Level 3 (rules for thought/speech/action) — You could allow children to have disorderly and problematic trains of thought, but require those trains of thought to yield well-behaved conclusions within a given timeframe, and within given limits on speech and action. “I will sit here and listen and learn” is acceptable to think, say, and do. “I am going to fly to the moon by flapping my arms” is acceptable to think, say, and try for a little while, but then the child should update and stop thinking, saying, and trying. “I’m going to punch Judy” might be acceptable to think and acceptable to say to an adult as part of a process of working it out, but not okay to say to Judy (bullying) and certainly not acceptable to do. The idea here is that if the children think, speak, and behave within the limits of individually acceptable behavior, then each child will be well-behaved enough, and then the children interacting with each other will be orderly enough, and the children in total can be considered to be acting well enough for the relevant purposes. And then maybe that will lead the children in total to behave even better over the course of time.

Level 4 (individual holistic regulation) — You could choose to not police things in a way that is pegged to specific individual limitations on thought, speech, or behavior, but instead have judges who make decisions about individual children on a case by case basis, taking into account facts about the individual children and managing their trajectories towards being well-behaved. Is it fine for Tommy to think “I’m going to punch Judy”, or say it, or actually punch Judy? Depends on the case, and what will help Tommy to eventually become well-behaved. The idea here is that through the wise management of each child individually, each child will become well-behaved, and that this can be done with acceptable interference with other children along the way.

Level 5 (group holistic regulation) — Rather than taking the individual trajectories of each child as the things to optimize, you could have judges make decisions about what will best affect the trajectory of the group in every case. Is it fine for Tommy to think, say, or act on the idea of punching Judy? It depends on what is best for the group and its progress towards being well-behaved, which may or may not break down into what is best for Tommy or Judy in the near term.

There may be other levels or different ways to break down the levels, and it may be that a closer examination will reveal that “levels” isn’t exactly the right way to think about it.

There is nevertheless an important question raised by the above, which is: What should we be trying to optimize in order to optimize the good behavior of the children? Should we focus at the level of thoughts or trains of thought, should we place various objective individual limits, or should we focus on individuals or the group in a more holistic way?

It is obvious that one can make similar levels and ask a similar question about rationality and the pursuit of the truth. What should we be trying to optimize in order to optimize the intellectual performance of a community?

II. Consequences of choosing wrong

In the case of the children, it is quite clear that there are consequences for intervening on the wrong level. Imagine that in our classroom of 30 children, you decide to try to get the children to police their thoughts so that they only have good, well-behaved-child thoughts. We can easily imagine worlds with notably different outcomes:

• World A — You try to get the children to police their thoughts. This works as intended! Only good thoughts remain and the children act well.

• World B — You try to get the children to police their thoughts, but rather than this eliminating the problematic thoughts, the thoughts are converted into subversive impulses. You continue to police, the children try to cooperate and continue to suppress, but the tension grows, and then suddenly there is revolution, with the children overturning desks and throwing books from windows and in general refusing to do anything you want.

• World C — You try to get the children to police their thoughts. The children cooperate and successfully police their thoughts, but at the cost of creativity and love and the spirit of adventure. The children become depressed and quiet and compliant, which might or might not count as “well-behaved”, depending on how dystopian your original intentions were.

On the other side, we can also imagine worlds where only regulating at the group level in a holistic way goes extremely well or extremely poorly. The most obvious problem scenario is inadequate or illegible regulation, which causes the children to be unable to figure out how to guide themselves, resulting in continued bad behavior. In the worst case scenario, the behavior is bad enough that the system spirals downward, with inadequately policed behavior resulting in conflict and even worse behavior.

In the case of rationality and the pursuit of the truth, there are also consequences for intervening at the wrong level. Here are a few ways that intervening too specifically might go wrong:

• Seeking consensus too quickly. Good intellectual practice should involve taking into account what other people believe. But trying to synchronize beliefs with other people too often may make it much harder to explore new areas and discover new knowledge.

• Dismissing ideas too quickly. Many ideas are wrong and counterproductive. But trying to banish wrong and counterproductive ideas too quickly might lead people to underestimate the virtues of their own thoughts. This could lead people to dismiss good ideas too quickly, diminish their propensity to build models, and more generally underestimate their own capacity for thought.

• Missing macro distortions. Many errors occur at the level of specific, identifiable thoughts. As such, good intellectual practice should include being able to identify errors that occur at that level. An over-focus on errors at the five second level, though, could lead to an under-focus on identifying systematic errors that are in practice too difficult to identify at the five second level.

• Destroying motivation. Good intellectual practice should involve looking at difficult, potentially motivation-destroying truths. But doing this indelicately might actually destroy motivation, thereby preventing people from continuing the quest for the truth.

III. An empirical matter

Of course, one will say, the answer is to intervene at the right levels and not the wrong ones, to combine the levels in the right way, and so forth.

I think the key point is that the question of what works in this domain is an empirical matter.

It might be that seeking Aumann agreement with frequency F yields better thinking by having people take each other’s beliefs into account. Or it might be that seeking Aumann agreement with frequency F yields worse thinking by causing groupthink and preventing the exploration of new domains.

It might be that the correct level of focus is what Eliezer calls the 5 second level. This might yield correct action at the 5 second level and then, by aggregating, the 5 year level. Or it might be that the correct level of focus is more like “individual holistic regulation” or “group holistic regulation”.

It might be that the right allocation of one’s error identification resources is 90% to identifying biases and fixing System 2 and 10% to overcoming deep psychological distortions in System 1. Or it might be 10% and 90%.

It might be that assigning probabilities helps one execute a version of Bayesian updating and thereby helps one to take evidence into account. Or it might be that assigning probabilities draws attention to failure too frequently, destroying motivation.
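As a concrete (and purely illustrative) picture of the first branch, here is a minimal sketch of the kind of probability assignment and update being described. The hypothesis, prior, and likelihoods are invented for the example, not taken from any real case:

```python
# Toy illustration of assigning a probability and updating on evidence.
# The hypothesis, prior, and likelihoods are invented for the example.

def bayes_update(prior: float, p_evidence_given_h: float, p_evidence_given_not_h: float) -> float:
    """Return P(H | E) via Bayes' rule."""
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence

# "My project will ship on time" -- start at 60%, then observe a missed milestone,
# which is twice as likely if the project is actually behind schedule.
posterior = bayes_update(prior=0.60, p_evidence_given_h=0.30, p_evidence_given_not_h=0.60)
print(f"P(on time | missed milestone) = {posterior:.2f}")  # ~0.43
```

Watching the number drop from 60% to 43% is exactly the kind of explicit confrontation with evidence that might sharpen thinking for one person and sap motivation for another.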

It’s an empirical matter which of these things work, and it may vary from person to person.

IV. Different approaches to rationality

I expect there to be strange and hard-to-understand relations between mental practices and mental outputs. This is highly plausible at least from the outside view.

As such, I think the correct way to approach rationality and truth-seeking is to study processes that actually work for discovering the sorts of truths you want to discover and people who have actually succeeded at discovering those sorts of truths. On the basis of empirical performance, different practices can be preferred.

Assessing empirical performance is itself frequently very difficult. As a result, this should be a major focus. But one shouldn’t optimize for good empirical performance on intermediate indicators unless there is good reason to believe that those intermediate indicators actually correlate with good empirical performance on the acquisition of truths you want to discover.

I suspect this is one of the larger historical disagreements that I have had with various members of the rationality community. Right now when it comes to intellectual practice, I am most in favor of Levels 1-3 for beginners who are building functional scaffolds, and Levels 4-5 for intermediate level practitioners and beyond. The rationalist community and corpus seems to me to prefer Levels 1-3 much more for all practitioners.

This cashes out into a large number of concrete differences in practice. It is hard to state them, as different rationalists have different practices and I don’t want to overgeneralize. But one source that I imagine is fair to look at is Eliezer’s Twelve Virtues of Rationality. In general, I believe there is something good and important in the spirit of each of the Twelve Virtues. However, it is an empirical question whether adhering to them and trying to instantiate them in oneself will yield an improvement to one’s overall thinking.

Consider the seventh virtue:

The seventh virtue is simplicity. Antoine de Saint-Exupéry said: “Perfection is achieved not when there is nothing left to add, but when there is nothing left to take away.”[3] Simplicity is virtuous in belief, design, planning, and justification. When you profess a huge belief with many details, each additional detail is another chance for the belief to be wrong. Each specification adds to your burden; if you can lighten your burden you must do so. There is no straw that lacks the power to break your back. Of artifacts it is said: The most reliable gear is the one that is designed out of the machine. Of plans: A tangled web breaks. A chain of a thousand links will arrive at a correct conclusion if every step is correct, but if one step is wrong it may carry you anywhere. In mathematics a mountain of good deeds cannot atone for a single sin. Therefore, be careful on every step.

Part of how I approach thinking is to build many models. This is contrary to the seventh virtue. In many conversations I had with rationalists circa 2011-2014, I was told that I was running afoul of the conjunction fallacy. Of course, we all agree that it is a fact of probability that P(A and B) <= P(A). It is then an empirical question whether seeking simplicity in one’s beliefs is better or worse. I have found, for instance, that people who try to keep things simple when going into a new domain have great difficulty getting the gears of thinking turning. Instead, I have found that it is more effective to develop many, many beliefs in the beginning stages of thought, even though this will lower the probability that the conjunction of your beliefs is true during this time.
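To make the uncontested probability fact concrete, here is a minimal sketch with made-up numbers; independence between the beliefs is assumed only to keep the arithmetic simple, and the point holds in weaker form without it since P(A and B) <= P(A) always:

```python
# Minimal sketch of the uncontested probability fact: adding beliefs can only
# lower (never raise) the probability that the whole conjunction is true.
# The per-belief probabilities are made up, and independence is assumed
# purely to keep the arithmetic simple.

from math import prod

belief_probabilities = [0.9, 0.8, 0.8, 0.7, 0.7]  # five early-stage beliefs about a new domain

for n in range(1, len(belief_probabilities) + 1):
    p_conjunction = prod(belief_probabilities[:n])
    print(f"first {n} belief(s): P(all true) = {p_conjunction:.2f}")

# first 1 belief(s): P(all true) = 0.90
# ...
# first 5 belief(s): P(all true) = 0.28
```

The disagreement is not about this arithmetic; it is about whether paying that shrinking-conjunction cost early on buys enough model-building traction to be worth it.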

That is one concrete difference in practice. I can find others, looking through the twelve virtues of rationality. I do think there is something good in the spirit of all of them. My favorite is the twelfth[1]. But while I found it deeply inspirational upon first hearing it, I only came to hold it as a view inside my current paradigm a little more than a year ago, after a substantial empirical investigation.[2]

V. Notes

[1] "The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means. Whenever you parry, hit, spring, strike or touch the enemy’s cutting sword, you must cut the enemy in the same movement. It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him. More than anything, you must be thinking of carrying your movement through to cutting him."

— Musashi in The Book of Five Rings, quoted by Eliezer

[2] Reflecting on this topic while writing this piece has led me to increase the weight I assign encouraging good intellectual practices on short time scales, e.g., the 5 second level.

28 comments
I suspect this is one of the larger historical disagreements that I have had with various members of the rationality community. Right now when it comes to intellectual practice, I am most in favor of Levels 1-3 for beginners who are building functional scaffolds, and Levels 4-5 for intermediate level practitioners and beyond. The rationalist community and corpus seems to me to prefer Levels 1-3 much more for all practitioners.

Can you give a specific example of the way a person would act or think in some situation if they were prioritizing levels 1-3 vs. how they would act or think if they were prioritizing levels 4-5?

. . .


One thing I could imagine you to be saying is, "it is really useful to have a 'brainstorming / generation' mode, in which you try to come up with as many possible hypotheses as you can, and don't worry if they're false (or even incoherent)."

Or maybe you're saying "It is good and fine to adopt 'crazy hypotheses' for years at a time, because you'll get a lot of information that way, which ultimately helps you figure out what's true."

Or maybe, "It is a good idea to have some false beliefs, so long as they are the sort of false beliefs that great scientists typically have. This actually helps you get more relevant truths in the long run."

Or maybe (as a sort of extension of my second guess) you're saying "Individuals should 'specialize' in specific broad hypotheses. Instead of everyone having multiple models and frames, and trying to balance them, different people should 'hedgehog' on different models, each one adopting it really hard, and letting the epistemic process happen between the people, instead of within the people."

Does any of that match what you would recommend?



I like the overall framing, which goes from intervening on the minutiae to long-term, big-picture interventions, and correctly noting that optimising for truth at each level does not look the same and that such strategies can even be in conflict.

I want to think more concretely about what short-term and long-term interventions look like, so I'll try to categorise a bunch of recent ideas on LessWrong, by looking back at all the curated posts and picking ones I think I can fit into this system. I want to do this to see if I'm getting the right overall picture from Geoff's post, so I'm gonna do this in a pretty fast and loose way, and I assign about a 35% probability that a lot of these posts are severely misplaced.

I think there are two main axes here: one is the period of time over which you observe and then make the intervention, and the other is whether you're looking at an individual or a group. I'll start just with individuals.

I think that thought regulation evaluates whether particular thoughts are acceptable. This feels to me like the most rigorous type of analysis. Eliezer's Local Validity as Key to Sanity and Civilization is about making sure each step of reasoning follows from the previous, so that you don't wander into false conclusions from true premises. Abram's post Mistakes with conservation of expected evidence is an example of taking the basic rules of reasoning and showing when particular thoughts are improper. This isn't a broad heuristic, it's a law, and comes with a lot of rigour. These are posts about moving from thought A to thought B, and whether thought B is allowed given thought A.
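As a toy illustration of the "law, not heuristic" point (the numbers are mine, not from Abram's post): conservation of expected evidence says your current credence must equal the expectation of your post-observation credence, whichever way the evidence comes out.

```python
# Toy numerical check (made-up numbers) of conservation of expected evidence:
# P(H) must equal P(E) * P(H|E) + P(not E) * P(H|not E).
# If you expect to believe H more after *either* observation, something is improper.

p_h = 0.30            # current credence in hypothesis H
p_e_given_h = 0.80    # chance of observing evidence E if H is true
p_e_given_not_h = 0.20

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)          # 0.38
p_h_given_e = p_e_given_h * p_h / p_e                          # ~0.63
p_h_given_not_e = (1 - p_e_given_h) * p_h / (1 - p_e)          # ~0.10

expected_posterior = p_e * p_h_given_e + (1 - p_e) * p_h_given_not_e
assert abs(expected_posterior - p_h) < 1e-9                    # equals the prior
```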

If I frame train of thought regulation as being about taking short walks that aren't all definitely locally valid steps, but making sure that you end in a place that is true, I think this is often like 'wearing hats' or 'red teaming' or 'doing perspective taking', where you try out a frame of thinking that isn't your best guess for being true, but captures something you've not been thinking about, and ends up coming up with a concrete hypothesis to test or piece of evidence you've missed, that you still find valuable after you take the frame off.

Some examples of this include alkjash's Babbling then Pruning which is about generating many thoughts that don't meet your high standards and then reducing them to only the good ones, and my recommendation to Hold On To The Curiosity which can involve saying statements that are not accurate according to your all-things-considered-view while you search for the thing you've noticed. Habryka's post Models of Moderation tries to put on a lot of different perspectives in short succession, none of which seem straightforwardly true to him but all of which capture some important aspect of the problem, for which the next step is finding solutions that score highly on lots of different perspectives at once. Also Scott's If It's Worth Doing, it's Worth Doing With Made-Up Statistics involves building a false-ish model that makes a true point, which has some similarity. It maybe also includes Jessicata's Writing Children's Picture Books which is a frame to think about a subject for a while.

A different post that naturally fits in here is Abram's Track-Back Meditation, where you just practice for noticing your trains-of-thought. Eliezer's writing on humility also covers making sure you check that your train of thought was actually accurate.

OP says the next level is about rules. If I think of it as basically being about trains of thought in the plural rather than the individual, I'll say the next level is about multi-trains of thought regulation. I think a central example here would be Anna's "Flinching away from truth" is often about *protecting* the epistemology. This post feels like saying that you will often have mildly broken trains of thought, and that trying to fix them at the level of never letting yourself believe a single false thought, or never letting a train of thought conclude in a false place, will be bad, because sometimes the reason you're doing that is to make sure the most important, big-picture thoughts are true. As long as you notice when you seem to be avoiding true thoughts, and look into what implicit buckets you're making, then you'll be able to make sure to think the important true thoughts and not break things in the meantime by trying to fix everything locally in a way that messes up the bigger picture.

I think Paul's post Argument, Intuition and Recursion also fits into this category. I'd need to read it again carefully to be sure, but I recall it primarily being about how to ensure you're moving in the true direction in the long-run if you often can't get the ground truth in reasonable amounts of time - if you cannot check whether each of your trains of thought terminated in being actually true - and how to learn to trust alternative sources of information and ideas.

Plausibly much of Brienne's writing about noticing (at her blog Agenty Duck) fits in here as well, which is about increasing your long-term ability to bring important parts of your experience into your trains of thought. It's not about any one train of thought ending right or wrong, but improving them more generally.

That said, this section was the hardest for me to find posts on (I feel like there's loads for the others), which is interesting, and perhaps suggests we're neglecting this facet of rationality on LessWrong.

Then we move on to individual holistic regulation, which feels to me like it is about stepping into a very complex system, trying to understand it and recommend a high-level change to its trajectory. This isn't about getting particular thoughts or trains of thought right, it's just asking where the system is and how all the parts work. Kaj's post Building up to an Internal Family Systems model feels like it believes you'll never get perfect thoughts all of the time but that you can build a self-model that will help you notice the main culprits of bad outcomes and address those head-on from time to time. Ray's Strategies of Personal Growth works on this level too. Zvi's post Slack is about noticing whether you have the sort of environment that allows you the space to complete the important trains of thought, and if not that you should do something. There isn't currently a notion of perfect slack and there's no formula for it (yet), but it's a really useful high-level heuristic.

---

Looking at it this way, I notice the posts I listed started on the more rigorous end and then became less rigorous as I went along. I wonder if this suggests that when you understand something very deeply, you can simply label individual thoughts as good or bad, but when you have a much weaker grasp then you can only notice the pattern with massive amounts of data, and even then only vaguely. I've often said that I'd like to see the notion of Slack formalised, and that I bet it would be really valuable, but for now we'll have to stick to Zvi's excellent poetry.

---

Anyhow, Geoff: even though I'd guess you haven't read most of the linked posts, I'm curious to know your sense of whether the above is doing a good job of capturing what you think of as the main axis of levels-of-intervention for individuals, or not. I'm also interested to hear from others if they feel like they would've put posts in very different categories, or if they want to offer more examples I didn't include (of which there are many).

Plausibly much of Brienne's writing about noticing (at her blog Agenty Duck) fits in here as well, which is about increasing your long-term ability to bring important parts of your experience into your trains of thought. It's not about any one train of thought ending right or wrong, but improving them more generally.

Huh, the thing I get out of Brienne's writing was actually "intervening on the level of direct thoughts", more than any other rationality technique. 'Noticing' is the fundamental building block of all "intervene on direct thought" techniques.

Just riffing a bit on the same project you started :)

There’s integrity and accountability -- integrity (Level 3) as following a certain decision theory and making it common knowledge that you do, such that others can reliably simulate you, and coordinate and make trades with you; and accountability as choosing who you want to do your individual holistic regulation (Level 4).

On another note, predictions and calibration training is often pitched as a kind of Level 1/2 intervention, but I’m more bullish on it as a Level 2 intervention with important Level 5 consequences.

It’s certainly often helpful to quantify your beliefs, and to form an all-things-considered opinion as an ensemble model of all the things you might trust. But to restrict your trains-of-thought to always follow an all-things-considered view, never veering off into resonating with a single model or world-view, is, as you point out, not that great. However, spreading the meme of being able to zoom out to an all-things-considered, quantitative opinion when necessary, and engaging with that level regularly enough to build a track-record of being able to do that, seems like a core part of having a healthy Bayesian community, even if you actually use it quite infrequently compared to other modes of thinking (just like professional mathematicians riff on a post-rigorous level but can drop down to the rigorous level when need be). This is part of my current framing for the forecasting class I’m teaching at CFAR mainlines.
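As a minimal sketch of the kind of track record such training builds (the predictions below are invented, and the Brier score is just one standard scoring choice, not necessarily what any particular class uses):

```python
# Minimal sketch of a forecasting track record: score each quantified
# prediction once it resolves. Brier score: 0 is perfect, and 0.25 is what
# always saying 50% earns. The predictions below are invented for illustration.

predictions = [  # (stated probability, whether it actually happened)
    (0.90, True),
    (0.70, True),
    (0.60, False),
    (0.20, False),
]

brier = sum((p - outcome) ** 2 for p, outcome in predictions) / len(predictions)
print(f"Brier score over {len(predictions)} resolved predictions: {brier:.3f}")  # 0.125
```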

There’s also a long list of other CFAR techniques one could analyse.

Eliezer’s and Abram’s posts are interesting Level 1 interventions, but look a lot like improvements to your slow, deliberate, conscious thinking processes, perhaps eventually becoming ingrained in your S1. I’d compare that with TAPs, which seem to intervene quite directly at Level 2 (and probably with backchaining effects to Level 1): “what thoughts do I want to follow from other thoughts?” [1]

This also seems to me to be the core of what makes CBT therapy work, whereby you uncover unwanted trains (“Get invite to social event” → “Visualise public shame from making an embarrassing comment” → “Flinch away from invite”), and then intervene to change their trajectory.

This raises the question of whether there are any more direct interventions at Level 1. Interventions determining which thoughts, in and of themselves, are even desirable or not. I interpret Selective reporting and Lines of retreat as analysing such interventions. I read the former (a bit extrapolated) as noting that if there are some unitary thoughts we cannot think, regardless of whether we actually believe them, this can cause large mistakes elsewhere in our belief system. The latter tries to tackle the problem when the blocker is motivational rather than social, by embedding the thoughts in conditionals and building a backup plan before considering whether it has to be used.

Then there's goal factoring, closely related to separation of concerns. Don't take actions which confusedly optimise for orthogonal goals, separate out your desires and optimize them separately. This probably has implications at Levels 1 through 4.

I could go on through the CFAR techniques and might at a later point, but that will do for now.

[1] This looks more like “epistemic TAPs”, or “internal TAPs”, which haven’t yet become a standard part of the mainline curriculum, where TAPs are often more external, and for things like “Deciding to take the stairs instead of the elevator as soon as I come into the office and look at them”.

Nitpick. Mildly triggered by:

These are posts about moving from thought A to thought B, and whether thought B is allowed given thought A.

“Allowed” is of course a very social term, and one that sounds a lot like “will my teacher accept it if I make this inference?”

Which is different from the mathematical mindset of what happens if I make that inference, and is that thing interesting/elegant/useful. What does it capture to have those kinds of inference rules, and does it capture the kind of process I want to run or not?

Moreover, when it comes to Bayesian reasoning and its various generalisations, the correct inference is _inevitable_, and not optional. There is one single credence which is correct to hold given your priors and the evidence you’ve observed. (Compare this to old school rationality, like Popper and Feynman, which thought more in terms of you being “allowed” to hold a variety of beliefs as long as you hadn’t been refuted by experiment. I can’t find the reference post for this now, though.)

Agreement.

(The reason I framed it in the style of "am I allowed this thought" / "will my teacher accept it if I make this inference?” is because that's literally the frame used in the post ;P)

(I want to note that I'm quite interested in having a conversation about the above, both with Geoff but also with others who have thought a lot about rationality.)

There are a few factors which I imagine influence the optimal strategy criteria:

  • How much time do you have? If there's not a lot of time, more direct intervention methods (lower levels) seem to work better. If you have a lot of time, then it's probably okay to let people meander more as long as they eventually reach the low entropy. (Low entropy = behaving well consistently.)
  • How sticky is the low entropy? If the child notices that when they're behaving well things are going much better for them, then probably they'll continue to stick with that behavior. But if the rewards are random, then they might be well behaved but then switch their behavior.
  • How much do you value the individuals? I.e. what's your utility for one well behaving kid vs one misbehaving one? I think in the rationalist community there's a tendency to value few very well behaving kids as being much better than a lot of somewhat well behaving kids. In that case, individual attention does seem more warranted / effective.
  • Your overall resources and expertise. If you had it all, why not do all of the levels at once? There's obviously something good to be said for all levels. But if you're not experienced in one of them, then you have to weigh the cost of getting better + making mistakes vs ignoring that level + focusing on others. And if your resources are limited, but expertise is even, you probably want to spread the resources around and focus on 80/20'ing each level.
  • The expertise brings up the point of: do you even know what "well behaving" is? To the extent you're not sure, you should probably focus on reducing uncertainty around that for yourself first. (Level 0)

At the end of the day, you either need to build robust gear level models that will help you make these decisions or have enough kids in your study that you could collect and analyze it statistically.

I think I'm willing to concede that there is something of an empirical question about what works best for truth-seeking, as much as that feels like a dangerous statement to acknowledge. Though seemingly true, it feels like it's something that people who try to get you to commit bad epistemic moves like to raise [1]. I'm thinking here of post-rationalist lines of thought (though I can't claim overmuch familiarity with them) or the perennial debates over whether it's ever okay to deceive yourself. Whether or not they do so doesn't make it any less true, however.

Questions over how quickly to aim for consensus or how long to entertain new and strange ideas seem like very important questions. There've been recent debates about this kind of thing on LessWrong, particularly the entertaining of as-yet-not-completely-justified-in-the-standard-frame things. It does seem like getting the correct balance between Babble vs Prune is an empirical question.

Allowing questions of motivation to factor into one's truth-seeking process feels most perilous to me, mostly as it seems too easy to claim one's motivation will be affected adversely to justify any desired behavior. I don't deny certain moves might destroy motivation, but it seems the risks of allowing such a fear to be a justification for changing behavior are much worse. Granted, that's an empirical claim I'm making.

[1] Or at least it feels that way, because it's so easy to assert that something is useful and therefore justified despite violating what seem like the correct rules. By insisting on usefulness, one can seemingly defend any belief or model. Crystals, astrology, who knows what. Though maybe I merely react poorly at what seem like heresies.


One important question is:

Say we're in the inconvenient world where it's important to have lots of babble, yet it is also the case that lots of babble is dangerous for the epistemic health of both groups and individuals (i.e. the strategy with the highest expected payoff is "try lots of questionable thinking, some of which outputs the most important stuff, but most of which is useless or harmful")...

...what do you do, if you want to succeed as an individual or a group?

(I don't have a good answer right now and will be thinking about it. I have some sense that there are norms that are reasonable for "how to flag things with the right epistemic status, and how much to communicate publicly", which might navigate the tradeoff reasonably)

I currently think we are in a world where a lot of discussion of near-guesses, mildly informed conjectures, probably-wrong speculation, and so forth is extremely helpful, at least in contexts where one is trying to discover new truths.

My primary solution to this has been (1) epistemic tagging, including coarse-grained/qualitative tags, plus (2) a study of what the different tags actually amount to empirically. So person X can say something and tag it as "probably wrong, just an idea", and you can know that when person X uses that tag, the idea is, e.g., usually correct or usually very illuminating. Then over time you can try to get people to sync up on the use of tags and an understanding of what the tags mean.
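A minimal sketch of what (2) could look like in practice follows; the tags, people, and resolutions here are hypothetical placeholders, not an existing tool:

```python
# Hypothetical sketch of tracking what an epistemic tag empirically amounts to:
# log tagged claims as they resolve, then report the hit rate per (person, tag).

from collections import defaultdict

# (person, tag, claim turned out correct?) -- all entries invented for illustration
resolved_claims = [
    ("X", "probably wrong, just an idea", True),
    ("X", "probably wrong, just an idea", True),
    ("X", "probably wrong, just an idea", False),
    ("X", "confident", True),
    ("Y", "probably wrong, just an idea", False),
]

tallies = defaultdict(lambda: [0, 0])  # (person, tag) -> [correct, total]
for person, tag, correct in resolved_claims:
    tallies[(person, tag)][0] += int(correct)
    tallies[(person, tag)][1] += 1

for (person, tag), (correct, total) in sorted(tallies.items()):
    print(f"{person} / '{tag}': {correct}/{total} correct ({correct / total:.0%})")
```

The syncing-up step would then amount to the group agreeing on a shared tag vocabulary and periodically looking at these per-person hit rates together.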

In cases where it looks like people irrationally update on a proposition, even with appropriate tags, it might be better to not discuss that proposition (or discuss in a smaller, safer group) until it has achieved adequately good epistemic status.

I actually disagree that lots of babble is necessary. One of the original motivations for Mazes and Crayon was to show, in an algorithmic context, what some less babble-based strategies might look like.

My own intuition on the matter comes largely from hard math problems. Outside of intro classes, if you sit down to write a proof without a pre-existing intuitive understanding of why it works, you'll math-babble without getting any closer to a proof. I've spent weeks at a time babbling math, many times, with nothing to show for it. It reliably does not work on hard problems.

Something like babbling is still necessary to build intuitions, of course, but even there it's less like random branching and more like A* search.
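For readers who want the analogy spelled out, here is a minimal A* sketch on a toy grid (the grid, walls, and goal are stand-ins, not a claim about how intuition-building actually works): expansion order is driven by cost so far plus a heuristic estimate of remaining distance, rather than by random branching.

```python
# Rough sketch of the contrast being drawn: expansion ordered by a heuristic
# estimate of distance-to-goal (A*-style), rather than by random branching.

import heapq

def a_star(start, goal, walls, size=5):
    def h(p):  # Manhattan-distance heuristic: how far the goal still looks
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]  # (estimated total cost, cost so far, node, path)
    seen = set()
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in walls:
                heapq.heappush(frontier, (cost + 1 + h(nxt), cost + 1, nxt, path + [nxt]))
    return None

print(a_star(start=(0, 0), goal=(4, 4), walls={(1, 1), (2, 1), (3, 1)}))
```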

I was not making a claim about how much babble is necessary – just noting if it were necessary we'd want a good way to handle that fact. (My primary motivation here was a worry that people might contrast "high babble == low epistemic standards" and stop there, and I wanted to make sure the conversation had a proper line of retreat)

That said – I think I might have been using babble as shorthand for a different concept than you were thinking (and I do obviously suspect the concept I do mean is at least plausibly important enough to be entertaining this line of thought)

There's a type of thinking I (now) call "GPT2 style thinking", where I'm just sort of pattern matching nearby thoughts based on "what sort of things I tend to think/say in this situation", without much reflection. I sometimes try to use this while programming and it's a terrible idea.

Was that the sort of thing you were thinking? (If so that makes sense, but that's not what I meant)

The thing I'm thinking of is... not necessarily more intentional, but a specific type of brainstorming. It's more for exploring new ideas, and combining ideas together, and following hunches about things being important. (this might not be the best use of the term "babble" and if so apologies)

I was not making a claim about how much babble is necessary – just noting if it were necessary we'd want a good way to handle that fact.

Ah yeah, makes sense on a second read.

The thing I'm thinking of is... not necessarily more intentional, but a specific type of brainstorming.

Now I'm curious, but not yet sure what you mean. Could you give an example or two?

I think I'm willing to concede that there is something of an empirical question about what works best for truth-seeking, as much as that feels like a dangerous statement to acknowledge. Though seemingly true, it feels like it's something that people who try to get you to commit bad epistemic moves like to raise [1].

There's a tricky balance to maintain here. On one hand, we don't want to commit bad epistemic moves. On the other hand, failing to acknowledge the empirical basis of something when the evidence of its being empirical is presented is itself a bad epistemic move.

With epistemic dangers, I think there is a choice between "confront" and "evade". Both are dangerous. Confronting the danger might harm you epistemically, and is frequently the wrong idea — like "confronting" radiation. But evading the danger might harm you epistemically, and is also frequently wrong — like "evading" a treatable illness. Ultimately, whether to confront or evade is an empirical question.

Allowing questions of motivation to factor into one's truth-seeking process feels most perilous to me, mostly as it seems too easy to claim one's motivation will be affected adversely to justify any desired behavior. I don't deny certain moves might destroy motivation, but it seems the risks of allowing such a fear to be a justification for changing behavior are much worse. Granted, that's an empirical claim I'm making.

One good test here might be: Is a person willing to take hits to their morale for the sake of acquiring the truth? If a person is unwilling to take hits to their morale, they are unlikely to be wisely managing their morale and epistemics, and instead trading off too hard against their epistemics. Another good test might be: If the person avoids useful behavior X in order to maintain their motivation, do they have a plan to get to a state where they won't have to avoid behavior X forever? If not, that might be a cause for concern.

There's a tricky balance to maintain here.

Very much so.

With epistemic dangers, I think there is a choice between "confront" and "evade".

Not a bid for further explanation, just flagging that I'm not sure what you actually mean by this, as in which concrete moves which correspond to each.

If a person is unwilling to take hits to their morale, they are unlikely to be wisely managing their morale and epistemics, and instead trading off too hard against their epistemics.

To me the empirical question is whether a person ought to be willing to take all possible hits to their morale for the sake of their epistemics. I have a consequentialist fear—and I think consequentialist means we're necessarily talking empiricism—that any exceptions/compromises may be catastrophic.

. . .

It's possible there's a kind of meta-debate here going on, with some people (including me) sometimes having underlying consequentialist/empirical beliefs that even engaging in consequentialist/empirical arguments about trading off against epistemics would have overall bad consequences and/or an empirical belief that anyone who would offer such arguments readily must not really care about epistemics because they're not [naively] treating them as sacred enough [1].

I hadn't formulated this in that way before, so I'm glad this post/discussion has helped me realize that arguably "it's consequentialism/empiricism all the way up", even if you ultimately claim that your epistemological consequentialism cashes out to some inviolable deontological rules.


[1] Not treating them as sacred enough, therefore they don't really care, therefore can't be trusted - this is my instinctive reaction when I encounter, say, post-rationalist arguments about needing to consider what's useful, not just what's true. Maybe it's not always fair.

. . .

I had a revealing exchange with someone a few months ago about conversation norms on LessWrong. I was stating the necessity of considering the consequences of your speech and how that should factor into how one speaks. In the course of that debate, they said [paraphrasing]:

"You' re trying to get me to admit that I sometimes trade off things against truth, and once I've admitted that, we're just "haggling over price". Except, no.

I think this response was a mistake, not least because their rigidity meant we couldn't discuss different consequences of different policies or even what tradeoffs I thought I was making (fewer than they did). That discussion felt different from this post because it was mostly about what you say to others and how, but I see the analogy even to when you're considering how people individually think.

So, I may maintain my suspicions, but I won't say "except, no."

arguments about needing to consider what's useful, not just what's true.

Absent examples it's not clear how these trade off against each other. It seems like what's useful is a subset of what's true - offhandedly, I don't know what color of flames are produced if cesium is burned (or what cesium is, if it burns, if the fumes would be harmful, etc.), but if I thought that might be useful knowledge in the future I'd seek it out.

It might be that the right allocation of one’s error identification resources is 90% to identifying biases and fixing System 2 and 10% to overcoming deep psychological distortions in System 1. Or it might be 10% and 90%.

This seems like an important question, but it seems mostly orthogonal to the 5 levels you outline, which seem to be mostly a matter of the timescale on which one intervenes (or how long you wait to see the results of a process before you judge it as good or bad, epistemic or non-epistemic.)

Maybe I'm missing something, but it seems like you could try to correct conscious S2 processes, with feedback on the scale of years, or on the scale of seconds. Likewise, you could try and correct unconscious S1 processes, with feedback on the scale of years, or on the scale of seconds.

Good point. I think they are prima facie orthogonal. Empirically, though, my current take is that many deep psychological distortions affect attention in a way that makes trying to manage them primarily on short time scales extremely difficult compared to managing them on longer time scales.

Imagine, for instance, that you have underlying resignation that causes your S1 to put 5x the search power into generating plausible failure scenarios than plausible success scenarios. This might be really hard to detect on the 5 second level, especially if you don't have a good estimate of the actual prevalence of plausible failure or success scenarios (or, a good estimate of the actual prevalence of plausible failure or success scenarios, as accessible by your own style of thinking). But on longer time scales, you can see yourself potentially bending too pessimistic and start to investigate why. That might then turn up the resignation.

As we’re thinking about _intervention_, we’re hoping to _change_ something, or accomplish some _effect_. And in this vein, it’s interesting to note how the levels aren’t that independent.

For example, incentives tend to backpropagate from one level to the other. I expect that if you regularly give someone negative reinforcement for expressing half-formed ideas (Level 3 intervention), they might not just stop expressing ideas, but also stop _having_ original ideas altogether (Level 1 / 2 effect).

Or if you establish a meme of sharing the causes of your beliefs (Level 3 intervention), your community as a whole will run into fewer info-cascades (Level 5 effect).

Some of the most powerful interventions are those which create loops between levels. Helping people become stronger rationalists (Level 1 / 2) will enable them to make important changes to their and their community’s environment (Level 4 / 5) which will then feedback into their ability to think true thoughts and enact further important changes.

Similarly, bad equilibria emerge when Level 5 interventions change the optimal strategy at Level 3, and people doubling down on that then further entrenches those Level 5 changes.

I liked this post, but found the beginning confusing because levels 1 and 2 sounded impossible as described: "You could intervene at the level of the child’s individual thoughts" - I don't have access to other people's thoughts so I can't intervene at this level; it is only at level 3 where I start getting direct information about the phenomenon that I'm supposed to modify. So I was confused about whether I should think about this as a real-world analogy or as some purely philosophical thought experiment where I have mind-reading powers.

The second section helped clear up my confusion since it explained that intervening on the first level means trying to get the children to police their own thoughts, rather than me directly intervening on them.


Imagine that you have a classroom of children that you want to behave well... You could intervene at the level of the child’s individual thoughts. Police each thought, make sure it is a well-behaved-child thought.

I want to briefly point to a different relevant axis. Your framing is primarily about policing bad thoughts, and generally making the group stable and well-behaved. If I had a group of 30 people to command, while there are some ways I'd try to satisfice for every person (e.g. make sure they all learn a certain level of math, all have a certain ceiling on the trauma they experience in the year) I actually will put a lot of effort (perhaps >50% of my focus) into children achieving the biggest wins possible (e.g. getting one child to a state where they are deeply curious about some aspect of the world and are spending a lot of self-directed effort to better understand that phenomenon, or two children getting very excited about building something and spending most of their time doing that well). The motivation here is that a single child growing up and making breakthrough discoveries in fundamental physics is something I will trade off against a lot of days of many 'well-behaved' children.

But this is an abstract point, and it's easy for people to talk past one another or create a double illusion of transparency when talking in ungrounded abstractions, so I'll write another, much more concrete, comment.

I also want to mention, as Geoff indicates in the OP, that once you start looking on the time scale of months and years, I think motivation becomes an obvious factor. One way you can think of it is that you have to ask not merely whether this epistemic heuristic is a good fit for a person's environment, but also ask how likely the person is to consistently use the heuristic when it's appropriate. Heuristics with a high effort-to-information ratio often wear a person out and they use them less and less.

It is obvious that one can make similar levels and ask a similar question about rationality and the pursuit of the truth. What should we be trying to optimize in order to optimize the intellectual performance of a community?

This presupposes that optimizing the intellectual performance of a community is the goal in the first place. Individuals have a great deal more control over their own thoughts/behavior than over community norms; there is little point attempting to optimize something over which one exercises minimal control.

Do you think that human-control is conserved in some sense, i.e. some humans are controlling community practices, even if you're not?

I think of people today trying to control community practices as a lot like premodern physicians. On rare occasions they accidentally stumble on something that works sometimes, and maybe even get it to work more than once. But the understanding to consistently predict which interventions will have which effects just isn't there, and the vast majority of interventions either have zero effect or negative effects. It's all humors and leeches.

Someday, we will be better at this. Personal to Prison Gangs is the best example I have - it's a step closer to the understanding required to go from "I want to implement change X in this community" to "I personally can reliably make that change happen by doing Y". But we are not yet anywhere near that point.

Meanwhile, in the absence of actually understanding the effects of our actions on community dynamics, the best we can do is try stuff and see what happens. Given that most changes are zero or negative (due to generalized efficient markets), this only works when we have a platform for rapidly testing many changes and quantifying their impacts on the community - video game communities are a good example. In that case, there are clearly people who can modify the community by modifying the interface - assuming they bother to do so. (This does not currently apply to e.g. the LessWrong team, since last I heard they didn't have the tracking and a/b testing framework necessary to find out which changes have which effects. To the extent that they're trying to control community dynamics, they're still on humors and leeches.)


The dynamics in a small group are qualitatively different from whole communities. To a large extent, that's exactly why community control is hard/interesting. Again, Personal to Prison Gangs is a good example.