Comment on "Endogenous Epistemic Factionalization"

by Zack_M_Davis · 6 min read · 20th May 2020 · 7 comments


Disagreement · Carving / Clustering Reality · Rationality
Curated

In "Endogenous Epistemic Factionalization" (due in a forthcoming issue of the philosophy-of-science journal Synthese), James Owen Weatherall and Cailin O'Connor propose a possible answer to the question of why people form factions that disagree on multiple subjects.

The existence of persistent disagreements is already kind of a puzzle from a Bayesian perspective. There's only one reality. If everyone is honestly trying to get the right answer and we can all talk to each other, then we should converge on the right answer (or an answer that is less wrong given the evidence we have). The fact that we can't do it is, or should be, an embarrassment to our species. And the existence of correlated persistent disagreements (when not only do I say "top" when you say "bottom" even after we've gone over all the arguments for whether it is in fact the case that top or bottom, but furthermore the fact that I said "top" lets you predict that I'll probably say "cold" rather than "hot" even before we go over the arguments for that) is an atrocity. (Not hyperbole. Thousands of people are dying horrible suffocation deaths because we can't figure out the optimal response to a new kind of coronavirus.)

Correlations between beliefs are often attributed to ideology or tribalism: if I believe that Markets Are the Answer, I'm likely to propose Market-based solutions to all sorts of seemingly-unrelated social problems, and if I'm loyal to the Green tribe, I'm likely to selectively censor my thoughts in order to fit the Green party line. But ideology can't explain correlated disagreements on unrelated topics that the content of the ideology is silent on, and tribalism can't explain correlated disagreements on narrow, technical topics that aren't tribal shibboleths.

In this paper, Weatherall and O'Connor exhibit a toy model with a simple mechanism that can explain correlated disagreement: if agents disbelieve evidence presented by those with sufficiently dissimilar beliefs, factions emerge, even though everyone is honestly reporting their observations and updating on what they are told (to the extent that they believe it). The paper didn't seem to provide source code for the simulations it describes, so I followed along in Python. (Replication!)

In each round of the model, our little Bayesian agents choose between repeatedly performing one of two actions, A or B, that can "succeed" or "fail." A is a fair coin: it succeeds exactly half the time. As far as our agents know, B is either slightly better or slightly worse: the per-action probability of success is either 0.5 + ɛ or 0.5 − ɛ, for some ɛ (a parameter to the simulation). But secretly, we the simulation authors know that B is better.

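import random

ε = 0.01  # B's edge over a fair coin; a parameter to the simulation

def b():
    # One trial of action B: succeeds with probability 0.5 + ε.
    return random.random() < 0.5 + ε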

The agents start out with a uniformly random probability that B is better. The ones who currently believe that A is better repeatedly do A (and don't learn anything, because they already know that A is exactly a coinflip). The ones who currently believe that B is better repeatedly do B, but keep track of and publish their results in order to help everyone figure out whether B is slightly better or slightly worse than a coinflip.

class Agent:
    ...

    def experiment(self):
        results = [b() for _ in range(self.trial_count)]
        return results

If $H$ represents the hypothesis that B is better than A, and $\neg H$ represents the hypothesis that B is worse, then Bayes's theorem says

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}$$

where E is the record of how many successes we got in how many times we tried action B. The likelihoods $P(E \mid H)$ and $P(E \mid \neg H)$ can be calculated from the probability mass function of the binomial distribution, so the agents have all the information they need to update their beliefs based on experiments with B.

from math import factorial

def binomial(p, n, k):
    # Probability of exactly k successes in n trials with success probability p.
    return (
        factorial(n) / (factorial(k) * factorial(n - k)) *
        p**k * (1 - p)**(n - k)
    )

class Agent:
    ...

    def pure_update(self, credence, hits, trials):
        raw_posterior_good = binomial(0.5 + ε, trials, hits) * credence
        raw_posterior_bad = binomial(0.5 - ε, trials, hits) * (1 - credence)
        normalizing_factor = raw_posterior_good + raw_posterior_bad
        return raw_posterior_good / normalizing_factor
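A sanity check on the update, as a sketch: when you divide the two likelihoods, the binomial coefficients cancel, so the posterior odds are just the prior odds times ((0.5 + ε)/(0.5 − ε)) raised to the number of successes minus the number of failures. The standalone functions below restate `pure_update` outside the class for illustration; `odds_update` is a hypothetical helper, not part of the post's code.

```python
from math import comb

EPS = 0.01  # B's hidden edge, matching ε = 0.01 in the post


def binomial_pmf(p, n, k):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pure_update(credence, hits, trials, eps=EPS):
    """Bayes's theorem with the two binomial likelihoods, as in Agent.pure_update."""
    good = binomial_pmf(0.5 + eps, trials, hits) * credence
    bad = binomial_pmf(0.5 - eps, trials, hits) * (1 - credence)
    return good / (good + bad)


def odds_update(credence, hits, trials, eps=EPS):
    """The same update in odds form; the comb(n, k) factors cancel."""
    misses = trials - hits
    likelihood_ratio = ((0.5 + eps) / (0.5 - eps)) ** (hits - misses)
    posterior_odds = credence / (1 - credence) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# 26 successes in 50 trials is only 2 more hits than misses, so an agent
# who starts at 0.5 ends up at about 0.52.
print(pure_update(0.5, 26, 50), odds_update(0.5, 26, 50))
```

With ε = 0.01, each net success multiplies the odds by only about 1.04, which is why the agents need many reports before B's edge shows.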

Except in order to study the emergence of clustering among multiple beliefs, we should actually have our agents face multiple "A or B" dilemmas, representing beliefs about unrelated questions. (In each case, B will again be better, but the agents don't start out knowing that.) I chose three questions/beliefs, because that's all I can fit in a pretty 3D scatterplot.

If all the agents update on the experimental results published by the agents who do B, they quickly learn that B is better for all three questions. If we make a pretty 3D scatterplot where each dimension represents the probability that B is better for one of the dilemmas, then the points converge over time to the [1.0, 1.0, 1.0] "corner of Truth", even though they started out uniformly distributed all over the space.

But suppose the agents don't trust each other's reports. ("Sure, she says she performed $B_2$ 50 times and observed 26 successes, but she also believes that $A_1$ is better than $B_1$, which is crazy. Are we sure she didn't just make up those 50 trials of $B_2$?", where the subscript names the question.) Specifically, our agents assign a probability that a report is made up (and therefore should not be updated on) equal to their distance from the reporter in our three-dimensional beliefspace multiplied by a "mistrust factor" (a parameter to the simulation), capped at 1.

from math import sqrt

def euclidean_distance(v, w):
    return sqrt(sum((v[i] - w[i]) ** 2 for i in range(len(v))))

class Agent:
    ...

    def discount_factor(self, reporter_credences):
        return min(
            1, self.mistrust * euclidean_distance(self.credences, reporter_credences)
        )

    def update(self, question, hits, trials, reporter_credences):
        discount = self.discount_factor(reporter_credences)
        posterior = self.pure_update(self.credences[question], hits, trials)
        self.credences[question] = (
            discount * self.credences[question] + (1 - discount) * posterior
        )

(Um, the paper itself actually uses a slightly more complicated mistrust calculation that also takes into account the agent's prior probability of the evidence, but I didn't quite understand the motivation for that, so I'm going with my version. I don't think the grand moral is affected.)
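To get a feel for the numbers, here is the discount rule pulled out of the class into a standalone function (a restatement for illustration; the example credence vectors are made up). With `mistrust=2`, any reporter at distance 0.5 or more is ignored entirely, and since the largest possible distance in the unit cube is √3 ≈ 1.73, total disregard becomes possible once the mistrust factor reaches 1/√3 ≈ 0.58.

```python
from math import sqrt


def euclidean_distance(v, w):
    return sqrt(sum((x - y) ** 2 for x, y in zip(v, w)))


def discount_factor(own, reporter, mistrust):
    """Weight kept on one's prior instead of the report: mistrust × distance, capped at 1."""
    return min(1, mistrust * euclidean_distance(own, reporter))


me = [0.9, 0.8, 0.7]
neighbor = [0.8, 0.8, 0.8]  # distance ≈ 0.14
opponent = [0.1, 0.2, 0.3]  # distance ≈ 1.08

for mistrust in (0.5, 2):
    print(
        mistrust,
        discount_factor(me, neighbor, mistrust),
        discount_factor(me, opponent, mistrust),
    )
```

Neighbors' reports are mostly trusted at either setting, but at `mistrust=2` the opponent's report is discarded completely, while at 0.5 it still gets a bit under half weight.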

Then we can simulate what happens if the distrustful agents do many rounds of experiments and talk to each other—

def summarize_experiment(results):
    return (len([r for r in results if r]), len(results))

def simulation(
    agent_count,  # number of agents
    question_count,  # number of questions
    round_count,  # number of rounds
    trial_count,  # number of trials per round
    mistrust,  # mistrust factor
):
    agents = [
        Agent(
            [random.random() for _ in range(question_count)],
            trial_count=trial_count,
            mistrust=mistrust,
        )
        for i in range(agent_count)
    ]

    for _ in range(round_count):
        for question in range(question_count):
            experiments = []
            for agent in agents:
                if agent.credences[question] >= 0.5:
                    experiments.append(
                        (summarize_experiment(agent.experiment()), agent.credences)
                    )
            for agent in agents:
                for experiment, reporter_credences in experiments:
                    hits, trials = experiment
                    agent.update(
                        question,
                        hits,
                        trials,
                        reporter_credences,
                    )

    return agents

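Depending on the exact parameters, we're likely to get a result that "looks like" this agent_count=200, round_count=20, question_count=3, trial_count=50, mistrust=2 run:

[Figure: 3D scatterplot of the agents' final credences on the three questions, colored by k-means cluster]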

Some of the agents (depicted in red) have successfully converged on the corner of Truth, but the others have polarized into factions that are all wrong about something. (The colors in the pretty 3D scatterplot are a k-means clustering for k := 8.) On average, evidence pushes our agents towards Truth—note the linearity of the blue and purple points, illustrating convergence on two out of the three problems—but agents who erroneously believe that A is better (due to some combination of a bad initial credence and unlucky experimental results that failed to reveal B's ε "edge" in the sample size allotted) can end up too far away to trust those who are gathering evidence for, and correctly converging on, the superiority of B.
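For anyone who wants a run without the plotting, here is a sketch that assembles the fragments above into one runnable file and tallies which of the eight corners each agent currently favors, a coarser stand-in for the k-means coloring. The `Agent` constructor is an assumption (the post elides it; `simulation` only shows that it receives the credences, `trial_count`, and `mistrust`), `corner_tally` is a hypothetical helper, and the demo uses smaller parameters than the pictured run so it finishes quickly.

```python
import random
from collections import Counter
from math import comb, sqrt

ε = 0.01  # B's hidden edge on every question


def b():
    """One trial of action B: succeeds with probability 0.5 + ε."""
    return random.random() < 0.5 + ε


def binomial(p, n, k):
    # comb(n, k) equals factorial(n) / (factorial(k) * factorial(n - k))
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def euclidean_distance(v, w):
    return sqrt(sum((x - y) ** 2 for x, y in zip(v, w)))


class Agent:
    # Assumed constructor: the post elides it, but simulation() passes the
    # initial credences plus trial_count and mistrust.
    def __init__(self, credences, trial_count, mistrust):
        self.credences = credences
        self.trial_count = trial_count
        self.mistrust = mistrust

    def experiment(self):
        return [b() for _ in range(self.trial_count)]

    def pure_update(self, credence, hits, trials):
        good = binomial(0.5 + ε, trials, hits) * credence
        bad = binomial(0.5 - ε, trials, hits) * (1 - credence)
        return good / (good + bad)

    def discount_factor(self, reporter_credences):
        distance = euclidean_distance(self.credences, reporter_credences)
        return min(1, self.mistrust * distance)

    def update(self, question, hits, trials, reporter_credences):
        discount = self.discount_factor(reporter_credences)
        posterior = self.pure_update(self.credences[question], hits, trials)
        self.credences[question] = (
            discount * self.credences[question] + (1 - discount) * posterior
        )


def simulation(agent_count, question_count, round_count, trial_count, mistrust):
    agents = [
        Agent([random.random() for _ in range(question_count)], trial_count, mistrust)
        for _ in range(agent_count)
    ]
    for _ in range(round_count):
        for question in range(question_count):
            # Only agents who currently favor B on this question experiment.
            reports = [
                (sum(agent.experiment()), trial_count, agent.credences)
                for agent in agents
                if agent.credences[question] >= 0.5
            ]
            for agent in agents:
                for hits, trials, reporter_credences in reports:
                    agent.update(question, hits, trials, reporter_credences)
    return agents


def corner_tally(agents):
    """Count agents by the action they favor on each question, e.g. 'BBB'."""
    return Counter(
        "".join("B" if c >= 0.5 else "A" for c in agent.credences)
        for agent in agents
    )


if __name__ == "__main__":
    random.seed(0)
    agents = simulation(
        agent_count=100, question_count=3, round_count=10, trial_count=50, mistrust=2
    )
    for corner, count in corner_tally(agents).most_common():
        print(corner, count)
```

Corners are labeled by the favored action on each question, so `BBB` is the corner of Truth; the exact counts depend on the random seed and parameters.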

Our authors wrap up:

[T]his result is especially notable because there is something reasonable about ignoring evidence generated by those you do not trust—particularly if you do not trust them on account of their past epistemic failures. It would be irresponsible for scientists to update on evidence produced by known quacks. And furthermore, there is something reasonable about deciding who is trustworthy by looking at their beliefs. From my point of view, someone who has regularly come to hold beliefs that diverge from mine looks like an unreliable source of information. In other words, the updating strategy used by our agents is defensible. But, when used on the community level, it seriously undermines the accuracy of beliefs.

I think the moral here is slightly off. The specific something reasonable about ignoring evidence generated by those you do not trust on account of their beliefs is the assumption that those who have beliefs you disagree with are following a process that produces systematically misleading evidence. In this model, that assumption is just wrong. The problem isn't that the updating strategy used by our agents is individually "defensible" (what does that mean?) but produces inaccuracy "when used on the community level" (what does that mean?); the problem is that you get the wrong answer if your degree of trust doesn't match agents' actual trustworthiness. Still, it's enlighteningly disturbing to see specifically how the "distrust those who disagree" heuristic descends into the madness of factions.

(Full source code.)


Comments

I wonder if this would still happen if, say, 1 in 1000 agents randomly lied about their evidence (always in the same direction), and all agents started with the correct prior on trustworthiness and did the correct update when they disagreed. I'd guess that there's some threshold percentage of untrustworthy agents above which you get factions, and below which you get convergence.
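One way "the correct update" could look in that setting, as a sketch: assume the lie rate is known and that a liar's numbers look like honest results from a world where B is worse. This is a swapped-in model, not the post's or the paper's update rule; `update_with_liars` and `lie_rate` are hypothetical names.

```python
from math import comb

EPS = 0.01  # B's hidden edge, as in the post


def binomial_pmf(p, n, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def update_with_liars(credence, hits, trials, lie_rate, eps=EPS):
    """Bayesian update on one report that is a fabrication with probability lie_rate.

    Assumed liar model: liars always push the same way, reporting numbers
    drawn as though B had success rate 0.5 - eps.
    """
    if_b_better = binomial_pmf(0.5 + eps, trials, hits)
    if_b_worse = binomial_pmf(0.5 - eps, trials, hits)
    # If B is better, the report is honest or a lie; if B is worse, honest
    # reports and lies come from the same distribution.
    likelihood_good = (1 - lie_rate) * if_b_better + lie_rate * if_b_worse
    likelihood_bad = if_b_worse
    good = likelihood_good * credence
    bad = likelihood_bad * (1 - credence)
    return good / (good + bad)


for lie_rate in (0.0, 0.001, 0.1, 1.0):
    print(lie_rate, update_with_liars(0.5, 30, 50, lie_rate))
```

Under this model a report with more successes than failures still raises the credence, just by less as the lie rate grows, and with `lie_rate=1.0` reports carry no information at all. Because it is an ordinary Bayesian update with fixed likelihoods, the order in which reports arrive does not change the final answer.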

Looking at the picture of the factions, it looks like you can tell (fairly well, at least) which corner is correct from the global structure. Maybe there's some technique you can use in a more general situation to determine what the correct combination of claims is, based on what sort of factions (including the most likely internal disagreements) are organized around them.

Relatedly, in the scenario (in some utterly absurd counterfactual world entirely unlike the real world) where agents sometimes misrepresent the evidence in a direction that favours their actual beliefs, it seems like the policy described here might well do better than the policy of updating fully on all evidence you're presented with.

Given the limitations of our ability to investigate others' honesty, it's possible that the only options are factionalism or naivety and that the latter produces worse results than factionalism; e.g., if we happen to start with more people favouring (A,A,A) than (B,B,B) then rejecting "distrust those who disagree" may end up with everyone in the (A,A,A) corner, which is probably worse than having factions in all eight corners if the reality is (B,B,B).

As Zack says, what we want is a degree of trust that matches agents' trustworthiness. But that may be extremely hard to obtain, and if all agents are somewhat untrustworthy (but some happen to be right so that their untrustworthiness does little harm) then having trust matching trustworthiness may produce exactly the sort of factional results reported here.

So I think the most interesting question is: Are there strategies that, even when agents may be untrustworthy in their reporting of the evidence, manage to converge to the truth over a reasonable portion of the space of untrustworthiness and actual-evidence-strength? My guesses: (1) yes, there kinda are, and the price they pay instead of factionalism is slowness; (2) if there is enough untrustworthiness relative to the actual strength of evidence then no strategy will give good results; (3) there are plenty of questions in the real world for which #2 holds. But I'm not terribly confident about any of those guesses.

Curated.

I like that this post took a very messy, complicated subject and picked one facet of it to gain a really crisp understanding of. (MIRI's 2018 Research Direction update goes into some thoughts on why you might want to become deconfused on a subject, and the Rocket Alignment Problem is a somewhat more narrativized version.)

I personally suspect that the principles Zack points to here aren't the primary principles at play for why epistemic factions form. But it is interesting to see that even when you strip away tons of messy human-specific cognition (i.e., a propensity for tribal loyalty for ingroup-protection reasons), a very simple model of purely epistemic agents may still form factions.

I also really liked that Zack lays out his reasoning very clearly, with coding steps that you can follow along with. I should admit that I haven't followed along all the way through (I got about a third of the way through before realizing I'd need to set aside more time to really process it). So this curation is not an endorsement that all his code checks out. The bar for Curated is, unfortunately, not the bar for Peer Review. (But! Later on, when we get to the 2020 LessWrong Review, I'd want this sort of thing checked more thoroughly.)

It is still relatively uncommon on LessWrong for someone to even rise to the bar of "clearly lays out their reasoning in a very checkable way", and when someone does that while making a point that seems interesting and important-if-true, it seems good to curate it.

Thank you for sharing your source code! I had fun playing around with it. I decided to see what happened when the agents were estimating B's bias, rather than just whether its expectation was higher than A's. I started them with a Beta prior, cos it's easy to update.
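The conjugate update mentioned here is a one-liner: with a Beta(α, β) prior on B's success probability, observing k successes in n trials gives a Beta(α + k, β + n − k) posterior. A minimal sketch; the function names and the uniform Beta(1, 1) starting prior are illustrative assumptions, not the commenter's code.

```python
def beta_update(alpha, beta, hits, trials):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data gives a Beta posterior."""
    return alpha + hits, beta + (trials - hits)


def beta_mean(alpha, beta):
    return alpha / (alpha + beta)


# Start from a uniform Beta(1, 1) prior over B's success rate (an assumption).
alpha, beta = 1, 1
alpha, beta = beta_update(alpha, beta, hits=26, trials=50)
print(alpha, beta, beta_mean(alpha, beta))  # 27 25, mean = 27/52 ≈ 0.519
```

Because the update just adds counts, batches of evidence can be applied in any order and give the same posterior.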

I found (to my surprise) that when only agents that think B is good try it (as in the setup of the post), we still only get estimates equal to and below 0.5 + ε! This makes sense on reflection: if you only look for data when you're off in one direction, you won't end up wrong that way any more. (Interesting that the factionalization wasn't strong enough to hold people back here; I wonder if this would have been different if the experiments summarised the agent's updated beliefs, rather than original beliefs.)

Trying to fix this, I decided to think about agents that were trying to establish that B was clearly better or clearly worse than A. One attempt at this was only testing if B seemed about as good as A in expectation. This led to a clear cross pointing at the true value. Another attempt was only testing if the variance in the distribution over B's goodness was high. This was very sensitive to the chosen parameters.

Early prior differences, based on both genetics and early reinforcement, cause low-level systems to prioritize sampling over certain domains, time windows, etc. This compounds into paying attention to significantly different aspects of sensory input, and would still lead to disagreements even if the stimuli subsequently encountered were the same. So most of the polarization is already baked in by the time we get to levels of processing happening within conscious awareness.

Also: agents manipulate access to some forms of evidence in order to collect rents on the information asymmetry.

Thanks for the post! I think the ideas here are pretty cool.

However, there are some problems with the experiment:

1. The update rule is not Bayesian, in the sense that if you permute the order of the observations of the experiments, you arrive at different conclusions for each agent (you can easily test this experimentally). This shouldn't happen in a Bayesian update. I didn't read the original paper, but maybe this is what they meant by the prior of the observations. (I.e., you probably need a conjugate prior on the belief, which in this case is a beta distribution. But then I'm not sure what the proper way to model the mistrust factor is.)

2. We allow the discount (mistrust factor times distance, capped at 1) to be exactly 1, which means that the agent ignores the evidence from that reporter completely. This may or may not be appropriate... I can see both sides of the argument. But if we constrain the discount to be at most 1 − δ for some small δ > 0, then with enough observations everyone will eventually converge to the correct beliefs.
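The order dependence in point 1 is easy to check. A minimal sketch using the post's mixing rule with fixed, made-up discounts (in the full simulation the discount also shifts as credences move, which only adds to the effect): two reports applied in opposite orders land in different places, while plain Bayesian updating (discount 0) does not care about order. `discounted_update` and `apply_reports` are hypothetical names.

```python
from math import comb

EPS = 0.01  # B's hidden edge, as in the post


def binomial_pmf(p, n, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pure_update(credence, hits, trials, eps=EPS):
    good = binomial_pmf(0.5 + eps, trials, hits) * credence
    bad = binomial_pmf(0.5 - eps, trials, hits) * (1 - credence)
    return good / (good + bad)


def discounted_update(credence, hits, trials, discount):
    """The post's rule: a weighted average of the prior and the Bayesian posterior."""
    return discount * credence + (1 - discount) * pure_update(credence, hits, trials)


def apply_reports(credence, reports):
    for hits, trials, discount in reports:
        credence = discounted_update(credence, hits, trials, discount)
    return credence


# Hypothetical reports: (hits, trials, discount applied to that reporter).
reports = [(40, 50, 0.5), (10, 50, 0.0)]
forward = apply_reports(0.5, reports)
backward = apply_reports(0.5, reports[::-1])
print(forward, backward)  # roughly 0.34 vs 0.37
```

The mixture step is not a multiplication of odds, so it does not commute with the Bayesian steps around it.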

Note that another way to correct your own mistaken belief, even if you distrust everyone, is to do the experiments yourself (you have to trust yourself, at least?). It's okay not to directly adjust your belief (i.e., "action A is better") based on evidence from people you don't like, but it still makes sense to let that evidence lower the degree of certainty in your own beliefs (i.e., "I'm 60% certain that action A is better"). Decision making should take this degree of uncertainty into account. If all agents who currently believe that action A is better give action B a try once in a while, the entire community probably converges a lot faster, too.
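The suggestion that agents who favor A still try B once in a while amounts to an epsilon-greedy-style exploration rule. A minimal sketch, with `chooses_b`, `experimenters`, and `explore_rate` as hypothetical names; in the post's `simulation`, the check `agent.credences[question] >= 0.5` would be swapped for something like `chooses_b(agent.credences[question], explore_rate)`. In the post's discount rule an agent is at distance zero from itself, so it already updates fully on its own results; what is missing is a reason to run B at all.

```python
import random


def chooses_b(credence, explore_rate, rng=random):
    """Run B if it currently looks better; otherwise still run it with
    probability explore_rate (an epsilon-greedy-style exploration rule)."""
    return credence >= 0.5 or rng.random() < explore_rate


def experimenters(credences, explore_rate, rng=random):
    """Indices of the agents who run B on one question this round."""
    return [i for i, c in enumerate(credences) if chooses_b(c, explore_rate, rng)]


rng = random.Random(0)
credences = [0.9, 0.6, 0.3, 0.1]
print(experimenters(credences, explore_rate=0.0, rng=rng))  # [0, 1]
print(experimenters(credences, explore_rate=0.2, rng=rng))
```

With `explore_rate=0.0` this reduces to the post's behavior, where only agents who favor B ever test it.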