Claims all the way down

Jasper Blank

It can be hard to know where to begin when you do not understand something. One way to start is to look at what the people who claim to understand it are talking about.
Sadly, this means you have to deal with massive discussions. A big example of this is the Covid origin debates. In these discussions the disagreement can be about many parts, and it can be hard to know who is telling the truth and who is lying. This can make it almost impossible to map out what the world is really like and to see why.

Almost, but not quite...

If we want to map out these discussions we have to start with the core of what makes an honest argument. At the core there are primary sources. A primary source can be a specific study, a witness claim, or a verified authority, to name a few. These primary sources can then be linked to claims. If we find all of the relevant primary sources and all of the claims that are supported by them, we can calculate how valid each of these claims is using methods explained later in this article.

Sometimes, however, a claim is so complicated that there are many different primary sources pointing in many different directions. In these cases it can be helpful to break the claim down into subclaims. Each of these subclaims can then in turn be supported by primary sources or subclaims. As long as the logic connecting every claim with subclaims and sources is valid, it will allow you to find the best possible conclusion based on the available evidence.

Finding the strength of any piece of evidence on any claim used to be painstakingly slow and difficult to calibrate. This is where language models come in. They can do the arduous work of scraping for every source and identifying how relevant it is and how strongly it weighs on each specific claim. This can quickly fill out an entire graph of claims. This graph of claims can then be made into a publicly available tool.

These calls will still be subjective, which is why it is essential for the tool to be transparent and easy to add your own perspective to. People are going to disagree with the final outcome of this process no matter what claim it ends up supporting. This disagreement is why we wanted this tool in the first place. That is why it is essential to keep every factor accessible and open to question. A proper version of this tool should be able to quickly show the effects of any change to any link on the final claim.

Once this tool is in place you will be able to drill down on any part of the claim tree and find why every part of the argument is as strong as it is. By the end of this article I will present one component I believe any version of this tool would need. This component is called the grouping node. It allows a single node to combine the evidence present in multiple sources or subclaims into a single probability with a relevant margin for error.

How claims should be combined

When starting work on this tool I wrote down some core principles to hold it accountable to.

Every claim should be traceable to primary sources
Every number that is not set in stone should be shown as such
The system should be clear and understandable
The arithmetic should be based on existing literature
The system should have consistent reasoning on reruns
The system should be able to capture any argument

To show what this system could look like when filled out I wrote an example graph that shows how a claim can be supported by subclaims and how each of these claims can be supported by subclaims and sources in turn.

At the top you see the main claim. This main claim is the one we want to know with appropriate certainty. You can see that this claim has two subclaims, in this case a supporting and a refuting subclaim. The claim takes both of these subclaims into account when coming to a final value. Each of these subclaims has its own inputs in turn. The beauty is that this can extend down as far as needed to represent any argument.

This graph is only illustrative. All of the values in this first widget are there to show how the information propagates. If you want to know how real sources get put in then keep on reading until the second widget.

This graph is fully interactive. I encourage you to try clicking on every part. It can be especially fun to click on a source and change the value and see how every upstream claim adjusts based on it.

This graph uses a simple formula. We will walk through this formula for the main claim with its default values. First we need to convert the percentages into odds. We have two subclaims: the first subclaim has 86% certainty and the second subclaim has 38%.

Then we need to know the association between the two subclaims. If they are independent we should treat them as separate tests and multiply the odds. If they are fully correlated we need to average them by taking the square root after multiplying. Both properties hold in this general formula.

In the case of our top claim we have two independent subclaims, so a is zero and the formula reduces to multiplying the two odds.
Then we can fill our odds into the formula.

This gives us the final odds of the final claim and we can convert those back into percentages.
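The walkthrough above can be sketched in code. This is a minimal sketch, not the widget's implementation: the function names are hypothetical, and the exponent 1 / (1 + a(n - 1)) is an assumption, chosen only because it reproduces the two behaviours described (plain multiplication when a is 0, the square root of the product when a is 1 with two inputs). It also assumes both percentages are already expressed as support for the parent claim.

```python
import math


def prob_to_odds(p: float) -> float:
    """Convert a probability (0 < p < 1) into odds in favour."""
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds in favour back into a probability."""
    return odds / (1.0 + odds)


def combine_odds(odds: list, a: float) -> float:
    """Combine odds under an assumed correlation a between 0 and 1.

    Assumed form: (product of odds) ** (1 / (1 + a * (n - 1))).
    a = 0 multiplies the odds; a = 1 takes their geometric mean.
    """
    n = len(odds)
    exponent = 1.0 / (1.0 + a * (n - 1))
    return math.prod(odds) ** exponent


# Default values of the main claim: subclaims at 86% and 38%.
subclaims = [prob_to_odds(0.86), prob_to_odds(0.38)]
independent = odds_to_prob(combine_odds(subclaims, a=0.0))
correlated = odds_to_prob(combine_odds(subclaims, a=1.0))
print(f"independent: {independent:.0%}, fully correlated: {correlated:.0%}")
```

Under these assumptions the independent case lands at roughly 79%, and treating the same two subclaims as fully correlated pulls the result down to roughly 66%.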

This same process gets propagated throughout the entire graph, allowing the claim to be supported by every piece of knowledge below it. If you're interested in why this formula was chosen, I invite you to follow along with the math in the block below. The article is intended to be possible to follow even if you skip that part.

It is important to note that this way of combining odds does assume that each subclaim and source moves their parent claim by exactly the same force as how likely they are to be true. This is a simplification made to allow for this example to be easier to follow along with. In the final version every node will separate the confidence in the claim from the force of each subclaim on their parent.

Following along with the math

In the specific case above we showed how to combine two odds. I will start off with showing how this formula generalizes. First I show the two cases used in the widget for 2 and 3 claims.

If you're observant you might have noticed the pattern already. This pattern can be extended to allow a claim to have any number of subclaims.
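As a sketch of what the pattern could look like, here is one form that matches the behaviour described in this article: multiply the odds when the correlation is zero, and take the n-th root of the product when the correlation is one. The symbols are assumptions for illustration: O_i is the odds of input i, n is the number of inputs, and a is the correlation. The exact exponent is my guess at an interpolation between those two limits, so treat it as one candidate rather than the definitive formula.

```latex
% Two inputs, three inputs, and the assumed general case
O_{\text{claim}} = (O_1 O_2)^{\frac{1}{1+a}}
\qquad
O_{\text{claim}} = (O_1 O_2 O_3)^{\frac{1}{1+2a}}
\qquad
O_{\text{claim}} = \Bigl(\prod_{i=1}^{n} O_i\Bigr)^{\frac{1}{1+a(n-1)}}
```

With a = 0 the exponent is 1 and the odds are simply multiplied. With a = 1 the exponent is 1/n and the result is the geometric mean of the odds.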

This formula might feel pulled out of thin air. To show where it comes from I will go back to the beginning.

An introduction to Bayes

This article will be using a lot of the terminology of Bayesian statistics. If you have never seen Bayesian statistics before or want to catch up, I can recommend this excellent series from 3Blue1Brown. If instead you want a small reminder I will try to build up to it from fundamentals.

In these equations P(X) is intended to mean the probability of X. So if I toss a coin, "P(heads) = 50%" translates to "The probability that I toss heads is 50%".

In these equations a | is intended to signal a "given that". So "P(Heads|cheating) = 100%" translates to "The probability that I toss heads given that I am cheating is 100%".

These definitions together allow us to build up to our first equation:

This equation shows that the probability of A and B both being true can be restated as the probability of B being true multiplied by the probability of A given B. It can also be restated as the probability of A multiplied by the probability of B given A. Below you can see a visual proof where the green area represents this constant area.

Below instead of A and B we will use H and E. H represents the Hypothesis and E represents the evidence. So in this case P(H|E) represents the probability of the Hypothesis given the evidence. P(E|H) represents the probability of the evidence given the hypothesis.

Once we have this formula we can construct the core Bayes formula by dividing both sides by P(E).

This is the core Bayesian formula. It allows us to calculate how probable our hypothesis H should be given the evidence E. The only problem with this formula is that it cannot easily integrate multiple pieces of evidence. For that we will need to do a slight rewrite.

We can go through the same reasoning for P(¬H|E). For this we use the symbol ¬ to mean not, so P(¬H) is the probability that H is not true. This gives us an almost identical equation.

We can divide the above two formulas. Once we have done this, the P(E) cancels out.

If this is all going a bit fast I can strongly recommend this 3Blue1Brown video. Once we have done this change we can cleanly separate this formula into three parts: The posterior odds, the likelihood ratio and the prior odds.
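For reference, the derivation described above can be written out as follows. This is the standard odds form of Bayes' rule, using H for the hypothesis, E for the evidence and ¬ for "not"; the layout and the labels under the braces are my own.

```latex
% Bayes' rule for H and for not-H
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}
\qquad
P(\neg H \mid E) = \frac{P(E \mid \neg H)\,P(\neg H)}{P(E)}

% Dividing the first by the second cancels P(E)
\underbrace{\frac{P(H \mid E)}{P(\neg H \mid E)}}_{\text{posterior odds}}
=
\underbrace{\frac{P(E \mid H)}{P(E \mid \neg H)}}_{\text{likelihood ratio}}
\times
\underbrace{\frac{P(H)}{P(\neg H)}}_{\text{prior odds}}
```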

Finally, we can give a name to the probability of something happening divided by the probability of the opposite: the odds. For example, 10% can be expressed as 10 to 90 odds and 50% can be expressed as 1 to 1 odds. For the mathematics of expectations, odds represent changes in belief more easily than probabilities do, as shown by the examples below.

Below I will show how to use this with three examples. Every time I will normalize the odds to a total of 100, allowing quick conversion to percentages. In the real program everything is calculated in odds, allowing for quick and accurate calculation.

Weather forecasting: Tomorrow I am going camping, and I'd like to know if it's going to rain. In my country the prior odds of it raining are 20 to 80, or 20%.
I look at the weather forecast, and it predicts rain. Their forecasts have a likelihood ratio of 90 to 10, or 90%. That means that the weather forecast is correct 90 times for every 10 times it is wrong.
The way to calculate my posterior expectation of having rain tomorrow is to multiply 20/80 with 90/10, or (20/80)*(90/10) = 1800/800 ≈ 69/31 or 69%.
This means that after checking my weather app I expect 69 to 31 odds of rain tomorrow.

Disease detection: I go to the doctor for a routine checkup. In my age bracket the prior odds of having heart problems are 1 to 99, or 1%.
I undergo a test that comes back positive and has a likelihood ratio of 95 to 5, or 95%. This means that the test is correct 95 times for every 5 times that it is wrong.
The way to calculate my posterior expectation of having heart disease after this test is simply to multiply 1/99 with 95/5, or (1/99)*(95/5) = 95/495 ≈ 16/84 or 16%.
This means that after this test I expect 16 to 84 odds of having heart problems. If it surprises you that this test still means I most likely don't have heart problems, then please again watch this video.

Disease detection part 2: I go back to the doctor because a screening test showed that I might have heart problems. My cohort, people with one positive screening test, has 16 to 84 odds of having heart problems.
Next I undergo a really strong test that also comes back positive and has a likelihood ratio of 99 to 1. This means it is correct 99 times for every time it is wrong.
Then the posterior expectation is (16/84)*(99/1) = (1584/84) ≈ 95/5 or 95%. This shows that the two tests together are strong enough to overcome the initial low likelihood.

Separating out the prior and the likelihood ratio like this allows us to multiply together many tests. If we take the same two heart-problem tests from above, we can combine them into a single stronger test. We do this by multiplying the tests, giving us a combined likelihood ratio of (95/5) * (99/1) = 9405/5^[1].

To show that this gives the same result we can use this test on the original prior of 1/99 again by multiplying (9405/5)*(1/99) = 95/5 or 95%.
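The three examples and the combined test can be checked with a few lines of code. This is a small sketch using exact fractions; the helper names `update` and `to_percent` are hypothetical and not part of the tool. Note that it chains the exact posterior of the first test (19/99) into the second test instead of the rounded 16/84 used in the text, which is why the last two results agree exactly.

```python
from fractions import Fraction


def update(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Bayesian update in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


def to_percent(odds: Fraction) -> float:
    """Convert odds in favour into a percentage."""
    return float(odds / (1 + odds)) * 100


# Weather forecasting: prior 20 to 80, forecast ratio 90 to 10.
rain = update(Fraction(20, 80), Fraction(90, 10))

# Disease detection: prior 1 to 99, then tests of 95 to 5 and 99 to 1.
first_test = update(Fraction(1, 99), Fraction(95, 5))
second_test = update(first_test, Fraction(99, 1))

# The same two tests merged into one stronger test before updating.
combined_ratio = Fraction(95, 5) * Fraction(99, 1)
both_tests = update(Fraction(1, 99), combined_ratio)

for label, odds in [
    ("rain", rain),
    ("first test", first_test),
    ("second test", second_test),
    ("combined test", both_tests),
]:
    print(f"{label}: {to_percent(odds):.0f}%")
```

Because multiplication does not care about order, updating test by test and updating once with the combined ratio give the same posterior.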

With this in our toolbelt we are now able to combine any number of uncorrelated updates to our hypothesis. However, in the real world we find many pieces of evidence that are correlated. We would still like to be able to use these pieces of evidence.

Opinion pooling

In the extreme case, fully correlated pieces of evidence all point at the same claim. One example of this is measuring the temperature in the same room multiple times; in this case we just want to average out the measurements.

If we have two experts on weather forecasting and we ask both of them whether a hurricane will hit the coast next week, they will most likely give two separate odds. Let's look at one scenario.

The first expert gives 99:1 odds of there being a hurricane and the other expert gives 50:50 odds. We want to combine their claims, but linearly averaging them would fail to take into account the extra confidence of the 99:1 expert. The middle ground is multiplicative averaging.

This can be generalized to any combination of two odds.

Here we can also give every expert a different weight. The important part is that the weights add up to 1. So if we give expert 1 a weight of 0.1, we need to give expert 2 a weight of 0.9.

We can generalize this to any number of experts. If you're not familiar with Π and Σ, I will explain them one by one. First, Σ essentially says "sum up": we sum every weight up to the final weight, and we want the total to be 1. This sum of 1 makes sure that the pooled percentage is bounded by the claims of the experts; we do not want to claim a higher certainty than the most certain expert. The second symbol, Π, says to take the product: we multiply together every expert's odds O raised to the power of that expert's weight, just like we did above.

If in a specific case we take all weights to be the same, each weight must be 1/n for them to add up to 1 in total. This gives us.
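The pooling described in this section can be sketched as a weighted geometric mean of odds. This is a minimal sketch: the function name `pool_odds` and the choice to reject weights that do not sum to 1 are assumptions for illustration.

```python
import math
from typing import List, Optional


def pool_odds(odds: List[float], weights: Optional[List[float]] = None) -> float:
    """Weighted geometric mean of odds; the weights must sum to 1.

    With no weights given, every expert gets the equal weight 1/n.
    """
    if weights is None:
        weights = [1.0 / len(odds)] * len(odds)
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return math.prod(o ** w for o, w in zip(odds, weights))


# The hurricane scenario: one expert at 99:1, the other at 50:50.
experts = [99 / 1, 50 / 50]
equal = pool_odds(experts)
skewed = pool_odds(experts, [0.1, 0.9])
print(f"equal weights: {equal:.2f}:1, weights 0.1/0.9: {skewed:.2f}:1")
```

With equal weights the pooled odds are the square root of 99, which is about 9.95 to 1 or roughly 91%. That sits between the two experts, as the bound on the weights requires.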

Combining both methods

To combine both methods we will start by picking the Bayesian update back up.

We can see that we can add many experiments by multiplying by the likelihood ratio of each experiment. This gives us.

The final change that we need is that we can use all previous claims as experiments^[2]. This way we can see both the original odds and the likelihood ratios all as multiplied odds.

When we combine this with the opinion pool we will start to see the formula that we used. When the correlation is 0 every claim is evaluated separately and we are doing a Bayesian update and when correlation is 1 we are opinion pooling the subclaims.

This odds accumulator allows for adding together many different sources and subclaims. These calculated odds can then be the input odds of a new claim.

Grouping node with real world data

The core of my system I would like to call the grouping node. This grouping node is a slightly more complicated version of the subclaims above because it is also able to account for the strength of different sources on this claim. This grouping node will be shown in a bit in the form of a widget. First I will go over every part you can find in it.

The node below aims to answer the question: "What is the likelihood that the associated claim is true?". In this case the claim is: "A credible lab pathway exists for Covid". This is the value you see in the green field, by default 80%. It comes to this value by combining every piece of evidence connected to the claim.

At the top you can put in a prior, or knowledge held before looking at specific sources. This prior can represent previous knowledge you believe is not represented within any sources. If you're making claims about a coin toss, this prior can represent that almost all coins are fair 50/50 coins. This prior can have a strength and a specific percentage. By default it is set to 0, to say that all knowledge this node has comes from its sources.

Below the prior you can see the sources, in this case S1-S7. Each of these sources shows the odds of the claim being true based on this specific source. These odds get multiplied like in the example above. Each source shows one representative quote and links to the source itself. This means that everybody who uses this tool can analyse every part of every claim and see what the result would be if one or more sources were interpreted differently.

The way to interpret the odds ratio next to each source is as an answer to the question "How often would we see this source in a world where the claim is true compared to a world where the claim is false?". To give an example, let's look at the claim "My coin has heads on both sides" with the primary source "The coin landed heads after a toss". If the coin were fair we would expect heads 50% of the time, but if it had heads on both sides we would expect heads 100% of the time. We take the ratio between these two probabilities, which gives us 2/1 odds. So in this case it would be a supporting source with 2.0/1 odds.
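The coin example can be written out directly. This is a tiny sketch with a hypothetical helper name. The three-toss line goes beyond the example in the text and assumes the tosses are independent, so their ratios multiply.

```python
def likelihood_ratio(p_if_true: float, p_if_false: float) -> float:
    """How much more often the evidence shows up when the claim is true."""
    return p_if_true / p_if_false


# Claim: "My coin has heads on both sides". Evidence: one toss landed heads.
# A two-headed coin shows heads 100% of the time, a fair coin 50% of the time.
one_toss = likelihood_ratio(1.0, 0.5)

# Assumption for illustration: three independent tosses that all land heads.
three_tosses = one_toss ** 3

print(f"one toss: {one_toss}/1, three tosses: {three_tosses}/1")
```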

To use the S5 WHO-China example: we are effectively saying "WHO-China is 1.6 times less likely to release this statement in a world with a credible lab pathway compared to a world without it". It can sometimes be impossible to know this likelihood ratio with absolute certainty. That is why this tool also gives a 90% certainty range that gets properly propagated into the output estimate.

In order for a tool like this to be at its most relevant, we do need to calibrate the language models. Here we can dig into the structured expert judgement literature: Cooke, Hanea and Burgman have all spent decades calibrating different judgements. With calibration this kind of tool can go much further.

You can also change every relevant value simply by clicking and sliding the value. This is one additional way to make this knowledge tree accessible and approachable to everybody who uses it. I do not intend to have it feel like the computer just tells you the way something is. Instead I aim to show where different parts of the argument come from and how each part impacts the final claim.

Below you can see the grouping node visualized as a ledger. Every value is editable and I encourage you to try:

Attempting to graph the structure of arguments has been done before; Squiggle, Kialo, and Argdown are a few examples. These services, however, have always had a hard time taking off, for what I believe is a simple reason: mapping out arguments is boring and hard work. People who want to map out entire arguments are few and far between, and those who do can already gather quite an audience from putting in this work.

Here is where I believe we have the new opportunity. Language models have now become capable enough to fill out these full graphs with only light handholding. And if the graphs are made to be inherently transparent any mistake will also quickly be transparent.

The fractal upside

The upside of this grouping node structure is that every node can take not only primary sources as input but also other grouping nodes, allowing claims to be built out of other claims. This allows us to argue for subclaims; as you can see, the example claim is itself a subclaim of the Covid origins argument. It also allows us to chain all claims together into one big graph, allowing for even more complicated representations and more accurate conclusions. If you're curious about one implementation of this idea, you can check it out on my website.

What is still needed for the magic encyclopedia

The method presented in this post is far from enough to map out every claim. This is only a starting point to apply some relevant mathematics to this subject. In order to show what I believe is still needed to use this as a building block I will use the 3 layers suggested by the Epistemic Case Study Competition. In this structure the three layers are ingestion, structure and assessment. This tool lives in Layer 2, where we try to structure every relevant part of a claim. This structure should be objective and be shared between everyone.

Layer 3: Assessment

Assessment is the most abstract layer. From this layer we need consistent testing to see if the tool is really useful and if people would really need it. The current implementation of this tool is transparent to help with this.

Layer 2: Structure

This current grouping node is still limited in many ways. It only combines the odds of different claims. While this allows some level of clustering, it cannot represent all claims. Some simple arguments such as "If you're outside and it's raining, you will be wet" cannot be contained in the grouping node. That is why I intend to add Boolean logic nodes and arithmetic nodes. Together these will allow every claim or combination of claims to be represented in this system. I plan for my claim analysis system to have these seven nodes.

Noisy AND node:
The noisy AND node allows for a group of blockers to be taken into account. Its formula uses three quantities: the probability that a subclaim holds, the blocker's strength if the subclaim fails, and the base rate chance of success.

Noisy OR node:
The noisy OR node allows for a group of unlocks to be taken into account. Its formula uses three quantities: the probability that a subclaim holds, the unlock's strength if the subclaim holds, and the base rate chance of success even if all claims fail.

Possibility node:
The possibility node allows the hypothesis space to be split up into different hypotheses, one of which has to happen. Its formula uses three quantities: the unnormalized probability of a claim, the normalization factor, and the normalized probability, which guarantees that all hypotheses add up to 100%.

Distribution node:
The distribution node allows for uncertain values to be represented and reasoned about. Each distribution node has a domain, such as all positive rational numbers. Its formula uses two quantities: the uncertain value X that is represented, and the probability distribution describing what X can be.

Estimate node:
The estimate node allows for a Fermi estimate to be made using different distributions. A difficult-to-estimate distribution can be turned into many easy-to-estimate distributions. Its formula uses three parts: the different input distributions, a formula using these distributions, and the output distribution.

Predicate node:
A predicate node allows for probability claims to be extracted from distribution nodes. It does this by calculating the probability that a value lies above a given threshold. Its formula uses three quantities: the given threshold value, an uncertain value from a distribution node, and the output probability of this node.

Grouping node:
The grouping node as shown in this article can be used to combine multiple sources into a single probability. This is needed because in the real world most claims will not have a single conclusive source, and as such sources need to be grouped together. Its formula uses four quantities: the number of input ratios, the correlation over all input nodes and sources, every odds input, and the posterior.

With these 7 nodes in place all non-causal claims can be fully represented and reasoned about. In further articles I will go into more detail regarding the other 6 nodes.
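To make three of these node types more concrete, here is a rough sketch of a noisy OR, a noisy AND and the normalization step of a possibility node. The exact formulas planned for the tool are not given in this article, so everything here is an assumption: the noisy OR is the common textbook form with a base rate (leak) term, the noisy AND is its mirror image, and all function and parameter names are hypothetical.

```python
import math
from typing import List


def noisy_or(p: List[float], strength: List[float], base: float) -> float:
    """Chance of success when any holding subclaim can unlock it.

    p[i]: probability that subclaim i holds.
    strength[i]: chance that subclaim i unlocks success if it holds.
    base: chance of success even if every subclaim fails.
    """
    all_miss = math.prod(1.0 - s * q for q, s in zip(p, strength))
    return 1.0 - (1.0 - base) * all_miss


def noisy_and(p: List[float], strength: List[float], base: float) -> float:
    """Chance of success when any failing subclaim can block it.

    strength[i]: chance that subclaim i blocks success if it fails.
    base: chance of success when nothing blocks.
    """
    none_block = math.prod(1.0 - s * (1.0 - q) for q, s in zip(p, strength))
    return base * none_block


def normalize(unnormalized: List[float]) -> List[float]:
    """Possibility node: rescale hypotheses so they add up to 1."""
    total = sum(unnormalized)  # the normalization factor
    return [u / total for u in unnormalized]


print(noisy_or([0.7, 0.4], [0.9, 0.5], base=0.05))
print(noisy_and([0.7, 0.4], [0.9, 0.5], base=0.95))
print(normalize([2.0, 1.0, 1.0]))
```

In both noisy nodes a strength of 0 makes a subclaim irrelevant and a strength of 1 makes it decisive, which is one way to separate how likely a subclaim is from how much it matters.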

Layer 1: Ingestion

Ingestion is the combined process of finding primary sources and checking their validity. My structural tool needs this ingestion to connect the primary sources with claims. The most important component this structure still needs from layer 1 is a process that can answer any version of "What is the likelihood ratio of this primary source saying what it says depending on whether the claim is true or false". It also needs to find a reliable answer to the question "How strongly correlated are these two sources"^[3].

^[1] To break this odds ratio down into something like a percentage requires us to go all the way to 9995/5 or 99.95%.
^[2] This is also done in Pearl, Probabilistic Reasoning in Intelligent Systems, chapter 2.2.2, page 45.
^[3] This is my first-ever LessWrong post. I would like to thank Tom, Glenn, Mark, and Elisabetta for helping me by proofreading and sharing their thoughts on the article.