Why I’m not a Bayesian

by Richard_Ngo

6th Oct 2024

Linkpost from www.mindthefuture.info

This post focuses on philosophical objections to Bayesianism as an epistemology. I first explain Bayesianism and some standard objections to it, then lay out my two main objections (inspired by ideas in philosophy of science). A follow-up post will speculate about how to formalize an alternative.

Degrees of belief

The core idea of Bayesian epistemology: we should ideally reason by assigning credences to propositions which represent our degrees of belief that those propositions are true. (Note that this is different from Bayesianism as a set of statistical techniques, or Bayesianism as an approach to machine learning, which I don’t discuss here.)

If that seems like a sufficient characterization to you, you can go ahead and skip to the next section, where I explain my objections to it. But for those who want a more precise description of Bayesianism, and some existing objections to it, I’ll more specifically characterize it in terms of five subclaims. Bayesianism says that we should ideally reason in terms of:

1. Propositions which are either true or false (classical logic)
2. Each of which is assigned a credence (probabilism)
3. Representing subjective degrees of belief in their truth (subjectivism)
4. Which at each point in time obey the axioms of probability (static rationality)
5. And are updated over time by applying Bayes’ rule to new evidence (rigid empiricism)
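As a rough illustration of the last two claims, here is a minimal Python sketch of a single Bayesian update. The two hypotheses (a fair coin versus a coin biased 80% towards heads), the 50/50 prior, and the function name `bayes_update` are all assumptions chosen for this example; none of them come from the argument itself.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(H | E) for each hypothesis H.

    prior:      dict mapping hypothesis -> P(H), summing to 1
    likelihood: dict mapping hypothesis -> P(E | H)
    """
    # P(E) = sum over hypotheses of P(E | H) * P(H)
    evidence = sum(likelihood[h] * prior[h] for h in prior)
    # Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}


# Hypothetical setup: is the coin fair, or biased towards heads?
prior = {"fair": 0.5, "biased": 0.5}
p_heads = {"fair": 0.5, "biased": 0.8}

# Observe one head and update.
posterior = bayes_update(prior, p_heads)
print(posterior)
```

The posterior still sums to 1 (static rationality), and it is fully determined by the prior and the likelihoods (rigid empiricism). The hypothesis space itself never changes, which is the feature the rest of this post takes issue with.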

I won’t go into the case for Bayesianism here except to say that it does elegantly formalize many common-sense intuitions. Bayes’ rule follows directly from a straightforward Venn diagram. The axioms of probability are powerful and mathematically satisfying. Subjective credences seem like the obvious way to represent our uncertainty about the world. Nevertheless, there is a wide range of alternatives to Bayesianism, each branching off from the claims listed above at different points:

Traditional epistemology only accepts #1, and rejects #2. Traditional epistemologists often defend a binary conception of knowledge—e.g. one defined in terms of justified true belief (or a similar criterion, like reliable belief).
Frequentism accepts #1 and #2, but rejects #3: it doesn’t think that credences should be subjective. Instead, frequentism holds that credences should correspond to the relative frequency of an event in the long term, which is an objective fact about the world. For example, you should assign 50% credence that a flipped coin will come up heads, because if you continued flipping the coin the proportion of heads would approach 50%.
Garrabrant induction accepts #1 to #3, but rejects #4. In order for credences to obey the axioms of probability, logically equivalent statements must be assigned the same credence, and every logical implication of a statement must be assigned at least as much credence as the statement itself. But this “logical omniscience” is impossible for computationally-bounded agents like ourselves. So in the Garrabrant induction framework, credences instead converge to obeying the axioms of probability in the limit, without guarantees that they’re coherent after only limited thinking time.
Radical probabilism accepts #1 to #4, but rejects #5. Again, this can be motivated by qualms about logical omniscience: if thinking for longer can identify new implications of our existing beliefs, then our credences sometimes need to update via a different mechanism than Bayes’ rule. So radical probabilism instead allows an agent to update to any set of statically rational credences at any time, even if they’re totally different from its previous credences. The one constraint is that each credence needs to converge over time to a fixed value—i.e. it can’t continue oscillating indefinitely (otherwise the agent would be vulnerable to a Dutch Book).
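To see why indefinitely oscillating credences are exploitable, here is a toy Python sketch. It assumes a simplified betting setup that is not part of the original argument: the agent will always buy or sell a $1 bet on X at a price equal to its current credence in X. The function name `guaranteed_profit` and the 0.4 and 0.6 thresholds are hypothetical.

```python
def guaranteed_profit(credences, low=0.4, high=0.6):
    """Profit a bookie can lock in from completed round trips.

    The bookie sells the agent a $1 bet on X when the agent's credence is
    at least `high`, then buys it back once the credence falls to `low` or
    below. After each round trip the bookie holds no bet, so the profit
    does not depend on whether X turns out to be true.
    """
    profit = 0.0
    sold_at = None  # price the agent paid for the bet it currently holds
    for c in credences:
        if sold_at is None and c >= high:
            sold_at = c               # agent buys the bet for c
        elif sold_at is not None and c <= low:
            profit += sold_at - c     # agent sells it back for less
            sold_at = None
    return profit


oscillating = [0.6, 0.4] * 5             # credence never settles
converging = [0.6, 0.55, 0.52, 0.5]      # credence settles near 0.5

print(guaranteed_profit(oscillating))
print(guaranteed_profit(converging))
```

Each full oscillation hands the bookie the gap between the two prices, so an agent whose credence never settles can be drained without limit. This is only an illustration of the Dutch Book idea, not the formal convergence condition used in radical probabilism.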

It’s not crucial whether we classify Garrabrant induction and radical probabilism as variants of Bayesianism or alternatives to it, because my main objection to Bayesianism doesn’t fall into any of the above categories. Instead, I think we need to go back to basics and reject #1. Specifically, I have two objections to the idea that idealized reasoning should be understood in terms of propositions that are true or false:

We should assign truth-values that are intermediate between true and false (fuzzy truth-values)
We should reason in terms of models rather than propositions (the semantic view)

I’ll defend each claim in turn.

Degrees of truth

Formal languages (like code) are only able to express ideas that can be pinned down precisely. Natural languages, by contrast, can refer to vague concepts which don’t have clear, fixed boundaries. For example, the truth-values of propositions which contain gradable adjectives like “large” or “quiet” or “happy” depend on how we interpret those adjectives. Intuitively speaking, a description of something as “large” can be more or less true depending on how large it actually is. The most common way to formulate this spectrum is as “fuzzy” truth-values which range from 0 to 1. A value close to 1 would be assigned to claims that are clearly true, and a value close to 0 would be assigned to claims that are clearly false, with claims that are “kinda true” in the middle.
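For concreteness, here is one arbitrary way to assign such a value to “this object is large”, sketched in Python. The function name `truth_of_large`, the choice of a linear ramp, and the two thresholds in meters are assumptions made purely for illustration; nothing in the argument depends on them.

```python
def truth_of_large(size_m, clearly_not=1.0, clearly_yes=3.0):
    """Fuzzy truth-value in [0, 1] of the claim "this object is large".

    Below `clearly_not` meters the claim counts as clearly false (0.0),
    above `clearly_yes` meters as clearly true (1.0), and in between the
    value rises linearly, covering the "kinda true" cases.
    """
    if size_m <= clearly_not:
        return 0.0
    if size_m >= clearly_yes:
        return 1.0
    return (size_m - clearly_not) / (clearly_yes - clearly_not)


for size in (0.5, 2.0, 4.0):
    print(size, truth_of_large(size))
```

Note how much is smuggled in by the thresholds: change the context (a large dog versus a large building) and the same function needs entirely different parameters.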

Another type of “kinda true” statement is the approximation. For example, if I claim that there’s a grocery store 500 meters away from my house, that’s probably true in an approximate sense, but false in a precise sense. But once we start distinguishing the different senses that a concept can have, it becomes clear that basically any concept can have widely divergent category boundaries depending on the context. A striking example from Chapman:

A: Is there any water in the refrigerator?
B: Yes.
A: Where? I don’t see it.
B: In the cells of the eggplant.

The claim that there’s water in the refrigerator is technically true, but pragmatically false. And the concept of “water” is far better-defined than almost all abstract concepts (like the ones I’m using in this post). So we should treat natural-language propositions as context-dependent by default. But that’s still consistent with some statements being more context-dependent than others (e.g. the claim that there’s air in my refrigerator would be true under almost any interpretation). So another way we can think about fuzzy truth-values is as a range from “this statement is false in almost any sense” through “this statement is true in some senses and false in some senses” to “this statement is true in almost any sense”.

Note, however, that there’s an asymmetry between “this statement is true in almost any sense” and “this statement is false in almost any sense”, because the latter can apply to two different types of claims. Firstly, claims that are meaningful but false (“there’s a tiger in my house”). Secondly, claims that are nonsense—there are just no meaningful interpretations of them at all (“colorless green ideas sleep furiously”). We can often distinguish these two types of claims by negating them: “there isn’t a tiger in my house” is true, whereas “colorless green ideas don’t sleep furiously” is still nonsense. Of course, nonsense is also a matter of degree—e.g. metaphors are by default less meaningful than concrete claims, but still not entirely nonsense.

So I've motivated fuzzy truth-values from four different angles: vagueness, approximation, context-dependence, and sense vs nonsense. The key idea behind each of them is that concepts have fluid and amorphous category boundaries (a property called nebulosity). However, putting all of these different aspects of nebulosity on the same zero-to-one scale might be an oversimplification. More generally, fuzzy logic has few of the appealing properties of classical logic, and (to my knowledge) isn’t very directly useful. So I’m not claiming that we should adopt fuzzy logic wholesale, or that we know what it means for a given proposition to be X% true instead of Y% true (a question which I’ll come back to in a follow-up post). For now, I’m just claiming that there’s an important sense in which thinking in terms of fuzzy truth-values is less wrong (another non-binary truth-value) than only thinking in terms of binary truth-values.

Model-based reasoning

The intuitions in favor of fuzzy truth-values become clearer when we apply them, not just to individual propositions, but to models of the world. By a model I mean a (mathematical) structure that attempts to describe some aspect of reality. For example, a model of the weather might have variables representing temperature, pressure, and humidity at different locations, and a procedure for updating them over time. A model of a chemical reaction might have variables representing the starting concentrations of different reactants, and a method for determining the equilibrium concentrations. Or, more simply, a model of the Earth might just be a sphere.

In order to pin down the difference between reasoning about propositions and reasoning about models, philosophers of science have drawn on concepts from mathematical logic. They distinguish between the syntactic content of a theory (the axioms of the theory) and its semantic content (the models for which those axioms hold). As an example, consider the three axioms of projective planes:

For any two points, exactly one line lies on both.
For any two lines, exactly one point lies on both.
There exists a set of four points such that no line has more than two of them.

There are infinitely many models for which these axioms hold; here’s one of the simplest:

[Figure: the Fano plane, drawn as triangle ACE with interior circle BDF and center point G. Point B is on line segment AC, D is on CE, and F is on AE. G is the center of the circle and lies on line segments AD, BE, and CF.]
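The claim that this structure is a model of the three axioms can be checked mechanically. The Python sketch below encodes the seven points and seven lines read off from the figure description (the circle BDF counts as a line in this model) and tests each axiom by brute force. The function names are hypothetical.

```python
from itertools import combinations

POINTS = set("ABCDEFG")
LINES = [
    frozenset("ABC"),  # side AC, with B on it
    frozenset("CDE"),  # side CE, with D on it
    frozenset("AFE"),  # side AE, with F on it
    frozenset("AGD"),  # segment AD through G
    frozenset("BGE"),  # segment BE through G
    frozenset("CGF"),  # segment CF through G
    frozenset("BDF"),  # the circle, treated as a line
]


def axiom_1(points, lines):
    """For any two points, exactly one line lies on both."""
    return all(
        sum(1 for line in lines if {p, q} <= line) == 1
        for p, q in combinations(sorted(points), 2)
    )


def axiom_2(lines):
    """For any two lines, exactly one point lies on both."""
    return all(len(l1 & l2) == 1 for l1, l2 in combinations(lines, 2))


def axiom_3(points, lines):
    """Some set of four points has no more than two on any line."""
    return any(
        all(len(line & set(quad)) <= 2 for line in lines)
        for quad in combinations(sorted(points), 4)
    )


print(axiom_1(POINTS, LINES), axiom_2(LINES), axiom_3(POINTS, LINES))
```

The axioms are the syntactic content; the sets `POINTS` and `LINES` are one piece of semantic content that satisfies them.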

If propositions and models are two sides of the same coin, does it matter which one we primarily reason in terms of? I think so, for two reasons. Firstly, most models are very difficult to put into propositional form. We each have implicit mental models of our friends’ personalities, of how liquids flow, of what a given object feels like, etc, which are far richer than we can express propositionally. The same is true even for many formal models—specifically those whose internal structure doesn’t directly correspond to the structure of the world. For example, a neural network might encode a great deal of real-world knowledge, but even full access to the weights doesn’t allow us to extract that knowledge directly—the fact that a given weight is 0.3 doesn’t allow us to claim that any real-world entity has the value 0.3.

What about scientific models where each element of the model is intended to correspond to an aspect of reality? For example, what’s the difference between modeling the Earth as a sphere, and just believing the proposition “the Earth is a sphere”? My answer: thinking in terms of propositions (known in philosophy of science as the syntactic view) biases us towards assigning truth values in a reductionist way. This works when you’re using binary truth-values, because they relate to each other according to classical logic. But when you’re using fuzzy truth-values, the relationships between the truth-values of different propositions become much more complicated. And so thinking in terms of models (known as the semantic view) is better because models can be assigned truth-values in a holistic way.

As an example: “the Earth is a sphere” is mostly true, and “every point on the surface of a sphere is equally far away from its center” is precisely true. But “every point on the surface of the Earth is equally far away from the Earth’s center” seems ridiculous—e.g. it implies that mountains don’t exist. The problem here is that rephrasing a proposition in logically equivalent terms can dramatically affect its implicit context, and therefore the degree of truth we assign to it in isolation.

The semantic view solves this by separating claims about the structure of the model itself from claims about how the model relates to the world. The former are typically much less nebulous—claims like “in the spherical model of the Earth, every point on the Earth’s surface is equally far away from the center” are straightforwardly true. But we can then bring in nebulosity when talking about the model as a whole—e.g. “my spherical model of the Earth is closer to the truth than your flat model of the Earth”, or “my spherical model of the Earth is useful for doing astronomical calculations and terrible for figuring out where to go skiing”. (Note that we can make similar claims about the mental models, neural networks, etc, discussed above.)

We might then wonder: should we be talking about the truth of entire models at all? Or can we just talk about their usefulness in different contexts, without the concept of truth? This is the major debate in philosophy of science. I personally think that in order to explain why scientific theories can often predict a wide range of different phenomena, we need to make claims about how well they describe the structure of reality—i.e. how true they are. But we should still use degrees of truth when doing so, because even our most powerful scientific models aren’t fully true. We know that general relativity isn’t fully true, for example, because it conflicts with quantum mechanics. Even so, it would be absurd to call general relativity false, because it clearly describes a major part of the structure of physical reality. Meanwhile Newtonian mechanics is further away from the truth than general relativity, but still much closer to the truth than Aristotelian mechanics, which in turn is much closer to the truth than animism. The general point I’m trying to illustrate here was expressed pithily by Asimov: “Thinking that the Earth is flat is wrong. Thinking that the Earth is a sphere is wrong. But if you think that they’re equally wrong, you’re wronger than both of them put together.”

The correct role of Bayesianism

The position I’ve described above overlaps significantly with the structural realist position in philosophy of science. However, structural realism is usually viewed as a stance on how to interpret scientific theories, rather than how to reason more generally. So the philosophical position which best captures the ideas I’ve laid out is probably Karl Popper’s critical rationalism. Popper was actually the first to try to formally define a scientific theory's degree of truth (though he was working before the semantic view became widespread, and therefore formalized theories in terms of propositions rather than in terms of models). But his attempt failed on a technical level; and no attempt since then has gained widespread acceptance. Meanwhile, the field of machine learning evaluates models by their loss, which can be formally defined—but the loss of a model is heavily dependent on the data distribution on which it’s evaluated. Perhaps the most promising approach to assigning fuzzy truth-values comes from Garrabrant induction, where the “money” earned by individual traders could be interpreted as a metric of fuzzy truth. However, these traders can strategically interact with each other, making them more like agents than typical models.

Where does this leave us? We’ve traded the crisp, mathematically elegant Bayesian formalism for fuzzy truth-values that, while intuitively compelling, we can’t define even in principle. But I’d rather be vaguely right than precisely wrong. Because it focuses on propositions which are each (almost entirely) true or false, Bayesianism is actively misleading in domains where reasoning well requires constructing and evaluating sophisticated models (i.e. most of them).

For example, Bayesians measure evidence in “bits”, where one bit of evidence rules out half of the space of possibilities. When asking a question like “is this stranger named Mark?”, bits of evidence are a useful abstraction: I can get one bit of evidence simply by learning whether they’re male or female, and a couple more by learning that their name has only one syllable. Conversely, talking in Bayesian terms about discovering scientific theories is nonsense. If every PhD in fundamental physics had contributed even one bit of usable evidence about how to unify quantum physics and general relativity, we’d have solved quantum gravity many times over by now. But we haven’t, because almost all of the work of science is in constructing sophisticated models, which Bayesianism says almost nothing about. (Formalisms like Solomonoff induction attempt to sidestep this omission by enumerating and simulating all computable models, but that’s so different from what any realistic agent can do that we should think of it less as idealized cognition and more as a different thing altogether, which just happens to converge to the same outcome in the infinite limit.)
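One common way to make “bits of evidence” precise is as the base-2 logarithm of a likelihood ratio. The Python sketch below applies that to the name example. The specific probabilities (nearly everyone named Mark is male, about half of everyone else is) and the prior odds of 1:200 are rough assumptions for illustration, and the function names are hypothetical.

```python
import math


def bits_of_evidence(p_obs_given_h, p_obs_given_not_h):
    """Evidence an observation gives for H, in bits (log2 likelihood ratio)."""
    return math.log2(p_obs_given_h / p_obs_given_not_h)


def update_odds(prior_odds, bits):
    """Each bit of evidence doubles the odds in favor of H."""
    return prior_odds * 2 ** bits


# H = "this stranger is named Mark"; observation = "this stranger is male".
bits = bits_of_evidence(p_obs_given_h=1.0, p_obs_given_not_h=0.5)

# Assumed prior odds of 1:200 that a random stranger is named Mark.
posterior_odds = update_odds(1 / 200, bits)
print(bits, posterior_odds)
```

This works because the hypothesis space (possible first names) is fixed in advance and every observation has a well-defined likelihood under each hypothesis. Neither condition holds when the hard part is inventing the model in the first place.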

Mistakes like these have many downstream consequences. Nobody should be very confident about complex domains that nobody has sophisticated models of (like superintelligence); but the idea that “strong evidence is common” helps justify confident claims about them. And without a principled distinction between credences that are derived from deep, rigorous models of the world, and credences that come from vague speculation (and are therefore subject to huge Knightian uncertainty), it’s hard for public discussions to actually make progress.

Should I therefore be a critical rationalist? I do think Popper got a lot of things right. But I also get the sense that he (along with Deutsch, his most prominent advocate) throws the baby out with the bathwater. There is a great deal of insight encoded in Bayesianism which critical rationalists discard (e.g. by rejecting induction). A better approach is to view Bayesianism as describing a special case of epistemology, which applies in contexts simple enough that we’ve already constructed all relevant models or hypotheses, exactly one of which is exactly true (with all the rest of them being equally false), and we just need to decide between them. Interpreted in that limited way, Bayesianism is both useful (e.g. in providing a framework for bets and prediction markets) and inspiring: if we can formalize this special case so well, couldn’t we also formalize the general case? What would it look like to concretely define degrees of truth? I don’t have a solution, but I’ll outline some existing attempts, and play around with some ideas of my own, in a follow-up post.


110 comments, sorted by top scoring


johnswentworth (1y)

You're pointing to good problems, but fuzzy truth values seem to approximately-totally fail to make any useful progress on them; fuzzy truth values are a step in the wrong direction.

Walking through various problems/examples from the post:

"For example, the truth-values of propositions which contain gradable adjectives like 'large' or 'quiet' or 'happy' depend on how we interpret those adjectives." You said it yourself: the truth-values depend on how we interpret those adjectives. The adjectives are ambiguous, they have more than one common interpretation (and the interpretation depends on context). Saying that "a description of something as 'large' can be more or less true depending on how large it actually is" throws away the whole interesting phenomenon here: it treats the statement as having a single fixed truth-value (which happens to be quantitative rather than 0/1), when the main phenomenon of interest is that humans use multiple context-dependent interpretations (rather than one interpretation with one truth value).
"For example, if I claim that there’s a grocery store 500 meters away from my house, that’s probably true in an approximate sense, but false in a pr

... (read more)

abramdemski (1y)

I would like to defend fuzzy logic at greater length, but I might not find the time. So, here is my sketch.

Like Richard, I am not defending fuzzy logic as exactly correct, but I am defending it as a step in the right direction.

The Need for Truth

As Richard noted, meaning is context-dependent. When I say "is there water in the fridge?" I am not merely referring to H2O; I am referring to something like a container of relatively pure water in easily drinkable form.

However, I claim: if we think of statements as being meaningful, we think these context-dependent meanings can in principle be rewritten into a language which lacks the context-dependence.

In the language of information theory, the context-dependent language is what we send across the communication channel. The context-independent language is the internal sigma algebra used by the agents attempting to communicate.

You seem to have a similar picture:

It is totally allowed for semantics of a proposition to be very dependent on context within that model - more precisely, there would be a context-free interpretation of the proposition in terms of latent variables, but the way those latents relate to the world would involve a lot o

... (read more)

johnswentworth (1y)

I generally agree that self-reference issues require "fuzzy truth values" in some sense, but for Richard's purposes I expect that sort of thing to end up looking basically Bayesian (much like he lists logical induction as essentially Bayesian).

abramdemski (1y)

Yeah, I agree with that.

ProgramCrafter (1y)

Well, a straightforward continuation of the paradox would be "This sentence has truth value in [0;1)"; is it excluded by "plausible assumptions" or overlooked?

abramdemski (1y)

Excluded. Truth-functions are required to be continuous, so a predicate that's true of things in the interval [0,1) must also be true at 1. (Łukasiewicz does not assume continuity, but rather, proves it from other assumptions. In fact, Łukasiewicz is much more restrictive; however, we can safely add any continuous functions we like.)

One justification of this is that it's simply the price you have to pay for consistency; you (provably) can't have all the nice properties you might expect. Requiring continuity allows consistent fixed-points to exist.

Of course, this might not be very satisfying, particularly as an argument in favor of Łukasiewicz over other alternatives. How can we justify the exclusion of [0,1) when we seem to be able to refer to it?

As I mentioned earlier, we can think of truth as a vague term, with the fuzzy values representing an ordering of truthiness. Therefore, there should be no way to refer to "absolute truth". We have to think of assigning precise numbers to the vague values as merely a way to model this phenomenon. (It's up to you to decide whether this is just a bit of linguistic sleight-of-hand or whether it constitutes a viable position...)

When we try to refer to "absolute truth" we can create a function which outputs 1 on input 1, but which declines sharply as we move away from 1.[1] This is how the model reflects the fact that we can't refer to absolute truth. We can map 1 to 1 (make a truth-function which is absolutely true only of absolute truth), however, such a function must also be almost-absolutely-true in some small neighborhood around 1. This reflects the idea that we can't completely distinguish absolute truth from its close neighborhood.

Similarly, when we negate this function, it "represents" [0,1) in the sense that it is only 0 (only 'absolutely false') for the value 1, and maps [0,1) to positive truth-values which can be mostly 1, but which must decline in the neighborhood of 1. And yes, this setup can get us into

Dweomite (1y)

I'm confused about how continuity poses a problem for "This sentence has truth value in [0,1)" without also posing an equal problem for "this sentence is false", which was used as the original motivating example. I'd intuitively expect "this sentence is false" == "this sentence has truth value 0" == "this sentence does not have a truth value in (0,1]"

abramdemski (1y)

"X is false" has to be modeled as something that is value 1 if and only if X is value 0, but continuously decreases in value as X continuously increases in value. The simplest formula is value(X is false) = 1-value(X). However, we can make "sharper" formulas which diminish in value more rapidly as X increases in value. Hartry Field constructs a hierarchy of such predicates which he calls "definitely false", "definitely definitely false", etc. Proof systems for the logic should have the property that sentences are derivable only when they have value 1; so "X is false" or "X is definitely false" etc all share the property that they're only derivable when X has value zero.

ProgramCrafter (1y)

Understood. Does that formulation include most useful sentences? For instance, "there exists a sentence which is more true than this one" must be excluded as equivalent to "this statement's truth value is strictly less than 1", but the extent of such exclusion is not clear to me at first skim.

ProgramCrafter (1y)

Then why not consider structure as follows?

1. you are searching for "something like a container of relatively pure water in easily drinkable form" - or, rather, "[your subconscious-native code] of water-like thing + for drinking",
2. you emit sequence of tokens (sounds/characters) "is there water in the fridge?", approximating previous idea (discarding your intent to drink it as it might be inferred from context, omitting that you can drink something close to water),
3. conversation partner hears "is there water in the fridge?", converted into thought "you asked 'is there water in the fridge?'",
4. and interprets words as "you need something like a container of relatively pure water in easily drinkable form" - or, rather, "[their subconscious-native code] for another person, a water-like thing + for drinking".

That messes with "meanings of sentences" but is necessary to rationally process filtered evidence.

abramdemski (1y)

It seems to me that there is a really interesting interplay of different forces here, which we don't yet know how to model well. Even if Alice tries meticulously to only say literally true things, and be precise about her meanings, Bob can and should infer more than what Alice has literally said, by working backwards to infer why she has said it rather than something else. So, pragmatics is inevitable, and we'd be fools not to take advantage of it. However, we also really like transparent contexts -- that is, we like to be able to substitute phrases for equivalent phrases (equational reasoning, like algebra), and make inferences based on substitution-based reasoning (if all bachelors are single, and Jerry is a bachelor, then Jerry is single). To put it simply, things are easier when words have context-independent meanings (or more realistically, meanings which are valid across a wide array of contexts, although nothing will be totally context-independent). This puts contradictory pressure on language. Pragmatics puts pressure towards highly context-dependent meaning; reasoning puts pressure towards highly context-independent meaning. If someone argues a point by conflation (uses a word in two different senses, but makes an inference as if the word had one sense) then we tend to fault using the same word in two different senses, rather than fault basic reasoning patterns like transitivity of implication (A implies B, and B implies C, so A implies C). Why is that? Is that the correct choice? If meanings are inevitably context-dependent anyway, why not give up on reasoning? ;p

Richard_Ngo (1y)

Ty for the comment. I mostly disagree with it. Here's my attempt to restate the thrust of your argument:

The issues with binary truth-values raised in the post are all basically getting at the idea that the meaning of a proposition is context-dependent. But we can model context-dependence in a Bayesian way by referring to latent variables in the speaker's model of the world. Therefore we don't need fuzzy truth-values.

But this assumes that, given the speaker's probabilistic model, truth-values are binary. I don't see why this needs to be the case. Here's an example: suppose my non-transhumanist friend says "humanity will be extinct in 100 years". And I say "by 'extinct' do you include being genetically engineered until future humans are a different species? How about being uploaded? How about all being cryonically frozen, to be revived later? How about...."

In this case, there is simply no fact of the matter about which of these possibilities should be included or excluded in the context of my friend's original claim, because (I'll assume) they hadn't considered any of those possibilities.

More prosaically, even if I have considered some possibilities in the past, at the time when I make a s... (read more)

johnswentworth (1y)

But this assumes that, given the speaker's probabilistic model, truth-values are binary.

In some sense yes, but there is totally allowed to be irreducible uncertainty in the latents - i.e. given both the model and complete knowledge of everything in the physical world, there can still be uncertainty in the latents. And those latents can still be meaningful and predictively powerful. I think that sort of uncertainty does the sort of thing you're trying to achieve by introducing fuzzy truth values, without having to leave a Bayesian framework.

Let's look at this example:

suppose my non-transhumanist friend says "humanity will be extinct in 100 years". And I say "by 'extinct' do you include being genetically engineered until future humans are a different species? How about being uploaded? How about all being cryonically frozen, to be revived later? How about...."
In this case, there is simply no fact of the matter about which of these possibilities should be included or excluded in the context of my friend's original claim...

Here's how that would be handled by a Bayesian mind:

There's some latent variable representing the semantics of "humanity will be extinct in 100 years"; call that variable S

... (read more)

Richard_Ngo (1y)

What would resolve the uncertainty that remains after you have conditioned on the entire low-level state of the physical world? (I assume that we're in the logically omniscient setting here?)

johnswentworth (1y)

We are indeed in the logically omniscient setting still, so nothing would resolve that uncertainty. The simplest concrete example I know is the Boltzmann distribution for an ideal gas - not the assorted things people say about the Boltzmann distribution, but the actual math, interpreted as Bayesian probability. The model has one latent variable, the temperature T, and says that all the particle velocities are normally distributed with mean zero and variance proportional to T. Then, just following the ordinary Bayesian math: in order to estimate T from all the particle velocities, I start with some prior P[T], calculate P[T|velocities] using Bayes' rule, and then for ~any reasonable prior I end up with a posterior distribution over T which is very tightly peaked around the average particle energy... but has nonzero spread. There's small but nonzero uncertainty in T given all of the particle velocities. And in this simple toy gas model, those particles are the whole world, there's nothing else to learn about which would further reduce my uncertainty in T.
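The toy gas model described in this comment is easy to simulate. The Python sketch below is one possible reading of it, under stated assumptions: velocities are one-dimensional, the proportionality constant between variance and T is set to 1, the prior over T is flat on a grid from 0.5 to 2.0, and there are 1000 particles. The function name `posterior_over_T` and all of these numbers are hypothetical choices, not taken from the comment.

```python
import math
import random


def posterior_over_T(velocities, grid):
    """Posterior over temperature T on a grid, with a flat prior on the grid.

    Assumes each velocity is drawn from Normal(mean=0, variance=T).
    """
    n = len(velocities)
    s = sum(v * v for v in velocities)
    # log P[velocities | T], dropping terms that do not depend on T
    log_like = [-0.5 * n * math.log(t) - s / (2.0 * t) for t in grid]
    top = max(log_like)
    weights = [math.exp(l - top) for l in log_like]  # subtract max for stability
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
true_T = 1.0
velocities = [rng.gauss(0.0, math.sqrt(true_T)) for _ in range(1000)]

grid = [0.5 + 0.001 * i for i in range(1501)]  # T from 0.5 to 2.0
post = posterior_over_T(velocities, grid)

mean_T = sum(t * p for t, p in zip(grid, post))
std_T = math.sqrt(sum((t - mean_T) ** 2 * p for t, p in zip(grid, post)))
mean_sq_velocity = sum(v * v for v in velocities) / len(velocities)

print(mean_T, std_T, mean_sq_velocity)
```

In this sketch every particle velocity is observed, yet the posterior over T keeps a small nonzero spread around the mean squared velocity. The spread shrinks as the number of particles grows but never reaches zero for a finite gas.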

cubefox (1y)

Fuzzy truth values can't be avoided by disambiguation and fixing a context. They are the result of vague predicates: adjectives, verbs, nouns etc. Most concepts don't have crisp boundaries, and some objects will fit a term more or less than others.

[-]johnswentworth1y171

That's still not a problem of fuzzy truth values, it's a problem of fuzzy category boundaries. These are not the same thing.

The standard way to handle fuzzy category boundaries in a Bayesian framework is to treat semantic categories as clusters, and use standard Bayesian cluster models.
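As a rough illustration of what treating a category as a cluster could look like, here is a sketch of posterior cluster membership in a two-component, one-dimensional Gaussian mixture. The cluster names, weights, means and spreads are all made up for the example; a real cluster model would learn them from data.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of Normal(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def cluster_posterior(x, clusters):
    """clusters maps name -> (prior weight, mean, sd); returns name -> P(cluster | x)."""
    joint = {name: w * normal_pdf(x, m, s) for name, (w, m, s) in clusters.items()}
    total = sum(joint.values())
    return {name: p / total for name, p in joint.items()}

# Hypothetical clusters over "hair count in thousands".
clusters = {"bald": (0.3, 5.0, 10.0), "not_bald": (0.7, 100.0, 30.0)}
for hairs in (2.0, 35.0, 120.0):
    print(hairs, cluster_posterior(hairs, clusters))
```

The category boundary is fuzzy in the sense that membership probabilities vary smoothly with the feature, while each statement of the form "this point was generated by cluster k" is still an ordinary true-or-false proposition with a credence attached.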

9tailcalled1y

The Eggplant later discusses some harder problems with fuzzy categories: (I think this is harder than it looks because in addition to severing off the category at some of these edge-cases, one also has to avoid severing off the category at other edge-cases. The Eggplant mostly focuses on reductionistic categories rather than statistical categories and so doesn't bother proving that the Bayesian clustering can't go through.) You might think these are also solved with Bayesian cluster models, but I don't think they are, unless you put in a lot of work beyond basic Bayesian cluster models to bias it towards giving the results you want. (Like, you could pick the way people talk about the objects as the features you use for clustering, and in that case I could believe you would get nice/"correct" clusters, but this seems circular in the sense that you're not deriving the category yourself but just copying it off humans.) Roughly speaking, you are better off thinking of there as being an intrinsic ranking of the features of a thing by magnitude or importance, such that the cluster a thing belongs to is its most important feature.

6ChristianKl1y

Before writing The Eggplant, Chapman did write more specifically about why Bayesianism doesn't work in https://metarationality.com/probability-and-logic David Chapman's position of "I created a working AI that makes deductions using mathematics that are independent of probability and can't be represented with probability" seems like it does show that Bayesianism as a superset for agent foundations doesn't really work, as agents can reason in ways that are not probability-based.

2johnswentworth1y

Hadn't seen that essay before, it's an interesting read. It looks like he either has no idea that Bayesian model comparison is a thing, or has no idea how it works, but has a very deep understanding of all the other parts except model comparison and has noticed a glaring model-comparison-shaped hole.

4ChristianKl1y

How does Bayesian model comparison allow you to do predicate calculus?

1johnswentworth1y

First, the part about using models/logics with probabilities. (This part isn't about model comparison per se, but is necessary foundation.) (Terminological note: the thing a logician would call a "logic" or possibly a "logic augmented with some probabilities" I would instead normally call a "model" in the context of Bayesian probability, and the thing a logician would call a "model" I would instead normally call a "world" in the context of Bayesian probability; I think that's roughly how standard usage works.) Roughly speaking: you have at least one plain old (predicate) logic, and all "random" variables are scoped to their logic, just like ordinary logic. To bring probability into the picture, the logic needs to be augmented with enough probabilities of values of variables in the logic that the rest of the probabilities can be derived. All queries involving probabilities of values of variables then need to be conditioned on a logic containing those variables, in order to be well defined. Typical example: a Bayes net is a logic with a finite set of variables, one per node in the net, augmented with some conditional probabilities for each node such that we can derive all probabilities. Most of the interesting questions of world modeling are then about "model comparison" (though a logician would probably rather call it "logic comparison"): we want to have multiple hypotheses about which logics-augmented-with-probabilities best predict some real-world system, and test those hypotheses statistically just like we test everything else. That's why we need model comparison.

4ChristianKl1y

The main point of the article is that once you add probabilities you can't do predicate calculus anymore. It's a mathematical operation that's not defined for the entities that you get when you do your augmentation.

3johnswentworth1y

Is the complaint that you can't do predicate calculus on the probabilities? Because I can certainly use predicate calculus all I want on the expressions within the probabilities. And if that is the complaint, then my question is: why do we want to do predicate calculus on the probabilities? Like, what would be one concrete application in which we'd want to do that? (Self-reference and things in that cluster would be the obvious use-case, I'm mostly curious if there's any other use-case.)

4ChristianKl1y

Imagine, you have a function f that takes a_1, a_2, ..., a_n and returns b_1, b_2, ... b_m. a_1, a_2, ..., a_n are boolean states of the known world and b_1, b_2, ... b_m boolean states of the world you don't yet know. Because f uses predicate logic internally you can't modify it to take values between 0 and 1 and have to accept that it can only take boolean values. When you do your probability augmentation you can easily add probabilities to a_1, a_2, ..., a_n and have P(a_1), P(a_2), ..., P(a_n), as those are part of the known world. On the other hand, how would you get P(b_1), P(b_2), ... , P(b_m)?

3johnswentworth1y

I'm not quite understanding the example yet. Two things which sound similar, but are probably not what you mean because they're straightforward Bayesian models:
- I'm given a function f: A -> B and a distribution (a↦P[A=a]) over the set A. Then I push forward the distribution on A through f to get a distribution over B.
- Same as previous, but the function f is also unknown, so to do things Bayesian-ly I need to have a prior over f (more precisely, a joint prior over f and A).

How is the thing you're saying different from those? Or: it sounds like you're talking about an inference problem, so what's the inference problem? What information is given, and what are we trying to predict?

4ChristianKl1y

I'm talking about a function that takes a one-dimensional vector of booleans A and returns a one-dimensional vector B. The function does not accept a one-dimensional vector of real numbers between 0 and 1. To be able to "push forward" probabilities, f would need to be defined to handle probabilities.

6johnswentworth1y

The standard push forward here would be: $P[B=b] = \sum_a I[f(a)=b] \, P[A=a]$, where $I[\ldots]$ is an indicator function. In terms of interpretation: this is the frequency at which I will see B take on value b, if I sample A from the distribution P[A] and then compute B via B = f(A). What do you want to do which is not that, and why do you want to do it?
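A minimal sketch of that push forward for a function that only accepts booleans. The particular function `f` and the independent marginals on the inputs are hypothetical, picked just to have something concrete to enumerate; `f` itself is never asked to handle a probability.

```python
from collections import defaultdict
from itertools import product

def push_forward(f, p_a):
    """P[B=b] = sum over a of I[f(a)=b] * P[A=a], for a finite distribution p_a."""
    p_b = defaultdict(float)
    for a, prob in p_a.items():
        p_b[f(a)] += prob  # f is only ever called on concrete boolean vectors
    return dict(p_b)

def f(a):
    # Hypothetical boolean-only function: b1 = a1 AND a2, b2 = a1 OR a3.
    a1, a2, a3 = a
    return (a1 and a2, a1 or a3)

# Assumed independent marginals P(a_i = True); any joint over A would also work.
marginals = (0.9, 0.5, 0.2)
p_a = {}
for a in product((False, True), repeat=len(marginals)):
    prob = 1.0
    for bit, p in zip(a, marginals):
        prob *= p if bit else 1.0 - p
    p_a[a] = prob

p_b = push_forward(f, p_a)
p_b1 = sum(prob for b, prob in p_b.items() if b[0])  # marginal P(b1 = True)
p_b2 = sum(prob for b, prob in p_b.items() if b[1])  # marginal P(b2 = True)
print(p_b)
print(round(p_b1, 6), round(p_b2, 6))
```

Brute-force enumeration is exponential in the number of inputs, so this is only an illustration of the definition, not a practical inference method.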

6ChristianKl1y

Most of the time, the data you gather about the world is that you have a bunch of facts about the world and probabilities about the individual data points and you would want as an outcome also probabilities over individual datapoints. As far as my own background goes, I have not studied logic or the math behind the AI algorithm that David Chapman wrote. I did study bioinformatics, and in that study we did talk about probability calculations that are done in bioinformatics, so I have some intuitions from that domain, and I'll take a bioinformatics example even if I don't know exactly how to productively apply predicate calculus to the example. Suppose, for example, you get input data from gene sequencing with billions of probabilities (a_1, a_2, ..., a_n), and you want output data about whether or not individual genetic mutations exist (b_1, b_2, ..., b_m), not just P(B) = P(b_1) * P(b_2) * ... * P(b_m). If you have m = 100,000 in the case of possible genetic mutations, P(B) is a very small number with little robustness to error. A single bad b_x will propagate to make your total P(B) unreliable. You might have an application where getting b_234, b_9538 and b_33889 wrong is an acceptable error because most of the values were good.
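To put rough numbers on the worry about P(B), here is a small sketch. The per-site probability of 0.9999 is an assumption made up for illustration, and the sites are treated as independent, as in the product formula above.

```python
import math

def joint_probability(m, p_each):
    """Probability that all m independent calls are correct, computed via logs."""
    return math.exp(m * math.log(p_each))

def expected_errors(m, p_each):
    """Expected number of wrong calls among m independent sites."""
    return m * (1.0 - p_each)

m = 100_000          # number of candidate mutations, as in the comment
p_each = 0.9999      # assumed per-site probability of a correct call

print(joint_probability(m, p_each))  # on the order of 1e-5
print(expected_errors(m, p_each))    # about 10 sites
```

Under these assumptions the joint probability is tiny even though each individual call is very reliable, which is why the per-site marginals P(b_i) are usually the more useful output.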

2tailcalled1y

I feel like this treats predicate logic as being "logic with variables", but "logic with variables" seems more like Aristotelian logic than like predicate logic to me.

2johnswentworth1y

Another way to view it: a logic, possibly a predicate logic, is just a compact way of specifying a set of models (in the logician's sense of the word "models", i.e. the things a Bayesian would normally call "worlds"). Roughly speaking, to augment that logic into a probabilistic model, we need to also supply enough information to derive the probability of each (set of logician!models/Bayesian!worlds which assigns the same truth-values to all sentences expressible in the logic). Does that help?

2tailcalled1y

Idk, I guess the more fundamental issue is this treats the goal as simply being assigning probabilities to statements in predicate logic, whereas his point is more about whether one can do compositional reasoning about relationships while dealing with nebulosity, and it's this latter thing that's the issue.

3johnswentworth1y

What's a concrete example in which we want to "do compositional reasoning about relationships while dealing with nebulosity", in a way not handled by assigning probabilities to statements in predicate logic? What's the use-case here? (I can see a use-case for self-reference; I'm mainly interested in any cases other than that.)

2tailcalled1y

You seem to be assuming that predicate logic is unnecessary, is that true?

2johnswentworth1y

No, I explicitly started with "you have at least one plain old (predicate) logic". Quantification is fine.

2tailcalled1y

Ah, sorry, I think I misparsed your comment.

3Garrett Baker1y

How do you get the features, and how do you decide on importance? I expect for certain answers of these questions John will agree with you.

2tailcalled1y

Those are difficult questions that I don't know the full answer to yet.

1localdeity1y

I am dismayed by the general direction of this conversation. The subject is vague and ambiguous words causing problems, there's a back-and-forth between several high-karma users, and I'm the first person to bring up "taboo the vague words and explain more precisely what you mean"?

[-]abramdemski1y146

That's an important move to make, but it is also important to notice how radically context-dependent and vague our language is, to the point where you can't really eliminate the context-dependence and vagueness via taboo (because the new words you use will still be somewhat context-dependent and vague). Working against these problems is pragmatically useful, but recognizing their prevalence can be a part of that. Richard is arguing against foundational pictures which assume these problems away, and in favor of foundational pictures which recognize them.

3localdeity1y

You don't need to "eliminate" the vagueness, just reduce it enough that it isn't affecting any important decisions. (And context-dependence isn't necessarily a problem if you establish the context with your interlocutor.) I think this is generally achievable, and have cited the Eggplant essay on this. And if it is generally achievable, then: I think you should handle the problems separately. In which case, when reasoning about truth, you should indeed assume away communication difficulties. If our communication technology was so bad that 30% of our words got dropped from every message, the solution would not be to change our concept of meanings; the solution would be to get better at error correction, ideally at a lower level, but if necessary by repeating ourselves and asking for clarification a lot. Elsewhere there's discussion of concepts themselves being ambiguous. That is a deeper issue. But I think it's fundamentally resolved in the same way: always be alert for the possibility that the concept you're using is the wrong one, is incoherent or inapplicable to the current situation; and when it is, take corrective action, and then proceed with reasoning about truth. Be like a digital circuit, where at each stage your confidence in the applicability of a concept is either >90% or <10%, and if you encounter anything in between, then you pause and figure out a better concept, or find another path in which this ambiguity is irrelevant.

[-]abramdemski1y103

Richard is arguing against foundational pictures which assume these problems away, and in favor of foundational pictures which recognize them.
I think you should handle the problems separately. In which case, when reasoning about truth, you should indeed assume away communication difficulties. If our communication technology was so bad that 30% of our words got dropped from every message, the solution would not be to change our concept of meanings; the solution would be to get better at error correction, ideally at a lower level, but if necessary by repeating ourselves and asking for clarification a lot.

You seem to be assuming that these issues arise only due to communication difficulties, but I'm not completely on board with that assumption. My argument is that these issues are fundamental to map-territory semantics (or, indeed, any concept of truth).

One argument for this is to note that the communicators don't necessarily have the information needed to resolve the ambiguity, even in principle, because we don't think in completely unambiguous concepts. We employ vague concepts like baldness, table, chair, etc. So it is not as if we have completely unambiguous pictures i... (read more)

2tailcalled1y

The Eggplant discusses why that doesn't work.

[-]localdeity1y*129

It's a decent exploration of stuff, and ultimately says that it does work:

Language is not the problem, but it is the solution. How much trouble does the imprecision of language cause, in practice? Rarely enough to notice—so how come? We have many true beliefs about eggplant-sized phenomena, and we successfully express them in language—how?
These are aspects of reasonableness that we’ll explore in Part Two. The function of language is not to express absolute truths. Usually, it is to get practical work done in a particular context. Statements are interpreted in specific situations, relative to specific purposes. Rather than trying to specify the exact boundaries of all the variants of a category for all time, we deal with particular cases as they come up.

If the statement you're dealing with has no problematic ambiguities, then proceed. If it does have problematic ambiguities, then demand further specification (and highlighting and tabooing the ambiguous words is the classic way to do this) until you have what you need, and then proceed.

I'm not claiming that it's practical to pick terms that you can guarantee in advance will be unambiguous for all possible readers and all possib... (read more)

3tailcalled1y

It probably works for Richard's purpose (personal epistemology) but not for John's or my purpose (agency foundations research).

7cubefox1y

A proposition expressed by "a is F" has a fuzzy truth value whenever F is a vague predicate. Since vague concepts figure in most propositions, their truth values are affected as well. When you talk about "standard Bayesian cluster models", you talk about (Bayesian) statistics. But Richard talks about Bayesian epistemology. This doesn't involve models, only beliefs, and beliefs are propositions combined with a degree to which they are believed. See the list with the five assumptions of Bayesian epistemology in the beginning.

3Benaya Koren1y

I don't think that this solution gives you everything that you want from semantic categories. Assume for example that you have a multidimensional cluster with heavy tails (for simplicity, assume symmetry under rotation). You measure some of the features, and determine that the given example belongs to the cluster almost surely. You want to use this fact to predict the other features. Knowing the deviation of the known features is still relevant for your uncertainty about the other features. You may think about this extra property as measuring "typicality", or as measuring "how much does it really belong in the cluster".

6localdeity1y

Solution: Taboo the vague predicates and demand that the user explain more precisely what they mean.

3xpym1y

It still misses the key issue of ontological remodeling. If the world-model is inadequate for expressing a proposition, no meaningful probability could be assigned to it.

2NunoSempere1y

Maybe you could address these problems, but could you do so in a way that is "computationally cheap"? E.g., for forecasting on something like extinction, it is much easier to forecast on a vague outcome than to precisely define it.

[-]Raymond Douglas1y281

When I read this post I feel like I'm seeing four different strands bundled together:
1. Truth-of-beliefs as fuzzy or not
2. Models versus propositions
3. Bayesianism as not providing an account of how you generate new hypotheses/models
4. How people can (fail to) communicate with each other

I think you hit the nail on the head with (2) and am mostly sold on (4), but am sceptical of (1) - similar to what several others have said, it seems to me like these problems don't appear when your beliefs are about expected observations, and only appear when you start to invoke categories that you can't ground as clusters in a hierarchical model.

That leaves me with mixed feelings about (3):
- It definitely seems true and significant that you can get into a mess by communicating specific predictions relative to your own categories/definitions/contexts without making those sufficiently precise
- I am inclined to agree that this is a particularly important feature of why talking about AI/x-risk is hard
- It's not obvious to me that what you've said above actually justifies knightian uncertainty (as opposed to infrabayesianism or something), or the claim that you can't be confident about superintelligence (although it might be true for other reasons)

[-]Kaarel1y198

I find it surprising/confusing/confused/jarring that you speak of models-in-the-sense-of-mathematical-logic=:L-models as the same thing as (or as a precise version of) models-as-conceptions-of-situations=:C-models. To explain why these look to me like two pretty much entirely distinct meanings of the word 'model', let me start by giving some first brushes of a picture of C-models. When one employs a C-model, one likens a situation/object/etc of interest to a situation/object/etc that is already understood (perhaps a mathematical/abstract one), that one expects to be better able to work/play with. For example, when one has data about sun angles at a location throughout the day and one is tasked with figuring out the distance from that location to the north pole, one translates the question to a question about 3d space with a stationary point sun and a rotating sphere and an unknown point on the sphere and so on. (I'm not claiming a thinker is aware of making such a translation when they make it.) Employing a C-model $\approx$ making an analogy. From inside a thinker, the objects/situations on each side of the analogy look like... well, things/situations; from outside a thinker, bo... (read more)

[-]Mark Xu1y*1511

tentative claim: there are models of the world, which make predictions, and there is "how true they are", which is the amount of noise you fudge the model with to get lowest loss (maybe KL?) in expectation.

E.g. "the grocery store is 500m away" corresponds to "my dist over the grocery store is centered at 500m, but has some amount of noise"
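One way to make this concrete, as a sketch only: take Gaussian noise around the model's point prediction and use average negative log-likelihood as the loss (an assumption; the comment leaves the choice of loss open). The noise level that minimizes this loss is then the root-mean-square error. The observations below are invented.

```python
import math

def avg_neg_log_likelihood(prediction, observations, sigma):
    """Average Gaussian negative log-likelihood of the observations."""
    return sum(
        0.5 * math.log(2.0 * math.pi * sigma ** 2)
        + (x - prediction) ** 2 / (2.0 * sigma ** 2)
        for x in observations
    ) / len(observations)

def best_noise(prediction, observations):
    """Closed-form minimizer of the loss above: the RMS prediction error."""
    return math.sqrt(
        sum((x - prediction) ** 2 for x in observations) / len(observations)
    )

prediction = 500.0                         # "the grocery store is 500m away"
observations = [480.0, 510.0, 530.0, 495.0]  # hypothetical measured distances

sigma = best_noise(prediction, observations)
print(round(sigma, 2))
print(avg_neg_log_likelihood(prediction, observations, sigma))
```

On this reading, a smaller optimal sigma corresponds to a "truer" model, though the result depends on which observations are collected.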

[-]Mark Xu1y1210

related to the claim that "all models are meta-models", in that they are objects capable of e.g. evaluating how applicable they are for making a given prediction. E.g. "Newtonian mechanics" also carries along with it information about how if things are moving too fast, you need to add more noise to its predictions, i.e. it's less true/applicable/etc.

2Richard_Ngo1mo

This depends on the data distribution though, which could vary greatly (and in fact the data you collect will vary based on your actions which in turn are based on your models). So I think a lot of the action is in defining which loss we care about.

2cubefox1y

So perhaps noise ≈ inverse of variance ≈ degree of truth?

[-]Haiku1y144

I am not well-read on this topic (or at-all read, really), but it struck me as bizarre that a post about epistemology would begin by discussing natural language. This seems to me like trying to grasp the most fundamental laws of physics by first observing the immune systems of birds and the turbulence around their wings.

The relationship between natural language and epistemology is more anthropological* than it is information-theoretical. It is possible to construct models that accurately represent features of the cosmos without making use of any language at all, and as you encounter in the "fuzzy logic" concept, human dependence on natural language is often an impediment to gaining accurate information.

Of course, natural language grants us many efficiencies that make it extremely useful in ancestral human contexts (as well as most modern ones). And given that we are humans, to perform error correction on our models, we have to model our own minds and the process of examination and modelling itself as part of the overall system we are examining and modelling. But the goal of that recursive modelling is to reduce the noise and error caused by the fuzziness of natural language and oth... (read more)

[-]sarahconstantin1y*1410

I think I agree with this post directionally.

You cannot apply Bayes' Theorem until you have a probability space; many real-world situations, especially the ones people argue about, do not have well-defined probability spaces, including a complete set of mutually exclusive and exhaustive possible events, which are agreed upon by all participants in the argument.

You will notice that, even on LessWrong, people almost never have Bayesian discussions where they literally apply Bayes' Rule. It would probably be healthy to try to literally do that more often! But making a serious attempt to debate a contentious issue "Bayesianly" typically looks more like Rootclaim's lab leak debate, which took a lot of setup labor and time, and where the result of quantifying the likelihoods was to reveal just how heavily your "posterior" conclusion depends on your "prior" assumptions, which were outside the scope of debate.
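For concreteness, here is what literally applying Bayes' rule looks like once a set of mutually exclusive and exhaustive hypotheses has been agreed on. The hypothesis names, likelihoods and priors are placeholders, not numbers from any real debate.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of mutually exclusive, exhaustive hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: j / total for h, j in joint.items()}

# Hypothetical likelihoods of the same evidence under each hypothesis.
likelihoods = {"H1": 0.6, "H2": 0.2}

even_prior = {"H1": 0.5, "H2": 0.5}
skeptical_prior = {"H1": 0.1, "H2": 0.9}

print(posterior(even_prior, likelihoods))       # H1 comes out near 0.75
print(posterior(skeptical_prior, likelihoods))  # H1 comes out near 0.25
```

The likelihoods are identical in both calls, so the entire difference between the two conclusions comes from the prior, which is the sensitivity described above.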

I think prediction markets are good, and I think Rootclaim-style quantified debates are worth doing occasionally, but what we do in most discussion isn't Bayesian and can't easily be made Bayesian.

I am not so sure about preferring models to propositions. I think what you'r... (read more)

[-]Richard_Ngo1mo134Review for 2024 Review

The context for this post is that I've had qualms about bayesian epistemology for most of the last decade. My most notable attempts to express them previously were Realism about rationality and Against strong bayesianism. In hindsight, those posts weren't great, but they're interesting as documentation of waypoints on my intellectual journey (see also here and here). This post is another such waypoint. Since writing it last year, I've built on these ideas (and my qualms about expected utility maximization) to continue developing my theory of coalitional agency. I don't know how compelled most readers feel by what I've written publicly about this research agenda thus far (i.e. this sequence, most of the posts on this blog, and some recent shortforms) but I'm very excited about it and expect to make significant progress on it in 2026.

I'm also still fairly happy with this post specifically, and expect that it will stand the test of time better than the other two above (in part because it's starting to articulate a positive vision rather than just bashing bayesianism). My main regret is on a pedagogical level: it was a mistake to start with point 1 (fuzzy truth values) rather than poin... (read more)

[-]abramdemski1y133

One thing I don't understand / don't agree with here is the move from propositions to models. It seems to me that models can be (and usually are) understood in terms of propositions.

For example, Solomonoff understands models as computer programs which generate predictions. However, computer programs are constructed out of bits, which can be understood as propositions. The bits are not very meaningful in isolation; the claim "program-bit number 37 is a 1" has almost no meaning in the absence of further information about the other program bits. However, this isn't much of an issue for the formalism.

Similarly, I expect that any attempt to formally model "models" can be broken down into propositions. EG, if someone claimed that humans understand the world in terms of systems of differential equations, this would still be well-facilitated by a concept of propositions (ie, the equations).

It seems to me like a convincing abandonment of propositions would have to be quite radical, abandoning the idea of formalism entirely. This is because you'd have to explain why your way of thinking about models is not amenable to a mathematical treatment (since math is commonly understood in terms of propositions).

So (a) I'm not convinced that thinking in terms of propositions makes it difficult to think in terms of models; (b) it seems to me that refusing to think in terms of propositions would make it difficult to think in terms of models.

1Richard_Ngo1y

In my post I defend the use of propositions as a way to understand models, and attack the use of propositions as a way to understand reality. You can think of this as a two-level structure: claims about models can be crisp and precise enough that it makes sense to talk about them in propositional terms, but for complex bits of reality you mostly want to make claims of the form "this is well-modeled by model X". Those types of claims need to be understood in terms of continuous truth-values: they're basically never entirely true or entirely false. Separately, Solomonoff programs are non-central examples of models because they do not come with structural correspondences to reality attached (except via their inputs and outputs). Most models have some mapping that allows you to point at program-bits and infer some features of reality from them. I notice as I write this that there's some tension in my position: I'm saying we shouldn't apply propositions to reality, but also the mappings I mentioned above allow us to formulate propositions like "the value of X in reality is approximately the value of this variable in my model". So maybe I'm actually arguing for a middle ground between two extremes:
1. The basic units of epistemology should all map precisely to claims about reality, and should be arbitrarily combinable and composable (the propositional view)
2. The basic units of epistemology should only map to claims about reality in terms of observable predictions, and not be combinable or composable at all (the Solomonoff view)

This spectrum isn't fully well-defined even in my head but seems like an interesting way to view things which I'll think more about.

2abramdemski1y

I agree that Solomonoff’s epistemology is noncentral in the way you describe, but I don't think it impacts my points very much; replace Solomonoff with whatever epistemic theory you like. It was just a convenient example. (Although I expect defenders of Solomonoff to expect the program bits to be meaningful; and I somewhat agree. It's just that the theory doesn't address the meaning there, instead treating programs more like black-box predictors.) In my view, meaning is the property of being optimized to adhere to some map-territory relationship. However, this optimization itself must always occur within some model (it provides the map-territory relationship to optimize for). In the context of Solomonoff Induction, this may emerge from the incentive to predict, but it is not easy to reason about. In some sense, reality isn't made of bits, propositions, or any such thing; it is of unknowable type. However, we always describe it via terms of some type (a language). I'm no longer sure where the disagreement lies, if any, but I still feel like the original post overstates things.

[-]Nathan Young1mo100Review for 2024 Review

Solid article.

Defines terms in ways I agree with. Raised objections I hadn't thought of. Thought provoking.

On the object level the criticisms of bayesianism seem solid, but I am unsure if the replacement is good.

[-]Richard Korzekwa2mo100Review for 2024 Review

I initially wanted to nominate this because I somewhat regularly say things like "I think the problem with that line of thinking is that you're not handling your model uncertainty in the right way, and I'm not good at explaining it, but Richard Ngo has a post that I think explains it well." Instead of leaving it at that, I'll try to give an outline of why I found it so helpful. I didn't put much thought into how to organize this review, it's centered very much around my particular difficulties, and I'm still confused about some of this, but hopefully it gets across some of what I got out of it.

This post helped me make sense of a cluster of frustrations I've had around my thinking and others' thinking, especially in domains where things are complex and uncertain. The allure of cutting the world up into clear, distinct, and exhaustive possibilities is strong, but doing so doesn't always lead to clearer thinking. To give a few examples where I've seen this lead people astray (choosing not particularly charitable or typical examples, for simplicity):

The origins of covid-19 are zoonotic or a lab leak
AI research will or will not be automated by 2027
AI progress after time t

... (read more)

[-]localdeity1y104

Statements do often have ambiguities: there are a few different more-precise statements they could be interpreted to mean, and sometimes those more-precise statements have different truth values. But the solution is not to say that the ambiguous statement has an ambiguous truth value and therefore discard the idea of truth. The solution is to do your reasoning about the more-precise statements, and, if someone ever hands you ambiguous statements whose truth value is important, to say "Hey, please explain more precisely what you meant." Why would one do otherwise?

By the way:

colorless green ideas sleep furiously

There is a straightforward truth value here: there are no colorless green ideas, and therefore it is vacuously true that all of them sleep furiously.

4Richard_Ngo1y

"Dragons are attacking Paris!" seems true by your reasoning, since there are no dragons, and therefore it is vacuously true that all of them are attacking Paris.

9localdeity1y

Are you not familiar with the term "vacuously true"? I find this very surprising. People who study math tend to make jokes with it. The idea is that, if we were to render a statement like "Colorless green ideas sleep furiously" into formal logic, we'd probably take it to mean the universal statement "For all X such that X is a colorless green idea, X sleeps furiously". A universal statement is logically equivalent to "There don't exist any counterexamples", i.e. "There does not exist X such that X is a colorless green idea and X does not sleep furiously". Which is clearly true, and therefore the universal is equally true. There is, of course, some ambiguity when rendering English into formal logic. It's not rare for English speakers to say "if" when they mean "if and only if", or "or" when they mean "exclusive or". (And sometimes "Tell me which one", as in "Did you do A, or B?" "Yes." "Goddammit.") Often this doesn't cause problems, but sometimes it does. (In which case, as I've said, the solution is not to give their statement an ambiguous truth value, but rather to ask them to restate it less ambiguously.) "Dragons are attacking Paris" seems most naturally interpreted as the definite statement "There's some unspecified number—but since I used the plural, it's at least 2—of dragons that are attacking Paris", which would be false. One could also imagine interpreting it as a universal statement "All dragons are currently attacking Paris", which, as you say, would be vacuously true since there are no dragons. However, in English, the preferred way to say that would be "Dragons attack Paris", as CBiddulph says. "Dragons are attacking Paris" uses the present progressive tense, while "Dragons attack Paris" uses what is called the "simple present"/"present indefinite" tense. Wiki says: English grammar rules aren't necessarily universal and unchanging, but they do give at least medium-strength priors on how to interpret a sentence.
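The two readings can be compared directly in code. This sketch uses Python's built-in `all`, which returns `True` on an empty iterable, as the universal reading, and a count of at least two as the plural existential reading; the helper names and the predicate are hypothetical.

```python
def universal(things, predicate):
    """'For all X in things, predicate(X)': vacuously True when things is empty."""
    return all(predicate(x) for x in things)

def at_least_two(things, predicate):
    """'There are some (plural) X in things with predicate(X)'."""
    return sum(1 for x in things if predicate(x)) >= 2

def attacking_paris(dragon):
    # Hypothetical predicate; never actually called, since there are no dragons.
    return dragon.get("location") == "Paris"

dragons = []  # there are no dragons

print(universal(dragons, attacking_paris))     # True
print(at_least_two(dragons, attacking_paris))  # False
```

The same empty collection makes the universal reading true and the existential reading false, so the truth value depends on which formal rendering of the English sentence is chosen.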

2Jim Pivarski1y

This sounds like it's using Russell's theory of descriptions, in that you're replacing "Colorless green ideas do Y" with "For all X such that X is a colorless green idea, X does Y." Not everyone agrees that this is a correct interpretation, in part because it seems that statements like "Dragons are attacking Paris" should be false. I think it would be reasonable to say that "colorless green ideas" is not just a set of objects in which there are no existing members, but meaningless (for two reasons: "colorless" and "green" conflict, and ideas can't be colored, anyway). I think that was Chomsky's intention—not to write a false sentence, but a meaningless one.

2cubefox1y

I don't think so. "Smoking causes cancer" doesn't express a universal (or existential) quantification either. Or "Canadians are polite", "Men are taller than women" etc.

3localdeity1y

Grammatically, the most obvious interpretation is a universal quantification (i.e. "All men are taller than all women"), which I think is a major reason why such statements so often lead to objections of "But here's an exception!" Maybe you can tell the audience that they should figure out when to mentally insert "... on average" or "tend to be". Though there are also circumstances where one might validly believe that the speaker really means all. I think it's best to put such qualified language into your statements from the start.

2Benaya Koren1y

Here I mostly agree. Here I don't, for the same reason that I don't ask about "water in the refrigerator outside eggplant cells": pragmatics are, for better or worse, part of the language.

2Caleb Biddulph1y

Your example wouldn't be true, but "Dragons attack Paris" would be, interpreted as a statement about actual dragons' habits

[-]Vladimir_Nesov1y9-1

I think things (minds, physical objects, social phenomena) should be characterized by computations that they could simulate/incarnate. The most straightforward example is a computer that holds a program, it could start running it. The program is not in any way fundamentally there, it's an abstraction of what the computer physically happens to be. And it still characterizes the computer even if it's not inevitable that it will start running, merely the possibility that it could start running is significant to the interactive behavior of the computer, the wa... (read more)

2ProgramCrafter1y

I don't seem to understand how you use the word "thing" here; if it can refer to a physical object, then what computations can a wooden crate do, for instance? If none, then it doesn't get characterized differently from a cup, and that seems strange.

5Vladimir_Nesov1y

Self-supervised learning is a widely applicable illustration: it extracts computations from a phenomenon as circuits of a model. So you might hide some details of a crate and ask which principles reconstruct them; some theory of parallelepipeds might be relevant, or material properties of wood. These computations take in a problem statement (context) and then arrive at further facts implied by it. This doesn't cleanly extract individual computations, and has trouble eliciting potential computations that don't manifest in actuality under most circumstances. Presence of more general minds helps with that: humans might be able to represent such facts of potentiality about other things and then write them down, so that the less general self-supervised learning can observe their traces on the web corpus. Another issue is that this gets to lump together all things from the world: the models learn what the world simulates, not what individual things simulate. This is significant when the things in question are people or civilizations, and understanding them on their own, without distortion from external circumstance, is key to defining respect for their autonomy, or aims and decisions that are their own. (I tried to articulate a related point in this post, though I seem to have failed, since there were multiple convergent objections that missed it. I explain more in my comment replies there.)

[-]Ruby1y*82

Curated. I really like that even though LessWrong is 1.5 decades old now and has Bayesianism assumed as background paradigm while people discuss everything else, nonetheless we can have good exploration of our fundamental epistemological beliefs.

The descriptions of unsolved problems, or at least incompleteness of Bayesianism strikes me as technically correct. Like others, I'm not convinced of Richard's favored approach, but it's interesting. In practice, I don't think these problems undermine the use of Bayesianism in typical LessWrong thought. For example... (read more)

[-]Cole Wyeth1y70

Verbal statements often have context dependent or poorly defined truth value, but observations are pretty (not completely) solid. Since useful models eventually shake out into observations, the binary truth values tagging observations "propagate back" through probability theory to make useful statements about models. I am not convinced that we need a fuzzier framework - though I am interested in the philosophical justification for probability theory in the "unrealizable" case where no element of the hypothesis class is true. For instance, it seems that universal distributions mixture is over probabilistic models none of which should necessarily be assumed true, but rather only the widest class we can compute.

[-]cubefox1y*60

Yes, propositions are abstractions which don't exactly correspond to anything in our mind. But they do seem to have advantages: When communicating, we use sentences, which can be taken to express propositions. And we do seem to intuitively have propositional attitudes like "beliefs" (believing a proposition to be true) and "desires" (wanting a proposition to be true) in our mind. Which are expressible in sentences again. So propositions seem to be a quite natural abstraction. Treating them as being either true or false is a further simplification which wor... (read more)

[-]tailcalled1y62

A: Is there any water in the refrigerator?
B: Yes.
A: Where? I don’t see it.
B: In the cells of the eggplant.

The issue here is ambiguity between root-cause analysis (nobody has channeled a container for water to be among the objects currently in the refrigerator) vs reductionism (eggplants diminish into (among lots of things) water).

The problem with Bayesianism here is not that it uses binary rather than fuzzy truth-values (fuzzy truth-values don't really solve this, as you admit, though they're also not really incrementally closer to solving it), but rather ... (read more)

[-]Archimedes1y51

Can you help me tease out the difference between language being fuzzy and truth itself being fuzzy?

It's completely impractical to eliminate ambiguity in language, but for most scientific purposes, it seems possible to operationalize important statements into something precise enough to apply Bayesian reasoning to. This is indeed the hard part though. Bayes' theorem is just arithmetic layered on top of carefully crafted hypotheses.

The claim that the Earth is spherical is neither true nor false in general but usually does fall into a binary if we specify wha... (read more)

6Haiku1y

I don't find any use for the concept of fuzzy truth, primarily because I don't believe that such a thing meaningfully exists. The fact that I can communicate poorly does not imply that the environment itself is not a very specific way. To better grasp the specific way that things actually are, I should communicate less poorly. Everything is the way that it is, without a moment of regard for what tools (including language) we may use to grasp at it. (In the case of quantum fluctuations, the very specific way that things are involves precise probabilistic states. The reality of superposition does not negate the above.)

3Richard_Ngo1y

Suppose you have two models of the earth; one is a sphere, one is an ellipsoid. Both are wrong, but they're wrong in different ways. Now, we can operationalize a bunch of different implications of these hypotheses, but most of the time in science the main point of operationalizing the implications is not to choose between two existing models, or because we care directly about the operationalizations, but rather to come up with a new model that combines their benefits.

7Archimedes1y

I see what you're gesturing at but I'm having difficulty translating it into a direct answer to my question. Cases where language is fuzzy are abundant. Do you have some examples of where a truth value itself is fuzzy (and sensical) or am I confused in trying to separate these concepts?

5cubefox1y

Yes, this separation is confused. "Bob is bald" is true if Bob is contained in the set of bald things, and false if he is not contained in the set of bald things. But baldness is a vague concept, its extension is a fuzzy set. The containment relation is a partial one. So Bob isn't just either in the set or not in the set. To use binary truth values here, we have to make the simplifying assumption that "bald" is not vague. Otherwise we get fuzzy truth values which indicate the degree to which Bob is contained in the fuzzy set of bald things.
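A minimal sketch of the contrast between fuzzy and binary membership, under loud assumptions: the linear ramp, the `full_head` figure of 100,000 hairs, and the 0.5 cutoff are all hypothetical choices made for illustration, not claims about how baldness should actually be measured.

```python
def bald_membership(hair_count, full_head=100_000):
    """Degree in [0, 1] to which someone counts as bald.

    The linear ramp and the full_head value are illustrative assumptions.
    """
    if hair_count <= 0:
        return 1.0
    if hair_count >= full_head:
        return 0.0
    return 1.0 - hair_count / full_head

def crisp_bald(hair_count, cutoff=0.5):
    """Binary truth value obtained by pretending 'bald' has a sharp boundary."""
    return bald_membership(hair_count) >= cutoff

for hairs in (0, 25_000, 60_000, 100_000):
    print(hairs, bald_membership(hairs), crisp_bald(hairs))
```

The `crisp_bald` function is the "simplifying assumption" in code form: it discards the degree and keeps only which side of an arbitrary cutoff Bob falls on.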

2jmh1y

What happens when Bob can be found in or out of the set of bald things at different times or in different situations, but we might not understand (or even be well aware of) the conditions that drive Bob's membership in the set when we're evaluating baldness and Bob? Can membership in baldness turn out to be some type of quantum state thing? That might be a basis for separating the concept of fuzzy language and fuzzy truth. But I would agree that if we can identify all possible cases where Bob is or is not in the set of baldness, one might claim truth is no longer fuzzy, but one then needs to prove that knowledge of all possible states has been established, I think.

[-]johnswentworth1mo40

I'd be curious to see a review from Richard on this post. We had some back-and-forth in the comments, but I don't know what updates came out.

4Richard_Ngo1mo

Good suggestion, have reviewed here.

[-]JesseClifton1y40

This paper discusses two semantics for Bayesian inference in the case where the hypotheses under consideration are known to be false.

Verisimilitude: p(h) = the probability that h is closest to the truth [according to some measure of closeness-to-truth] among hypotheses under consideration
Counterfactual: p(h) = the probability of h given the (false) supposition that one of the hypotheses under consideration is true

In any case, it’s unclear what motivates making decisions by maximizing expected value against such probabilities, which seems like a ... (read more)

1Richard_Ngo1y

Ty for the link but these seem like both clearly bad semantics (e.g. under either of these the second-best hypothesis under consideration might score arbitrarily badly).

[-]romeostevensit1y42

without a principled distinction between credences that are derived from deep, rigorous models of the world, and credences that come from vague speculation

Double counting issues here as well, in communities.

[-]Charlie Steiner1y40

Is this a fair summary (from a sort of reverse direction)?

We start with questions like "Can GR and QM be unified?" where we sort of think we know what we mean based on a half-baked, human understanding both of the world and of logic. If we were logically omniscient we could expound a variety of models that would cash out this human-concept-space question more precisely, and within those models we could do precise reasoning - but it's ambiguous how our real world half-baked understanding actually corresponds to any given precise model.

[-]Cole Wyeth10mo30

This is a response directly to comments made by Richard Ngo at the CMU agent foundations conference. Though he requested I comment here, the claims I want to focus on go beyond this (and the previous post) and include the following:

1: redefining agency as coalitional (agent = cooperating subagents) as opposed to the normal belief/goal model.

2: justifying this model by arguing that subagents are required for robustness in hard domains (specifically those that require concept invention).

3: that therefore AIXI is irrelevant for understanding agency. ... (read more)

4Richard_Ngo10mo

Thank you Cole for the comment! Some quick thoughts in response (though I've skipped commenting on the biology examples and the ML examples since I think our intuitions here are a bit too different to usefully resolve via text): Yepp, this is a good rephrasing. I'd clarify a bit by saying: after some level of decomposition, the recursion reaches agents which are limited to simple enough domains (like recognizing shapes in your visual field) that they aren't strongly bottlenecked on forming new concepts (like all higher-level agents are). In domains that simple, the difference between heuristics and planners is much less well-defined (e.g. a "pick up a cup" subagent has a scope of maybe 1 second, so there's just not much planning to do). So I'm open to describing such subagents as utility-maximizers with bounded scope (e.g. utility 1 if they pick up the cup in the next second, 0 if they don't, -10 if they knock it over). This is still different from "utility-maximizers" in the classic LessWrong sense (which are usually understood as not being bounded in terms of time or scope). This feels crucial to me. There's a level of optimality at which you no longer care about robustness, because you're so good at planning that you can account for every consideration. Stockfish, for example, is willing to play moves that go against any standard chess intuition, because it has calculated out so many lines that it's confident it works in that specific case. (Though even then, note that this leaves it vulnerable to neural chess engines!) But for anything short of that, you want to be able to integrate subagents with non-overlapping ontologies into the same decision procedure. E.g. if your internally planning subagent has come up with some clever and convoluted plan, you want some other subagent to be able to say "I can't critique this plan in its own ontology but my heuristics say it's going to fail". More generally, attempted unifications of ontologies have the same problem as

2Noosphere8910mo

Re democratic countries overtaken by dictatorial countries, I think that this will only last until AI that can automate at least all white collar labor is achieved, and maybe even most blue collar physical labor well enough that human wages for those jobs decline below what you need to subsist on a human, and by then dictatorial/plutocratic countries will unfortunately come back as a viable governing option, and maybe even overtaking democratic countries. So to come back to the analogy, I think VNM-rationality dictatorship is unfortunately common and convergent over a long timescale, and it's democracies/coalition politics that are fragile over the sweep of history, because they only became dominant-ish starting in the 18th century and end sometime in the 21st century.

2Cole Wyeth10mo

What is this coalitional structure for if not to approximate an EU maximizing agent?

2Richard_Ngo10mo

This quote from my comment above addresses this:

2Cole Wyeth10mo

So the thing that coalitional agents are robust at is acting approximately like belief/goal agents, and you’re only making a structural claim about agency? If so, I find your model pretty plausible.

4Richard_Ngo10mo

Oh, I see what you mean now. In that case, no, I disagree. Right now this notion of robustness is pre-theoretic. I suspect that we can characterize robustness as "acting like a belief/goal agent" in the limit, but part of my point is that we don't even know what it means to act "approximately like belief/goal agents" in realistic regimes, because e.g. belief/goal agents as we currently characterize them can't learn new concepts. Relatedly, see the dialogue in this post.

7Cole Wyeth8mo

Update: I am increasingly convinced that Bayesianism is not a complete theory of intelligence and may not be the best fundamental basis for agent foundations research, but I am still not convinced that coalitional agency is the right direction.

4Richard_Ngo8mo

Interesting. Got a short summary of what's changing your mind? I now have a better understanding of coalitional agency, which I will be interested in your thoughts on when I write it up.

4Cole Wyeth8mo

Mostly talking to you, talking to Abram, and reading Tom Sterkenburg's thesis. Briefly: I am now less confident that realizability assumptions are ever satisfied for embedded agents in our universe (Vanessa Kosoy / Diffractor argue this fairly convincingly). In fact this is probably similar to a standard observation about the scientific method (I read Alchin's "theory of knowledge", Hutter recommends avoiding editions 3rd and after). As an example intuition, with runtime restrictions it seems to be impossible to construct universal mixtures (Vladimir Vovk impressed this on me). In the unrealizable case, I now appreciate Bayesian learning as one specific expert advice aggregator (albeit an abnormally principled one equipped with now-standard analysis). I appreciate the advantages of other approaches with partial experts, with Garrabrant induction as an extreme case. I still endorse the Bayesian approach in many cases, in particular when it is at least possible to formulate a reasonable hypothesis class that contains the truth.
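A minimal sketch of the "Bayesian learning as an expert advice aggregator" reading, assuming binary outcomes and three hypothetical experts that each predict a fixed probability (none of which need be "true"). Bayesian updating of the weights coincides with multiplicative weights under log loss, and the mixture's cumulative log loss exceeds the best expert's by at most ln(N) for a uniform prior over N experts.

```python
import math

def bayes_mixture(expert_probs, outcomes, prior=None):
    """expert_probs[t][i] is expert i's probability that outcome t equals 1."""
    n = len(expert_probs[0])
    weights = list(prior) if prior else [1.0 / n] * n
    mixture_loss = 0.0
    expert_loss = [0.0] * n
    for probs, y in zip(expert_probs, outcomes):
        # Likelihood each expert assigned to what actually happened.
        likes = [p if y == 1 else 1.0 - p for p in probs]
        mix = sum(w * l for w, l in zip(weights, likes))
        mixture_loss += -math.log(mix)
        for i, l in enumerate(likes):
            expert_loss[i] += -math.log(l)
        # Bayes' rule: posterior weight is proportional to prior weight times likelihood.
        weights = [w * l / mix for w, l in zip(weights, likes)]
    return weights, mixture_loss, expert_loss

# Illustrative data: three constant experts and an arbitrary bit sequence.
experts = [[0.9, 0.5, 0.2]] * 8
outcomes = [1, 1, 0, 1, 1, 1, 0, 1]
weights, mixture_loss, expert_loss = bayes_mixture(experts, outcomes)
print(weights, mixture_loss, min(expert_loss) + math.log(3))
```

The regret guarantee here makes no realizability assumption: it compares the mixture only to the best expert in the class, whatever the data source is.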

1Jonas Hallgren10mo

I saw the comment and thought I would drop some things that are the beginnings of approaches for a more mathematical theory of iterated agency. A general underlying idea is to decompose a system into its maximally predictive sub-agents, sort of like an arg-max of Daniel Dennett's intentional stance. There are various underlying reasons why you would believe that there are algorithms for discovering the most important nested sub-parts of systems using things like Active Inference, especially where it has been applied in computational biology. Here are some related papers: https://arxiv.org/abs/1412.2447 - We consider biological individuality in terms of information theoretic and graphical principles. Our purpose is to extract through an algorithmic decomposition system-environment boundaries supporting individuality. We infer or detect evolved individuals rather than assume that they exist. Given a set of consistent measurements over time, we discover a coarse-grained or quantized description on a system, inducing partitions (which can be nested) https://arxiv.org/pdf/2209.01619 - Trying to relate Agency to POMDPs and the intentional stance.

[-]B Jacobs1y30

You might find my post on this interesting.

[-]IrenicTruth1y34

I shy away from fuzzy logic because I used it as a formalism to justify my religious beliefs. (In particular, "Possibilistic Logic" allowed me to appear honest to myself—and I'm not sure how much of it was self-deception and how much was just being wrong.)

The critical moment in my deconversion came when I realized that if I was looking for truth, I should reason according to the probabilities of the statements I was evaluating. Thirty minutes later, I had gone from a convinced Christian speaking to others, leading in my local church, and basing my life and... (read more)

[-]Richard_Kennaway1y30

Colorless green ideas sleeping furiously.

[-]AlanCrowe1y20

You are over-simplifying Bayesian reasoning. Giving partial credence to propositions doesn't work; numerical values representing partial credence must be attached to the basic conjunctions.

For example, if the propositions are A, B, and C, the idea that everyone has for coping with incomplete information is to come up with something like P(A)=0.2, P(B)=0.3, P(C)=0.4. This doesn't work.

One has to work with the conjunctions and come up with something like

P(A and B and C) = 0.1

P(A and B and not C) = 0.1

P(A and not B and C) = 0.1

P(A and not B and n... (read more)
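A minimal sketch of one reason per-proposition credences underdetermine the picture, restricted to two propositions A and B for brevity. The two joint distributions below are hypothetical illustrations; they share the marginals P(A)=0.2 and P(B)=0.3 yet disagree about P(A and B).

```python
from fractions import Fraction as F

def marginal(joint, index):
    """Sum the probability of every world in which proposition `index` is true."""
    return sum(p for world, p in joint.items() if world[index])

# Worlds are (A, B) truth-value pairs. Both tables are illustrative assumptions.
independent = {
    (True, True): F(6, 100),
    (True, False): F(14, 100),
    (False, True): F(24, 100),
    (False, False): F(56, 100),
}
nested = {  # here A can only be true when B is true
    (True, True): F(20, 100),
    (True, False): F(0),
    (False, True): F(10, 100),
    (False, False): F(70, 100),
}

for name, joint in (("independent", independent), ("nested", nested)):
    print(name, marginal(joint, 0), marginal(joint, 1), joint[(True, True)])
```

Exact fractions are used so that the equality of the marginals is not obscured by floating-point rounding.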

[-]James Camacho1y2-2

Natural languages, by contrast, can refer to vague concepts which don’t have clear, fixed boundaries

I disagree. I think it's merely the space is so large that it's hard to pin down where the boundary is. However, language does define natural boundaries (that are slightly different for each person and language, and shift over time). E.g., see "Efficient compression in color naming and its evolution" by Zaslavsky et al.

[-]kareempforbes1y10

Your article is a great read!

In my view, we can categorize scientists into two broad types: technician scientists, who focus on refining and perfecting existing theories, and creative scientists, who make generational leaps forward with groundbreaking ideas. No theory is ever 100% correct—each is simply an attempt to better explain a phenomenon in a way that’s useful to us.

Take Newton, for example. His theory of gravity was revolutionary, introducing concepts no one had thought of before—it was a generational achievement. But then Einstein came along... (read more)

[-]Joseph Gardi1y10

My attempt at a TLDR for this: Bayesians assign a probability to each belief in order to represent uncertainty, but this is insufficient because there are multiple kinds of uncertainty: vagueness, approximation, context-dependence, sense vs. nonsense, and Knightian uncertainty. And sometimes we humans make logical errors when we try to do Bayesian inference.

[-]Otto Von Wegen1y10

First of all Popper and Deutsch don't discard induction entirely. They just argue against induction as a source/foundation of knowledge.

Now one comment on the Bayesian endeavour: As a lay mathematician I have little authority in saying this, but isn't it obvious that a probability calculation fails if the absolute value is infinite?

I was there at the beginning of the lesswrong movement and this emphasis on probabilistic thinking is new to me. Bayesianism also is blacklisted under my personal philosophical firewall for being gameable by social engin... (read more)

