

I agree. Let me elaborate, hopefully clarifying the post to Viliam (and others).

Regarding the basics of rationality, there's this cluster of concepts that includes "think in distributions, not binary categories", "Distributions Are Wide, wider than you think", selection effects, unrepresentative data, filter bubbles and so on. This cluster is clearly present in the essay. (There are other such clusters present as well - perhaps something about incentive structures? - but I can't name them as well.)

Hence, my reaction reading this essay was "Wow, what a sick combo!"

You have these dozens of basic concepts, then you combine them in the right way, and bam, you get Social Dark Matter.

Sure, yes, really the thing here is many smaller things in disguise - but those smaller basic things are not the point. The point is the combination!

It’s hard to describe (especially in lay terms) the experience of reading through (and finally absorbing) the sections of this paper one by one; the best analogy I can come up with would be watching an expert video game player nimbly navigate his or her way through increasingly difficult levels of some video game, with the end of each level (or section) culminating in a fight with a huge “boss” that was eventually dispatched using an array of special weapons that the player happened to have at hand.

This passage is from Terence Tao, describing his experiences reading a paper by Jean Bourgain, but it fits my experience reading this essay as well.

Once upon a time I stumbled upon LessWrong. I read a lot of the basic material. At the time I found them to be worldview-changing. I also read a classic post with the quote

“I re-read the Sequences”, they tell me, “and everything in them seems so obvious. But I have this intense memory of considering them revelatory at the time.” 

and thought "Huh, they are revelatory. Let's see if that happens to me".

(And guess what?)

There are these moments where I notice that something has changed. I remember reading some comment like "Rationalists have this typical mind fallacy, where they think that people in Debates and Public Conversations base and update their beliefs on evidence". Moments like that remind me that, oh right, not everyone is on board with The Truth being Very Important; they just don't really care that much, they care about other things.

And I swear I haven't always had a reflex of focusing on the truth values of statements people say. I have also noticed that most of the time the lens of truth-values-of-things-people-say is just a wrong frame, a wrong way of looking at things.

Which is to say: Quaker isn't the default. By default truth is not the point.

(Which in turn makes me appreciate more those places and times where truth is the point.)

PS: Your recent posts have been good, the kind of posts that got me into LessWrong in the first place.

1. Investigate (randomly) modularly varying goals in modern deep learning architectures.


I did a small experiment regarding this. Short description below.

I basically followed the instructions given in the section: I trained a neural network on pairs of digits from the MNIST dataset. These two digits were glued together side-by-side to form a single image. I just threw something up for the network architecture, but the second-to-last layer had 2 nodes (as in the post).

I had two different types of loss functions / training regimes:

  • mean-square-error, the correct answer being x + y, where x and y are the digits in the images
  • mean-square-error, the correct answer being ax + by, where a and b are uniformly random integers from [-8, 8] (excluding the cases where a = 0 or b = 0), the values of a and b changing every 10 epochs.

In both cases the total number of epochs was 100. In the second case, for the last 10 epochs I had a = b = 1.
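To make the second training regime concrete, here is a minimal sketch of the coefficient schedule only (not the network or the training loop). The function names are hypothetical, and it assumes that "excluding a = 0 or b = 0" means both coefficients are always nonzero.

```python
import random

def sample_coefficient(rng):
    """Draw a uniformly random nonzero integer from [-8, 8]."""
    return rng.choice([k for k in range(-8, 9) if k != 0])

def make_schedule(total_epochs=100, block=10, seed=0):
    """Return one (a, b) pair per epoch.

    The pair is resampled every `block` epochs, and the final block
    uses a = b = 1, as described in the text.
    """
    rng = random.Random(seed)
    schedule = []
    for start in range(0, total_epochs, block):
        is_last_block = start + block >= total_epochs
        if is_last_block:
            a, b = 1, 1
        else:
            a, b = sample_coefficient(rng), sample_coefficient(rng)
        schedule.extend([(a, b)] * min(block, total_epochs - start))
    return schedule

def target(x, y, a, b):
    """Regression target for an image showing digits x and y."""
    return a * x + b * y

schedule = make_schedule()
```

With this sketch, `schedule[epoch]` gives the coefficients to use when computing the mean-square-error targets for that epoch.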

The hard part is measuring the modularity of the resulting models. I didn't come up with anything I was satisfied with, but here's the motivation for what I did (followed by what I did):

Informally, the "intended" or "most modular" solution here would be: the neural network consists of two completely separate parts, identifying the digits in the first and second half of the image, and only at the very end these classifications are combined. (C.f. the image in example 1 of the post.)

What would we expect to see if this were true? At least the following: if you change the digit in one half of the image to something else and then do a forward-pass, there are lots of activations in the network that don't change. Weaker alternative formulation: the activations in the network don't change very much.

So! What I did was: store the activations of the network when one half of the image is sampled randomly from the MNIST dataset (and the other one stays fixed), and look at the Euclidean distances of those activation vectors. Normalizing by the (geometric) mean of the lengths of the activation vectors gives a reasonable metric of "how much did the activations change relative to their magnitude?". I.e. the metric I used is $\|u - v\| / \sqrt{\|u\| \, \|v\|}$, where $u$ and $v$ are the two activation vectors.
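The metric described above (Euclidean distance between two activation vectors, divided by the geometric mean of their lengths) can be sketched in a few lines. This is only an illustration on plain Python lists; the function name is hypothetical and the actual experiment code may have differed.

```python
import math

def relative_activation_change(u, v):
    """Euclidean distance between u and v, normalized by the
    geometric mean of their Euclidean norms."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return distance / math.sqrt(norm_u * norm_v)

# Identical activations give 0; orthogonal activations of equal
# length give sqrt(2), regardless of their overall scale.
unchanged = relative_activation_change([1.0, 0.0], [1.0, 0.0])
orthogonal = relative_activation_change([2.0, 0.0], [0.0, 2.0])
```

Lower values mean the activations moved less when one half of the image was swapped, which is the sense in which a lower score is read as "more modular" here.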

And the results? Were the networks trained with varying goals more modular on this metric?

(The rest is behind the spoiler, so that you can guess first.)

For the basic "predict x+y", the metric was on average 0.68 ± 0.02 or so, quite stable over the four random seeds I tested. For the "predict ax + by, a and b vary" I once or twice ran into an issue of the model just completely failing to predict anything. When it worked out at all, the metric was 0.55 ± 0.05, again over ~4 runs. So maybe a 20% decrease or so.

Is that a little or a lot? I don't know. It sure does not seem zero - modularly varying goals do something. Experiments with better notions of modularity would be great - I was bottlenecked by "how do you measure Actual Modularity, though?", and again, I'm unsatisfied with the method here.

Here is a related paper on "how good are language models at predictions", also testing the abilities of GPT-4: Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament.

Portion of the abstract:

To empirically test this ability, we enrolled OpenAI's state-of-the-art large language model, GPT-4, in a three-month forecasting tournament hosted on the Metaculus platform. The tournament, running from July to October 2023, attracted 843 participants and covered diverse topics including Big Tech, U.S. politics, viral outbreaks, and the Ukraine conflict. Focusing on binary forecasts, we show that GPT-4's probabilistic forecasts are significantly less accurate than the median human-crowd forecasts. We find that GPT-4's forecasts did not significantly differ from the no-information forecasting strategy of assigning a 50% probability to every question.

From the paper:

These data indicate that in 18 out of 23 questions, the median human-crowd forecasts were directionally closer to the truth than GPT-4’s predictions,


We observe an average Brier score for GPT-4’s predictions of B = .20 (SD = .18), while the human forecaster average Brier score was B = .07 (SD = .08).
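For readers unfamiliar with the Brier score used in the quote: for binary questions it is the mean squared difference between the forecast probability and the outcome (0 or 1), so lower is better. A minimal sketch with made-up forecasts (the numbers below are illustrative, not data from the paper):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

outcomes = [1, 0, 1, 0]

# The no-information strategy of always saying 50% scores exactly 0.25.
always_half = brier_score([0.5] * 4, outcomes)

# A perfectly confident and correct forecaster scores 0.
perfect = brier_score([1.0, 0.0, 1.0, 0.0], outcomes)
```

This gives some context for the quoted averages: 0.25 is what always answering 50% would score on any set of binary questions.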

The part about the Kelly criterion that has most attracted me is this:

That thing is that betting Kelly means that with probability 1, over time you'll be richer than someone who isn't betting Kelly. So if you want to achieve that, Kelly is great.

So with more notation, P(money(Kelly) > money(other)) tends to 1 as time goes to infinity (where money(policy) is the random score given by a policy).

This sounds kinda like strategic dominance - and you shouldn't use a dominated strategy, right? So you should Kelly bet!

The error in this reasoning is the "sounds kinda like" part. "Policy A dominates policy B" is not the same claim as P(money(A) >= money(B)) = 1. These are equivalent in "nice" finite, discrete games (I think), but not in infinite settings! Modulo issues with defining infinite games, the Kelly policy does not strategically dominate all other policies. So one shouldn't be too attracted to this property of the Kelly bet. 
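To see the "probability tends to 1" property concretely, here is a small exact calculation under assumptions of my own choosing: repeated even-money bets won with probability 0.6 (so the Kelly fraction is 0.2), compared against a policy that always stakes half its wealth. It only illustrates the limiting claim; it says nothing about dominance.

```python
import math

def log_binom_pmf(n, k, p):
    """Log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def prob_kelly_ahead(n, p=0.6, other_fraction=0.5):
    """Exact P(money(Kelly) > money(other)) after n even-money bets."""
    kelly_fraction = 2 * p - 1  # Kelly fraction for an even-money bet
    # Per-bet change in log(money(Kelly) / money(other)).
    gap_on_win = math.log((1 + kelly_fraction) / (1 + other_fraction))
    gap_on_loss = math.log((1 - kelly_fraction) / (1 - other_fraction))
    total = 0.0
    for wins in range(n + 1):
        if wins * gap_on_win + (n - wins) * gap_on_loss > 0:
            total += math.exp(log_binom_pmf(n, wins, p))
    return total

p10, p100, p1000 = (prob_kelly_ahead(n) for n in (10, 100, 1000))
```

In this setup the probability is roughly 0.62 after 10 bets and climbs towards 1 as the number of bets grows, yet for any finite number of bets there remain outcomes (long winning streaks) where the other policy ends up richer.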

(Realizing this made me think "oh yeah, one shouldn't privilege the Kelly bet as a normatively correct way of doing bets".)

One of my main objections to Bayesianism is that it prescribes that ideal agent's beliefs must be probability distributions, which sounds even more absurd to me.


From one viewpoint, I think this objection is satisfactorily answered by Cox's theorem - do you find it unsatisfactory (and if so, why)?

Let me focus on another angle though, namely the "absurdity" and gut level feelings of probabilities.

So, my gut feels quite good about probabilities. Like, I am uncertain about various things (read: basically everything), but this uncertainty comes in degrees: I can compare and possibly even quantify my uncertainties. I feel like some people get stuck on the numeric probabilities part (one example I recently ran into was this quote from Section III of this essay by Scott, "Does anyone actually consistently use numerical probabilities in everyday situations of uncertainty?"). Not sure if this is relevant here, but at the risk of going off on a tangent, here's a way of thinking about probabilities I've found clarifying and which I haven't seen elsewhere:

The correspondence

beliefs <-> probabilities

is of the same type as

temperature <-> Celsius-degrees.

Like, people have feelings of warmth and temperature. These come in degrees: sometimes it's hotter than some other times, now it is a lot warmer than yesterday and so on. And sure, people don't have a built-in thermometer mapping these feelings to Celsius-degrees, they don't naturally think of temperature in numeric degrees, they frequently make errors in translating between intuitive feelings and quantitative formulations (though less so with more experience). Heck, the Celsius scale is only a few hundred years old! Still, Celsius degrees feel like the correct way of thinking about temperature.

And the same with beliefs and uncertainty. These come in degrees: sometimes you are more confident than some other times, now you are way more confident than yesterday and so on. And sure, people don't have a built-in probabilitymeter mapping these feelings to percentages, they don't naturally think of confidence in numeric degrees, they frequently make errors in translating between intuitive feelings and quantitative formulations (though less so with more experience). Heck, the probability scale is only a few hundred years old! Still, probabilities feel like the correct way of thinking about uncertainty.

From this perspective probabilities feel completely natural to me - or at least as natural as Celsius-degrees feel. Especially questions like "does anyone actually consistently use numerical probabilities in everyday situations of uncertainty?" seem to miss the point, in the same way that "does anyone actually consistently use numerical degrees in everyday situations of temperature?" seems to miss the point of the Celsius scale. And I have no gut level objections to the claim that an ideal agent's beliefs correspond to probabilities.

Open for any of the roles A, B, C. I should have a flexible schedule during my waking hours (around GMT+0). Willing to play for even long times, say a month (though in that case I'd be thinking about "hmm, could we get more quantity in addition to quality"). Elo probably around 1800.

Devices and time to fall asleep: a small self-experiment

I did a small self-experiment on the question "Does the use of devices (phone, laptop) in the evening affect the time taken to fall asleep?".


On each day during the experiment I went to sleep at 23:00. 

At 21:30 I randomized what I'll do at 21:30-22:45. Each of the following three options was equally likely:

  • Read a physical book
  • Read a book on my phone
  • Read a book on my laptop

At 22:45-23:00 I brushed my teeth etc. and did not use devices at this time.

Time taken to fall asleep was measured by a smart watch. (I did not choose the watch for being good at measuring sleep, though.) I had blue light filters on my phone and laptop.


I ran the experiment for n = 17 days (the days were not consecutive, but all took place in a consecutive ~month).

I ended up having 6 days for "phys. book", 6 days for "book on phone" and 5 days for "book on laptop".

On one experiment day (when I read a physical book), my watch reported me as falling asleep at 21:31. I discarded this as a measuring error.

For the resulting 16 days, average times to fall asleep were 5.4 minutes, 21 minutes and 22 minutes, for phys. book, phone and laptop, respectively.

[Raw data:

Phys. book: 0, 0, 2, 5, 22

Phone: 2, 14, 21, 24, 32, 33

Laptop: 0, 6, 10, 27, 66.]
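For anyone who wants to recompute the averages from the raw data listed above, a minimal sketch (the dictionary keys are just labels I chose). Note that the five physical-book values as listed average 5.8 minutes, slightly above the 5.4 quoted earlier; I don't know the reason for the gap, though rounding in the listed values is one possibility.

```python
from statistics import mean

# Minutes taken to fall asleep, copied from the raw data in the text.
minutes_to_fall_asleep = {
    "physical book": [0, 0, 2, 5, 22],
    "phone": [2, 14, 21, 24, 32, 33],
    "laptop": [0, 6, 10, 27, 66],
}

averages = {condition: mean(values)
            for condition, values in minutes_to_fall_asleep.items()}

for condition, average in averages.items():
    print(f"{condition}: {average:.1f} min over {len(minutes_to_fall_asleep[condition])} nights")
```

The laptop average is driven substantially by the single 66-minute night, which is worth keeping in mind given the small sample.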


The sample size was small (I unfortunately lost the motivation to continue). Nevertheless it gave me quite strong evidence that being on devices indeed does affect sleep.

Iteration as an intuition pump

I feel like many game/decision theoretic claims are most easily grasped when looking at the iterated setup:

Example 1. When one first sees the prisoner's dilemma, the argument that "you should defect because, whatever the other person does, you are better off defecting" feels compelling. The counterargument goes "the other person can predict what you'll do, and this can affect what they'll play".

This has some force, but I have had a hard time really feeling the leap from "you are a person who does X in the dilemma" to "the other person models you as doing X in the dilemma". (One thing that makes this difficult is that usually in PD it is not specified whether the players can communicate beforehand or what information they have of each other.) And indeed, humans' models of other humans are limited - this is not something you should just dismiss.

However, the point "the Nash equilibrium is not necessarily what you should play" does hold, as is illustrated by the iterated Prisoner's dilemma. It feels intuitively obvious that in a 100-round dilemma there ought to be something better than always defecting.

This is among the strongest intuitions I have for "Nash equilibria do not generally describe optimal solutions".
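The 100-round intuition can be checked with a tiny simulation. The payoff numbers below are the common textbook ones (5 for defecting against a cooperator, 3 for mutual cooperation, 1 for mutual defection, 0 for being exploited) and are my assumption, as is the choice of tit-for-tat as the comparison strategy.

```python
# Assumed payoffs: PAYOFF[(my_move, their_move)] is my score for the round.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def always_defect(my_history, their_history):
    return "D"

def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return their_history[-1] if their_history else "C"

def play(strategy_a, strategy_b, rounds=100):
    """Return the total scores of both strategies over the given rounds."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

print(play(always_defect, always_defect))  # (100, 100)
print(play(tit_for_tat, tit_for_tat))      # (300, 300)
print(play(tit_for_tat, always_defect))    # (99, 104)
```

Two always-defectors end with 100 points each, while two tit-for-tat players end with 300 each, even though always-defect still narrowly wins its head-to-head against tit-for-tat.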


Example 2. When presented with lotteries, i.e. opportunities such as "X% chance you win A dollars, (100-X)% chance of winning B dollars", it's not immediately obvious that one should maximize expected value (or, at least, humans generally exhibit loss aversion, bias towards certain outcomes, sensitivity to framing etc.).

This feels much clearer when given the option to choose between lotteries repeatedly. For example, if you are presented with two buttons, one giving you a sure 1 dollar (expected value 1 dollar) and the other giving you a 40% chance of winning 3 dollars (expected value 1.2 dollars), and you are allowed to press the buttons a total of 100 times, it feels much clearer that you should always pick the one with the highest expected value. Indeed, as you are given more button presses, the probability of you getting (a lot) more money that way tends to 1 (by the law of large numbers).

This gives me a strong intuition that expected values are the way to go.
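The two-button example can be made quantitative with an exact binomial calculation. This is a sketch with hypothetical function names: it computes the probability that always pressing the risky button (40% chance of 3 dollars) yields strictly more money than always pressing the sure 1-dollar button.

```python
import math

def log_binom_pmf(n, k, p):
    """Log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def prob_risky_beats_safe(presses, p_win=0.4, prize=3, safe_payout=1):
    """Exact probability that the risky button's total exceeds the safe total."""
    total = 0.0
    for wins in range(presses + 1):
        if prize * wins > safe_payout * presses:
            total += math.exp(log_binom_pmf(presses, wins, p_win))
    return total

p10, p100, p1000 = (prob_risky_beats_safe(n) for n in (10, 100, 1000))
```

With 10 presses the risky button comes out ahead only about 62% of the time; with 100 presses it is around nine times in ten, and with 1000 presses it is nearly certain.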

Example 3. I find Newcomb's problem a bit confusing to think about (and I don't seem to be alone in this). This is, however, more or less the same problem as prisoner's dilemma, so I'll be brief here.

The basic argument "the contents of the boxes have already been decided, so you should two-box" feels compelling, but then you realize that in an iterated Newcomb's problem you will, by backward induction, always two-box.

This, in turn, sounds intuitively wrong, in which case the original argument proves too much. 

One thing I like about iteration is that it makes the concept of "it really is possible to make predictions about your actions" feel more plausible: there's clear-cut information about what kind of plays you'll make, namely the previous rounds. I sometimes find myself wanting to reject the premise, or thinking that "sure, if the premise holds, I should one-box, but it doesn't really work that way in real life, this feels like one of those absurd thought experiments that don't actually teach you anything". Iteration solves this issue.

Another pump I like is "how many iterations do there need to be before you Cooperate/maximize-expected-value/one-box?". There is (I think) some number of iterations at which this happens, and, given that, it feels like "1" is often the best answer.

All that said, I don't think iterations provide the Real Argument for/against the position presented. There's always some wiggle room for "but what if you are not in an iterated scenario, what if this truly is a Unique Once-In-A-Lifetime Opportunity?". I think the Real Arguments are something else - e.g. in example 2 I think coherence theorems give a stronger case (even if I still don't feel them as strongly on an intuitive level). I don't think I know the Real Argument for example 1/3.
