(Written for Arbital in 2017.)
So we're talking about how to make good decisions, or the idea of 'bounded rationality', or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of 'expected utility' or 'utility functions'.
And before we even ask what those are, we might first ask, Why?
There's a mathematical formalism, 'expected utility', that some people invented to talk about making decisions. This formalism is very academically popular, and appears in all the textbooks.
But so what? Why is that necessarily the best way of making decisions under every kind of circumstance? Why would an Artificial Intelligence care what's academically popular? Maybe there's some better way of thinking about rational agency? Heck, why is this formalism popular in the first place?
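(For reference while that question hangs, the formalism itself is compact: an action $a$ is scored by the probability-weighted average of the utilities $U(o)$ of its possible outcomes $o$,

$$\mathbb{E}[U \mid a] = \sum_{o} P(o \mid a)\, U(o),$$

and the recommended decision is whichever action maximizes this sum.)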
We can ask the same kinds of questions about probability theory:
Okay, we have this mathematical formalism in which...
The Curse of the Counterfactual is a side-effect of the way our brains process is-ought distinctions. It causes our brains to compare our past, present, and future to various counterfactual imaginings, and then blame and punish ourselves for the difference between reality, and whatever we just made up to replace it.
Seen from the outside, this process manifests itself as stress, anxiety, procrastination, perfectionism, creative blocks, loss of motivation, inability to let go of the past, constant starting and stopping on one goal or frequent switching between goals, low self-esteem, and many other things. From the inside, however, these counterfactuals can feel more real to us than reality itself, which can make it difficult even to notice that it's happening, let alone stop it.
Unfortunately, even though each specific instance of the curse can be defused using relatively simple techniques, we can’t just remove the parts of
...Personal bias alert — I would guess that my own moral brain is perhaps in the 5th percentile of judginess and desire to punish transgressors
Note that this is not evidence in favor of being able to unlearn judginess, unless you're claiming you were previously at the opposite end of the spectrum, and then unlearned it somehow. If so, then I would love to know what you did, because it would be 100% awesome and I could do with being a lot less judgy myself, and would love a way to not have to pick off judgmental beliefs one at a time.
If you have something ...
[EDIT: A crucial consideration was pointed out in the comments. For all the designs I've looked at, it's cheaper to just get a heat exchanger and ventilation fans, and blow the air outside/pull it inside and eat the extra heating costs/throw on an extra layer of clothing, than it is to buy a CO2 stripper. There's still an application niche for poorly ventilated rooms without windows, but that describes a lot fewer occasions than my previous dreams of commercial use.]
So, I have finally completed building a CO2 stripper that removes CO2 from the air to (hopefully) improve cognition in environments with high CO2 levels. In California, the weather is pretty good, so it's easy to just crack a window at any point during the year, but other areas get quite cold during the winter or quite warm during the summer, and it's infeasible to open a window unless you want to...
It is currently disassembled in my garage and will be fully tested when the 2.0 version is built; construction of the 2.0 version has stalled this year because I've been working on other projects. The 1.0 version did remove CO2 from a room as measured by a CO2 meter, but its size and volume made it not worthwhile.
Response To: Who Likes Simple Rules?
Epistemic Status: Working through examples with varying degrees of confidence, to help us be concrete and eventually generalize.
Robin Hanson has, in his words, “some puzzles” that I will be analyzing. I’ve added letters for reference.
Might be helpful to say more about what it would mean to clean up this particular post?
This post is eventually about partial agency. However, it's been a somewhat tricky point for me to convey; I take the long route. Epistemic status: slightly crazy.
I've occasionally said that everything boils down to credit assignment problems.
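One classic formalisation of splitting credit is the Shapley value, which pays each participant their marginal contribution averaged over every order in which the group could have assembled. A toy sketch (illustrative only; the post doesn't commit to this particular scheme):

```python
from itertools import permutations

def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            totals[p] += value(frozenset(coalition)) - before
    return {p: t / len(orders) for p, t in totals.items()}

# Invented toy game: A and B together produce 10, either alone produces 3,
# and C contributes nothing to any coalition.
def v(coalition):
    if {"A", "B"} <= coalition:
        return 10.0
    return 3.0 if coalition & {"A", "B"} else 0.0

print(shapley_values(["A", "B", "C"], v))  # A and B each get 5; C gets 0
```

The shares always sum to the full group's value, and the free-rider gets exactly zero: "rewards cooperative behavior and punishes uncooperative behavior" in miniature.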
One big area which is "basically credit assignment" is mechanism design. Mechanism design is largely about splitting gains from trade in a way which rewards cooperative behavior and punishes uncooperative behavior. Many problems are partly about mechanism design:
Another big area which I claim as "basically credit assignment" (perhaps more controversially) is artificial intelligence.
In the 1970s, John Holland kicked off the investigation of learning classifier systems. Holland had recently invented the Genetic Algorithms paradigm, which applies an evolutionary approach to...
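For readers unfamiliar with the paradigm, here is a minimal sketch of the generic genetic-algorithm loop (this is the textbook algorithm, not Holland's classifier systems specifically; all parameters are illustrative):

```python
import random

def evolve(fitness, genome_len=20, pop_size=50, generations=100,
           mutation_rate=0.02):
    """Minimal genetic algorithm: selection, crossover, mutation on bitstrings."""
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, genome_len)   # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if random.random() < mutation_rate else bit
                     for bit in child]              # point mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy objective ("one-max"): maximise the number of 1s in the genome.
best = evolve(fitness=sum)
print(sum(best), best)
```

Note that fitness here plays exactly the credit-assignment role: it decides which genomes get reproductive credit for the population's performance.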
This seems like one I would significantly re-write for the book if it made it that far. I feel like it got nominated for the introductory material, which I wrote quickly in order to get to the "main point" (the gradient gap). A better version would have discussed credit assignment algorithms more.
Internal Family Systems (IFS) is a psychotherapy school/technique/model which lends itself particularly well to being used alone or with a peer. For years, I had noticed that many of the kinds of people who put a lot of work into developing their emotional and communication skills, some within the rationalist community and some outside it, kept mentioning IFS.
So I looked at the Wikipedia page about the IFS model, and bounced off, since it sounded like nonsense to me. Then someone brought it up again, and I thought that maybe I should reconsider. So I looked at the WP page again, thought “nah, still nonsense”, and continued to ignore it.
This continued until I participated in CFAR mentorship training last September, and we had a class on CFAR’s Internal Double Crux (IDC) technique. IDC clicked really well for me, so I started using it a lot and also facilitating it to...
Glad it was of use! :)
This is Part I of the Specificity Sequence
Specificity turns any argument into a game of 3D Chess. Just when it seems like your argument is a clash of two ground armies, you can use your specificity powers to take off and fly all over the conceptual landscape. Fly, I say!

Want to see what a 3D Chess argument looks like? Behold the conversation I had the other day with my friend “Steve”:
Steve: Uber exploits its drivers by paying them too little!
Steve’s statement was a generic one, lacking specific detail. So I shot back with my own generic counterpoint:
Liron: No, job creation is a force for good at any wage. Uber creates increased demand for labor, which drives wages up in the economy as a whole.
You can see I was showing off my mastery of basic economics. This seemed like a good move to me at the time, but...
Nominating this whole sequence. It’s a blast, even if reading it felt very jumpy and stop-and-start. And I love how it’s clearly a self-example. But overall it’s just some really key lessons, taught better than any other place on the internet that I know.
If the thesis in Unlocking the Emotional Brain (UtEB) is even half-right, it may be one of the most important books that I have read. Written by the psychotherapists Bruce Ecker, Robin Ticic and Laurel Hulley, it claims to offer a neuroscience-grounded, comprehensive model of how effective therapy works. In so doing, it also happens to formulate its theory in terms of belief updating, helping explain how the brain models the world and what kinds of techniques allow us to actually change our minds. Furthermore, if UtEB is correct, it also explains why rationalist techniques such as Internal Double Crux [1 2 3] work.
UtEB’s premise is that much if not most of our behavior is driven by emotional learning. Intense emotions generate unconscious predictive models of how the world functions and what caused those emotions to occur. The brain then uses those models to guide our future behavior. Emotional issues...
This post discusses something I have found hard to put into words, and helps draw it out for everyone to talk about. Seems very valuable to include in the review.
This post begins the Immoral Mazes sequence. See introduction for an overview of the plan. Before we get to the mazes, we need some background first.
Meditations on Moloch
Consider Scott Alexander’s Meditations on Moloch. I will summarize here.
Therein lie fourteen scenarios where participants can be caught in bad equilibria.
Nominating this whole sequence. I learned a lot from it.
DeepMind released their AlphaStar paper a few days ago, having reached Grandmaster level at the partial-information real-time strategy game StarCraft II over the summer.
This is very impressive, and yet less impressive than it sounds. I used to watch a lot of StarCraft II (I stopped interacting with Blizzard recently because of how they rolled over for China), and over the summer there were many breakdowns of AlphaStar games once players figured out how to identify the accounts.
The impressive part is getting reinforcement learning to work at all in such a vast state space; that took breakthroughs beyond what was necessary to solve Go and beat Atari games. AlphaStar had to have a rich enough set of potential concepts (in the sense that e.g. a convolutional net ends up having concepts of different textures) that it could learn a concept like "construct building P" or "attack unit Q" or "stay out...
This, together with Rick's post on the topic, really helped me navigate the whole AlphaStar thing, and I've been coming back to it a few times to help me figure out how general current ML methods are (I think I disagree a good amount with it, but still think it makes a good number of points).
This is crossposted from the AI Impacts blog.
Artificial intelligence defeated a pair of professional StarCraft II players for the first time in December 2018. Although this was generally regarded as an impressive achievement, it quickly became clear that not everybody was satisfied with how the AI agent, called AlphaStar, interacted with the game, or how its creator, DeepMind, presented it. Many observers complained that, in spite of DeepMind’s claims that it performed at similar speeds to humans, AlphaStar was able to control the game with greater speed and accuracy than any human, and that this was the reason why it prevailed.
Although I think this story is mostly correct, I think it is harder than it looks to compare AlphaStar’s interaction with the game to that of humans, and to determine to what extent this mattered for the outcome of the matches. Merely comparing raw numbers for actions taken per...
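One reason raw averages mislead is that what matters competitively is the peak burst rate, not the mean. A toy sketch of the distinction, with invented numbers (not AlphaStar data):

```python
import numpy as np

def peak_apm(action_times, window=5.0):
    """Max actions-per-minute over any sliding window of `window` seconds."""
    t = np.sort(np.asarray(action_times))
    # For each action, count how many actions fall in the next `window` seconds.
    counts = np.searchsorted(t, t + window) - np.arange(len(t))
    return counts.max() * 60.0 / window

# Hypothetical player: 180 actions spread evenly over a minute...
calm = np.linspace(0, 60, 180)
# ...versus the same plus a 30-action burst packed into two seconds.
burst = np.full(30, 30.0) + np.linspace(0, 2, 30)

print(peak_apm(calm))                            # ~180 APM, same as the mean
print(peak_apm(np.concatenate([calm, burst])))   # far higher: burst dominates
```

Two players with identical average APM can differ enormously on this measure, which is part of why the comparison is harder than it looks.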
This was really useful at the time for helping me orient around the whole "how good are AIs at real-time strategy" thing at the time, and I think is still the post I would refer to the most (together with orthonormal's post, which I also nominated).
Reply to: Decoupling vs Contextualising Norms
Chris Leong, following John Nerst, distinguishes between two alleged discursive norm-sets. Under "decoupling norms", it is understood that claims should be considered in isolation; under "contextualizing norms", it is understood that those making claims should also address potential implications of those claims in context.
I argue that, at best, this is a false dichotomy that fails to clarify the underlying issues—and at worst (through no fault of Leong or Nerst), the concept of "contextualizing norms" has the potential to legitimize derailing discussions for arbitrary political reasons by eliding the key question of which contextual concerns are genuinely relevant, thereby conflating legitimate and illegitimate bids for contextualization.
Real discussions adhere to what we might call "relevance norms": it is almost universally "eminently reasonable to expect certain contextual factors or implications to be addressed." Disputes arise over which certain contextual factors those are, not whether context matters at all.
The...
...This post gave specific words to a problem I've run into many times, and am just pretty glad to have words for. It also became relevant in a bunch of contexts I was in.
A few years ago, the rationalsphere was small, and it was hard to get funding to run even one organization. Spinning up a second one with the same focus area might have risked killing the first one.
By now, I think we have enough capacity (financial, coordinational, and human-talent-wise) that that's less of a risk. Meanwhile, I think there are a number of benefits to having more, better, friendly competition.
Diversity of worldviews is better.
Two research orgs might develop different schools of thought that lead to different insights. This can lead to more ideas as well as avoiding the tail risks of bias and groupthink.
Easier criticism.
When there's only one org doing A Thing, criticizing that org feels sort of like criticizing That Thing. And there may be a worry that if the org lost funding due to your criticism, That Thing wouldn't get done at all. Multiple...
Nominating this post as much for the main body as for Ray's top-level comment. I guess maybe this post is somewhat downstream of me, so it's not super surprising I like it, but I do think many, many parts of the world could really benefit from more healthy competition, and I've set many plans into motion that try to create more competition in ways that I think improve things quite a bit.
This post covers the set-up and results from our exploration in amplifying generalist research using predictions, in detail. It is accompanied by a second post with a high-level description of the results, and more detailed models of impact and challenges. For an introduction to the project, see that post.
___
The rest of this post is structured as follows.
First, we cover the basic set-up of the exploration.
Second, we share some results, in particular focusing on the accuracy and cost-effectiveness of this method of doing research.
Third, we briefly go through some perspectives on what we were trying to accomplish and why that might be impactful, as well as challenges with this approach. These are covered more in-depth in a separate post.
Overall, we are very interested in feedback and comments on where to take this next.
To begin with, we note that...
I really like amplification and want people to try it more. This was the most serious real-life effort in amplification that I can remember, and while I don't think its results ended up being super surprising to me, the methodology was quite good, and I would like to see more of it (or somewhat enhanced versions of it).
The following is QRI's unified theory of music, meditation, psychedelics, depression, trauma, and emotional processing, with implications for how the brain implements Bayesian updating, and future directions for neuroscience. Crossposted from http://opentheory.net
-----------------
Context: follow-up to The Neuroscience of Meditation and A Future For Neuroscience; a unification of (1) the Entropic Brain & REBUS (Carhart-Harris et al. 2014; 2018; 2019), (2) the Free Energy Principle (Friston 2010), (3) Connectome-Specific Harmonic Waves (Atasoy et al. 2016; 2017), and (4) QRI’s Symmetry Theory of Valence (Johnson 2016; Gomez Emilsson 2017).
0. Introduction
Why is neuroscience so hard?
Part of the problem is that the brain is complicated. But we’ve also mostly been doing it wrong, trying to explain the brain using methods that couldn’t possibly generate insight about the things we care about.
On QRI’s lineages page, we suggest there’s a distinction between ‘old’ and ‘new’ neuroscience:
Traditionally, neuroscience has been concerned with cataloguing the brain, e.g. collecting discrete observations...
I think this post is 90% likely to make very little sense, but ever since reading it I can't get rid of the spark of doubt that maybe it is saying something really important and valuable, and that all study of rationality that does not understand it is doomed from the start.
Even without this post being anywhere close to right, I do think I got some useful things out of it, but by far the strongest reason I am nominating it is that I want people to review it and engage with it critically.
Note: I'll be trying not to engage too much with the object level discussion here – I think my marginal time on this topic is better spent thinking and writing longform thoughts. See this comment.
Over the past couple of months there was some extended discussion including myself, Habryka, Ruby, Vaniver, Jim Babcock, Zvi, Ben Hoffman, Jessicata, and Zack Davis. The discussion has covered many topics, including "what is reasonable to call 'lying'", "what are the best ways to discuss and/or deal with deceptive patterns in public discourse", "what norms and/or principles should LessWrong aspire to", and others.
This included comments on LessWrong, email, Google Docs, and in-person communication. This post is intended as an easier-to-read collection of what seemed (to me) like key points, as well as my current takeaways.
Part of the challenge here was that it seemed like Benquo and I had mostly similar models, but many critiques I made seemed
...I was sadly not part of the conversations involved, but this writeup is pretty helpful and I think important.
This post is based on chapter 15 of Uri Alon’s book An Introduction to Systems Biology: Design Principles of Biological Circuits. See the book for more details and citations; see here for a review of most of the rest of the book.
Fun fact: biological systems are highly modular, at multiple different scales. This can be quantified and verified statistically, e.g. by mapping out protein networks and algorithmically partitioning them into parts, then comparing the connectivity of the parts. It can also be seen more qualitatively in everyday biological work: proteins have subunits which retain their function when fused to other proteins, receptor circuits can be swapped out to make bacteria follow different chemical gradients, manipulating specific genes can turn a fly’s antennae into legs, organs perform specific functions, etc, etc.
On the other hand, systems designed by genetic algorithms (aka simulated evolution) are decidedly not modular. This can also be quantified...
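A sketch of the kind of quantification meant here, using a toy graph in place of a real protein network (illustrative only; a real analysis would use measured interaction data):

```python
import networkx as nx
from networkx.algorithms import community

# Toy stand-in for a protein-interaction network: two dense 8-node
# clusters joined by a single edge, i.e. modular by construction.
G = nx.barbell_graph(8, 0)

# Algorithmically partition the graph, then score how well the partition
# separates dense parts from sparse inter-part connections.
parts = community.greedy_modularity_communities(G)
print([sorted(p) for p in parts])           # should recover the two clusters
print(community.modularity(G, parts))       # high score = strongly modular
```

Running the same measurement on the wiring produced by a genetic algorithm, versus an evolved organism's network, is one concrete way to cash out the claim that the former is "decidedly not modular".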
Coming back to this post, I have some thoughts related to it that connect this more directly to AI Alignment that I want to write up, and that I think make this post more important than I initially thought. Hence nominating it for the review.
Originally posted on The Roots of Progress, August 12, 2017
I recently finished The Alchemy of Air, by Thomas Hager. It's the story of the Haber-Bosch process, the lives of the men who created it, and its consequences for world agriculture and for Germany during the World Wars.
What is the Haber-Bosch process? It's what keeps billions of people in the modern world from starving to death. In Hager's phrase: it turns air into bread.
Some background. Plants, like all living organisms, need to take in nutrients for metabolism. For animals, the macronutrients needed are large, complex molecules: proteins, carbohydrates, fats. But for plants they are elements: nitrogen, phosphorus and potassium (NPK). Nitrogen is needed in the largest quantities.
Nitrogen is all around us: it constitutes about four-fifths of the atmosphere. But plants can't use atmospheric nitrogen. Nitrogen gas, N₂, consists of two atoms held together by a triple covalent bond. The strength of
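For reference, the reaction the excerpt is building toward: the Haber-Bosch process fixes atmospheric nitrogen into ammonia at high temperature and pressure over an iron catalyst,

$$\mathrm{N_2} + 3\,\mathrm{H_2} \rightleftharpoons 2\,\mathrm{NH_3}$$

and it is breaking that triple bond which demands such extreme conditions.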
...Seconding johnswentworth's nominations. This was, I think, my favorite post from Jason in 2019, and I still think the study of progress is pretty crucial for a lot of work on LessWrong, and this post does a pretty good job of it.
This essay is an adaptation of a talk I gave at the Human-Aligned AI Summer School 2019 about our work on mesa-optimisation. My goal here is to write an informal, accessible and intuitive introduction to the worry that we describe in our full-length report.
I will skip most of the detailed analysis from our report, and encourage the curious reader to follow up this essay with our sequence or report.
The essay has six parts:
Two distinctions draws the foundational distinctions between “optimised” and “optimising”, and between utility and reward.
What objectives? discusses the behavioral and internal approaches to understanding objectives of ML systems.
Why worry? outlines the risk posed by the utility ≠ reward gap.
Mesa-optimisers introduces our language for analysing this worry.
An alignment agenda sketches different alignment problems presented by these ideas, and suggests transparency and interpretability as a way to solve them.
Where does this leave us? summarises the essay and suggests where to look
...I think of Utility != Reward as probably the most important core point from the Mesa-Optimizer paper, and I preferred this explanation over the one in the paper (though it leaves out many things, and I wouldn't want it to be the only thing someone reads on the topic).
Thank you to Sisi Cheng (of the Working as Intended comic) for the excellent drawings.
Suppose we have a gearbox. On one side is a crank, on the other side is a wheel which spins when the crank is turned. We want to predict the rotation of the wheel given the rotation of the crank, so we run a Kaggle competition.
We collect hundreds of thousands of data points on crank rotation and wheel rotation. 70% are used as training data, the other 30% set aside as test data and kept under lock and key in an old nuclear bunker. Hundreds of teams submit algorithms to predict wheel rotation from crank rotation. Several top teams combine their models into one gradient-boosted deep random neural support vector forest. The model achieves stunning precision and accuracy in predicting wheel rotation.
On the other hand, in a very literal sense, the model contains no gears. Is...
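The setup is easy to caricature in code. A deliberately trivial sketch (invented numbers, and a plain linear regression standing in for the competition-winning ensemble):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: wheel rotation is crank rotation scaled by the
# (unknown-to-the-model) gear ratio, plus measurement noise.
TRUE_RATIO = 0.25
crank = rng.uniform(0, 1000, size=(10_000, 1))            # crank rotation
wheel = TRUE_RATIO * crank + rng.normal(0, 0.5, size=crank.shape)

model = LinearRegression().fit(crank, wheel)
print("learned ratio:", model.coef_[0, 0])   # ~0.25: excellent predictions,
                                             # yet the model contains no gears
```

The model recovers the input-output ratio almost perfectly while representing nothing about the mechanism that produces it.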
This is just such a central idea we use on LessWrong, explained well and with great images.
(If it is published in the book, it should be included alongside Val's original post on the subject.)
No need to think about editing at this point, we'll sort out all editing issues after the review. (And for this specific issue, all hyperlinks in the books have been turned into readable footnotes, which works out just fine in the vast majority of cases.)