Book 2 of the Sequences Highlights

A major theme of the Sequences is the ways in which human reasoning goes astray. This sample of essays describes a number of failure modes and urges us to do better.

First Post: The Bottom Line

Recent Discussion

Today, the AI Extinction Statement was released by the Center for AI Safety: a one-sentence statement jointly signed by a historic coalition of AI experts, professors, and tech leaders.

Geoffrey Hinton and Yoshua Bengio have signed, as have the CEOs of the major AGI labs–Sam Altman, Demis Hassabis, and Dario Amodei–as well as executives from Microsoft and Google (but notably not Meta).

The statement reads: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

We hope this statement will bring AI x-risk further into the Overton window and open up discussion around AI’s most severe risks. Given the growing number of experts and public figures who take risks from advanced AI seriously, we hope to improve epistemics by encouraging discussion and focusing public and international attention toward this issue.

It's a step, and likely one that couldn't be skipped. Still, it falls just short of actually acknowledging a nontrivial probability of AI-caused human extinction, and of the distinction between extinction and lesser global risks. Nuclear war can't cause extinction, so it's not properly alongside AI x-risk. Engineered pandemics might eventually become extinction-worthy, but even that real risk is less urgent.

For those who might not have noticed: this actually is historic, they're not just saying that. The top 350 people have effectively "come clean" about this, at once, in a Schelling-point kind of way. The long years of staying quiet about this and avoiding telling other people your thoughts about AI potentially ending the world, because you're worried that you're crazy or that you take science fiction too seriously: those days might have just ended. This was a credible signal; none of these 350 high-level people can go back and say "no, I never actually said that AI could cause extinction and AI safety should be a top global priority", and from now on you and anyone else can cite this announcement to back up your views (instead of saying "Bill Gates, Elon Musk, and Stephen Hawking have all endorsed...") and go straight to AI timelines (I like sending people Epoch's Literature review).
Vishrut Arya:
Any explanations for why Nick Bostrom has been absent, arguably notably, from recent public alignment conversations (particularly since ChatGPT)? He's not on this list (yet other FHI members, like Toby Ord, are). He wasn't on the FLI open letter either, though I could understand why he might've avoided endorsing that letter given its much wider scope.
Almost certainly related to that email controversy from a few months ago. My sense is people have told him (or he has himself decided) to take a step back from public engagement.  I think I disagree with this, but it's not a totally crazy call, IMO.

Epistemic status: very speculative
Content warning: if true this is pretty depressing

This came to me when thinking about Eliezer's note on Twitter that he didn't think superintelligence could do FTL, partially because of Fermi Paradox issues. I think Eliezer made a mistake, there; superintelligent AI with (light-cone-breaking, as opposed to within-light-cone-of-creation) FTL, if you game it out the whole way, actually mostly solves the Fermi Paradox.

I am, of course, aware that UFAI cannot be the Great Filter in a normal sense; the UFAI itself is a potentially-expanding technological civilisation.

But. If a UFAI is expanding at FTL, then it conquers and optimises the entire universe within a potentially-rather-short timeframe (even potentially a negative timeframe at long distances, if the only cosmic-censorship limit is closing a loop). That means the...

The answer must be "yes", since it's mentioned in the post
Seth Herd:
Thanks, I hate it. The anthropic argument seems to make sense. The more general version would be: we're observing from what would seem like very early in history if sentience is successful at spreading sentience. Therefore, it's probably not. The remainder of history might have very few observers, like the singleton misaligned superintelligences we and others will spawn. This form doesn't seem to depend on FTL. Yuck. But I wouldn't want to remain willfully ignorant of the arguments, so thanks! Hopefully I'm misunderstanding something about the existing thought on this issue. Corrections are more than welcome.

I think this depends on whether you use SIA or SSA or some other theory of anthropics.

In this post I outline every post I could find that meaningfully connects the concept of «Boundaries/Membranes» (tag, sequence) with AI safety. This seems to be a booming subtopic: interest has picked up substantially within the past year. 

Perhaps most notably, Davidad includes the concept in his Open Agency Architecture for Safe Transformative AI alignment paradigm. For a preview of the salience of this approach, see this comment by Davidad (2023 Jan):

“defend the boundaries of existing sentient beings,” which is my current favourite. It’s nowhere near as ambitious or idiosyncratic as “human values”, yet nowhere near as anti-natural or buck-passing as corrigibility. 

This post also compiles recent work from Andrew Critch, Scott Garrabrant, John Wentworth, and others. But first I will recap what «Boundaries» are:

«Boundaries» definition recap:

You can see «Boundaries» Sequence...

Today I've added a link to my current research questions for «membranes/boundaries», and also fixed the headers.

Two and a half years ago, I wrote Extrapolating GPT-N performance, trying to predict how fast scaled-up models would improve on a few benchmarks. One year ago, I added PaLM to the graphs. Another spring has come and gone, and there are new models to add to the graphs: PaLM-2 and GPT-4. (Though I only know GPT-4's performance on a small handful of benchmarks.)

Converting to Chinchilla scaling laws

In previous iterations of the graph, the x-position represented the loss on GPT-3's validation set, and the x-axis was annotated with estimates of size+data that you'd need to achieve that loss according to the Kaplan scaling laws. (When adding PaLM to the graph, I estimated its loss using those same Kaplan scaling laws.)
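For context, the Chinchilla scaling law referenced here predicts loss from parameter count N and training tokens D. A minimal sketch, using the approximate fitted constants reported in the Chinchilla paper (the exact values, and whether they match the graphs in this post, are assumptions on my part):

```python
# Chinchilla scaling law (Hoffmann et al. 2022):
#   L(N, D) = E + A / N**alpha + B / D**beta
# N = parameter count, D = training tokens.
# Constants below are the paper's approximate fitted values.

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Example: roughly Chinchilla itself, 70B parameters on 1.4T tokens.
loss = chinchilla_loss(70e9, 1.4e12)
```

The irreducible term E is the floor the loss approaches as both N and D grow; the x-position of a model on such a graph can then be read off as the loss this formula assigns to its size and data budget.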

In these new iterations, the x-position instead represents...

Sigmoids don't accurately extrapolate the scaling behavior(s) of the performance of artificial neural networks. 

Use a Broken Neural Scaling Law (BNSL) in order to obtain accurate extrapolations:
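For readers unfamiliar with the functional form: a BNSL is a power law multiplied by smooth "break" factors that let the exponent change at certain scales. A rough sketch of the family from Caballero et al. (parameter names and the exact parameterization in the paper may differ slightly; all constants here are illustrative):

```python
def bnsl(x: float, a: float, b: float, c0: float, breaks) -> float:
    """Smoothly broken power law: y = a + b * x^(-c0) * prod_i of
    (1 + (x / d_i)^(1/f_i))^(-c_i * f_i).
    breaks: list of (c_i, d_i, f_i) tuples; d_i is where the i-th break
    occurs, f_i its sharpness, c_i the change in slope."""
    y = b * x ** (-c0)
    for c_i, d_i, f_i in breaks:
        y *= (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y
```

With an empty `breaks` list this reduces to an ordinary power law a + b·x^(−c0), which is why the form can nest both simple power-law and break-exhibiting scaling behaviors.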

[Written by EJT as part of the CAIS Philosophy Fellowship. Thanks to Dan for help posting to the Alignment Forum]


For about fifteen years, the AI safety community has been discussing coherence arguments. In papers and posts on the subject, it’s often written that there exist 'coherence theorems' which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy. Despite the prominence of these arguments, authors are often a little hazy about exactly which theorems qualify as coherence theorems. This is no accident. If the authors had tried to be precise, they would have discovered that there are no such theorems.

I’m concerned about this. Coherence arguments seem to be a moderately important...

I don't know about you, but I'm actually OK dithering a bit, and going in circles, and doing things that mere entropy can "make me notice regret based on syntactically detectable behavioral signs" (like not even active adversarial optimization pressure like that which is somewhat inevitably generated in predator prey contexts).

For example, in my twenties I formed an intent, and managed to adhere to the habit somewhat often, where I'd flip a coin any time I noticed decisions where the cost to think about it in an explicit way was probably larger than the di...

People talk about Kelly betting and expectation maximization as though they're alternate strategies for the same problem. Actually, they're each the best option to pick for different classes of problems. Understanding when to use Kelly betting and when to use expectation maximization is critical.

Most of the ideas for this came from Ole Peters' ergodicity economics writings. Any mistakes are my own.

The parable of the casino

Alice and Bob visit a casino together. They each have $100, and they decide it'll be fun to split up, play the first game they each find, and then see who has the most money. They'll then keep doing this until their time in the casino is up in a couple days.

Alice heads left and finds a game that looks good. It's double...

Not maximising expected utility means that you expect to get less utility.

This isn't actually right though - the concept of maximizing utility doesn't quite overlap with expecting to have more or less utility at the end.

There are many examples where maximizing your expected utility means expecting to go broke, and not maximizing it means expecting to end up with more money.

(Even though, in this particular one-turn example, Bob should, in fact, expect to end up with more money if he bets everything.)
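To make the divergence concrete, here is a toy simulation (the game and all numbers are my own illustration, not from the post): a bet that triples the stake with probability 1/2 has positive expected value, so staking everything each round maximizes expected wealth, yet almost every such run ends at zero, while betting the Kelly fraction compounds steadily.

```python
import random

def play(rounds: int, bet_fraction: float,
         p_win: float = 0.5, payout: float = 3.0, start: float = 100.0) -> float:
    """Repeatedly bet a fixed fraction of wealth; a win returns
    payout * stake, a loss forfeits the stake."""
    wealth = start
    for _ in range(rounds):
        stake = wealth * bet_fraction
        wealth -= stake
        if random.random() < p_win:
            wealth += stake * payout
    return wealth

random.seed(0)
# All-in: expected wealth is 100 * 1.5**20, but one loss zeroes you out,
# and surviving 20 rounds has probability 0.5**20 (about one in a million).
all_in = [play(20, 1.0) for _ in range(10_000)]
# Kelly fraction for this bet (net odds b=2, p=0.5): f* = 0.5 - 0.5/2 = 0.25.
fractional = [play(20, 0.25) for _ in range(10_000)]
```

Nearly every all-in run ends at exactly zero despite the huge expected value, while the fractional bettor's typical (median) outcome multiplies the starting stake severalfold.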

Ignoring infinities, do you have the same objection to a game with a limit of 100 rounds? Utility-maximizing Bob will bet all his money 100 times, and lose all of it with probability around 1 − 10^−24, and he'll endorse that because one time in 10^24 he is raking it in to the tune of 10^32 dollars or something. If you try to stop him he'll be justly annoyed because you're not letting him maximize his utility function. Do you think that's a problem for expected utility maximization? If so, it seems to me that your objection isn't "optimal policy doesn't come from optimal actions". (At any rate I think that would be a bad objection, because optimal policy for this utility function does come from optimal actions at each step.) Rather, it seems to me that your objection is you don't really believe Bob has that utility function. Which, of course he doesn't! No one has a utility function like that (or, indeed, at all). And I think that's important to realize. But it's a different objection, and I think that's important to realize too.
Yes, I completely agree that the main reason in real life we would recommend against that strategy is that we instinctively (and usually correctly) feel that the person's utility function is sub-linear in money, so that the 10^32 dollars with probability 10^−24 is bad. Obviously if 10^32 dollars is needed to cure some disease that will otherwise kill them immediately, that changes things. But there is an objection that I think runs somewhat separately to that, which is the round limit. If we are operating under an optimal, reasonable policy, then (outside commitment-tactic negotiations) I think it shouldn't really be possible for a new outside constraint to improve our performance. Because if the constraint does improve performance, then we could have adopted that constraint voluntarily, and our policy was therefore not optimal. And the N-round limit is doing a fairly important job at improving Bob's performance in this hypothetical. Otherwise Bob's strategy is equivalent to "I bet everything, every time, until I lose it all." Perhaps this second objection is just the old one in a new disguise (any agent with a finitely-bounded utility function would eventually reach a round number where they decide "actually I have enough now", and thus restore my sense of what should be), but I am not sure that it is exactly the same.
I don't think I understand the point of the temporal average. I think I follow how to calculate it, but I don't see any justification here for why we should care about the value we calculate that way, or why it's given that name. (Maybe I just missed these? Maybe they're answered in the paper?) I've written about this myself, though not recently enough to remember that post in depth. My answer for why to bet Kelly is "over a long enough time, you’ll almost certainly get more money than someone else who was offered the same bets as you and started with the same amount of money but regularly bet different amounts on them". I happen to know that in this type of game, maximizing temporal average is the way to get that property, which is neat. That's the justification I'd give for doing that calculation in this type of game. But it's not clear to me what justification you'd give.
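For what it's worth, the "temporal average" in this setting is the expected log growth rate per round, and maximizing it is exactly what produces the Kelly fraction. A quick sketch using the standard formulas (the example numbers are my own, not from the post):

```python
import math

def time_average_growth(f: float, p: float, b: float) -> float:
    """Expected log growth per round when betting fraction f of wealth
    on a probability-p win paying b-to-1 (net odds)."""
    return p * math.log(1 + f * b) + (1 - p) * math.log(1 - f)

def kelly_fraction(p: float, b: float) -> float:
    """f* = p - (1 - p) / b, the fraction that maximizes
    time_average_growth over f."""
    return p - (1 - p) / b

# Example: a 60% chance to double the stake (b = 1).
f_star = kelly_fraction(0.6, 1.0)  # 0.2
```

Because wealth compounds multiplicatively, almost every long sequence of rounds grows at roughly exp(rounds × time_average_growth(f)), so the fraction maximizing this quantity is the one that almost surely outgrows any other fixed fraction in the long run.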

Short version: Sentient lives matter; AIs can be people and people shouldn't be owned (and also the goal of alignment is not to browbeat AIs into doing stuff we like that they'd rather not do; it's to build them de-novo to care about valuable stuff).

Context: Writing up obvious points that I find myself repeating.

Stating the obvious:

  • All sentient lives matter.

    • Yes, including animals, insofar as they're sentient (which is possible in at least some cases).
    • Yes, including AIs, insofar as they're sentient (which is possible in at least some cases).
    • Yes, even including sufficiently-detailed models of sentient creatures (as I suspect could occur frequently inside future AIs). (People often forget this one.)
  • There's some ability-to-feel-things that humans surely have, and that cartoon drawings don't have, even if the cartoons make similar facial...


I don't think 'tautology' fits. There are some people who would draw the line somewhere else even if they were convinced of sentience. Some people might be convinced that only humans should be included, or maybe biological beings, or some other category of entities that is not fully defined by mental properties. I guess 'moral patient' is kind of equivalent to 'sentient' but I think this mostly tells us something about philosophers agreeing that sentience is the proper marker for moral relevance.

(updated the previous comment with some clearer context-setting)
Nathan Helm-Burger:
That's an interesting way of reframing the issue. I'm honestly just not sure about all of this reasoning, and remain so after trying to think about it with your reframing, but I feel like this does shift my thinking a bit. Thanks. I think probably it makes sense to try reasoning both with and without tradeoffs, and then comparing the results.
I share this preference, but one of the confusions is whether our AI systems (and their impending successors) are moral patients. Which is a fact about AI systems and moral patienthood, and isn't influenced by our hopes for it being true or not.
This is a linkpost for


Language models (LMs) are powerful tools for natural language processing, but they often struggle to produce coherent and fluent text when they are small. Models with around 125M parameters such as GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and consistent English text beyond a few words even after extensive training. This raises the question of whether the emergence of the ability to produce coherent English text only occurs at larger scales (with hundreds of millions of parameters or more) and complex architectures (with many layers of global attention). In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-old usually understands, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to

This is my thought exactly. I would try it, but I am poor and don't even have a GPU lol. This is something I'd love to see tested.

Hah yeah I'm not exactly loaded, it's pretty much all colab notebooks for me.

The Selfish Gene remains one of my favorite books of all time and although published in 1976 it remains a compelling and insightful introduction to the brute mechanics of natural selection. Richard Dawkins later acknowledged that his book’s title may give a misleading impression of its thesis, erroneously ascribing conscious motivations or agentic properties to non-sentient strands of DNA. The core argument is dreadfully simple: genes (as opposed to organisms or species) are the primary unit of natural selection, and if you leave the primordial soup brewing for a while, the only genes that can remain are the ones with a higher proclivity towards replication than their neighbors. “Selfish” genes therefore are not cunning strategists or followers of some manifest destiny, but rather simply the accidental consequence...

Because it's connected with a lot of other things? It's not that cultural morality is always inimical to the individual, and it's not that cultural morality is cleanly split off from cultural Everything Else. Most people have learnt how to make war, how to make families, and how to make bread as part of a single package.

I acknowledge that you believe this is not worth your time and do not hold you to a response. That said, it's generally not helpful to assert a criticism and then refuse to elaborate. Outsourcing that task only serves to highlight how deficient LLMs currently are at it, because of how distracted they get with ethical guardrails. Point no. 3 exemplifies this problem: I already said that energy slaves are a crude comparison and already said I don't support slavery, so this criticism essentially boils down to not having enough throat-clearing negative adjectives directly adjacent to any discussion of slavery.
the gears to ascension:
My claim is that the boring ethical guardrails are in fact what I endorse as describing the errors you made. jimrandomh's response is a higher quality version.
No, my argument is that morality is whatever replicates best. Often it means "doing what is good for the community" because a healthy community is in a better place to replicate its guiding values. But not always.