The Best of LessWrong

Here you can find the best posts of LessWrong. Once a post is more than a year old, the LessWrong community reviews it and votes on how well it has stood the test of time. These are the posts that have ranked highest in each annual review since 2018, when our tradition of choosing the least wrong of LessWrong began.

For the years 2018, 2019, and 2020 we also published the results of the annual vote as physical books, which you can buy and learn more about here.

2022

AGI Ruin: A List of Lethalities, by Eliezer Yudkowsky
MIRI announces new "Death With Dignity" strategy, by Eliezer Yudkowsky
Where I agree and disagree with Eliezer, by paulfchristiano
Inner and outer alignment decompose one hard problem into two extremely hard problems, by TurnTrout
On how various plans miss the hard bits of the alignment challenge, by So8res
Simulators, by janus
Epistemic Legibility, by Elizabeth
Tyranny of the Epistemic Majority, by Scott Garrabrant
Counterarguments to the basic AI x-risk case, by KatjaGrace
Let’s think about slowing down AI, by KatjaGrace
Reward is not the optimization target, by TurnTrout
Six Dimensions of Operational Adequacy in AGI Projects, by Eliezer Yudkowsky
What Are You Tracking In Your Head?, by johnswentworth
Safetywashing, by Adam Scholl
Threat-Resistant Bargaining Megapost: Introducing the ROSE Value, by Diffractor
Nonprofit Boards are Weird, by HoldenKarnofsky
Optimality is the tiger, and agents are its teeth, by Veedrac
chinchilla's wild implications, by nostalgebraist
It Looks Like You're Trying To Take Over The World, by gwern
Staring into the abyss as a core life skill, by benkuhn
You Are Not Measuring What You Think You Are Measuring, by johnswentworth
Losing the root for the tree, by Adam Zerner
Worlds Where Iterative Design Fails, by johnswentworth
Decision theory does not imply that we get to have nice things, by So8res
Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality", by AnnaSalamon
What an actually pessimistic containment strategy looks like, by lc
Introduction to abstract entropy, by Alex_Altair
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover, by Ajeya Cotra
Sazen, by Duncan Sabien
Luck based medicine: my resentful story of becoming a medical miracle, by Elizabeth
A Mechanistic Interpretability Analysis of Grokking, by Neel Nanda
The Redaction Machine, by Ben
Butterfly Ideas, by Elizabeth
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research], by LawrenceC
Language models seem to be much better than humans at next-token prediction, by Buck
Toni Kurz and the Insanity of Climbing Mountains, by GeneSmith
Useful Vices for Wicked Problems, by HoldenKarnofsky
What should you change in response to an "emergency"? And AI risk, by AnnaSalamon
Models Don't "Get Reward", by Sam Ringer
How To Go From Interpretability To Alignment: Just Retarget The Search, by johnswentworth
Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment, by elspood
Why Agent Foundations? An Overly Abstract Explanation, by johnswentworth
A central AI alignment problem: capabilities generalization, and the sharp left turn, by So8res
Humans provide an untapped wealth of evidence about alignment, by TurnTrout
Learning By Writing, by HoldenKarnofsky
Limerence Messes Up Your Rationality Real Bad, Yo, by Raemon
The Onion Test for Personal and Institutional Honesty, by chanamessinger
Counter-theses on Sleep, by Natália
The shard theory of human values, by Quintin Pope
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme, by Collin
ProjectLawful.com: Eliezer's latest story, past 1M words, by Eliezer Yudkowsky

2021

How factories were made safe, by jasoncrawford
Cryonics signup guide #1: Overview, by mingyuan
Making Vaccine, by johnswentworth
Strong Evidence is Common, by Mark Xu
“PR” is corrosive; “reputation” is not., by AnnaSalamon
Your Cheerful Price, by Eliezer Yudkowsky
Taboo "Outside View", by Daniel Kokotajlo
All Possible Views About Humanity's Future Are Wild, by HoldenKarnofsky
Another (outer) alignment failure story, by paulfchristiano
Split and Commit, by Duncan Sabien
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs), by Andrew_Critch
There’s no such thing as a tree (phylogenetically), by eukaryote
ARC's first technical report: Eliciting Latent Knowledge, by paulfchristiano
This Can't Go On, by HoldenKarnofsky
Rationalism before the Sequences, by Eric Raymond
The Plan, by johnswentworth
Trapped Priors As A Basic Problem Of Rationality, by Scott Alexander
Finite Factored Sets, by Scott Garrabrant
Selection Theorems: A Program For Understanding Agents, by johnswentworth
Slack Has Positive Externalities For Groups, by johnswentworth
My research methodology, by paulfchristiano
Lies, Damn Lies, and Fabricated Options, by Duncan Sabien
Fun with +12 OOMs of Compute, by Daniel Kokotajlo
What 2026 looks like, by Daniel Kokotajlo
The Rationalists of the 1950s (and before) also called themselves “Rationalists”, by Owain_Evans
Ruling Out Everything Else, by Duncan Sabien
Leaky Delegation: You are not a Commodity, by Darmani
Feature Selection, by Zack_M_Davis
Cup-Stacking Skills (or, Reflexive Involuntary Mental Motions), by Duncan Sabien
larger language models may disappoint you [or, an eternally unfinished draft], by nostalgebraist
Ngo and Yudkowsky on alignment difficulty, by Eliezer Yudkowsky
How To Write Quickly While Maintaining Epistemic Rigor, by johnswentworth
Science in a High-Dimensional World, by johnswentworth
Self-Integrity and the Drowning Child, by Eliezer Yudkowsky
Comments on Carlsmith's “Is power-seeking AI an existential risk?”, by So8res
Working With Monsters, by johnswentworth
Simulacrum 3 As Stag-Hunt Strategy, by johnswentworth
Elephant seal 2, by KatjaGrace
EfficientZero: How It Works, by 1a3orn
Lars Doucet's Georgism series on Astral Codex Ten, by Sune
Catching the Spark, by LoganStrohl
Specializing in Problems We Don't Understand, by johnswentworth
Shoulder Advisors 101, by Duncan Sabien
Notes from "Don't Shoot the Dog", by juliawise
Why has nuclear power been a flop?, by jasoncrawford

2020

The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables, by johnswentworth
Coordination as a Scarce Resource, by johnswentworth
Inaccessible information, by paulfchristiano
Cortés, Pizarro, and Afonso as Precedents for Takeover, by Daniel Kokotajlo
My computational framework for the brain, by Steven Byrnes
Inner Alignment: Explain like I'm 12 Edition, by Rafael Harth
Draft report on AI timelines, by Ajeya Cotra
An overview of 11 proposals for building safe advanced AI, by evhub
When Money Is Abundant, Knowledge Is The Real Wealth, by johnswentworth
Against GDP as a metric for timelines and takeoff speeds, by Daniel Kokotajlo
Anti-Aging: State of the Art, by JackH
Interfaces as a Scarce Resource, by johnswentworth
Most Prisoner's Dilemmas are Stag Hunts; Most Stag Hunts are Schelling Problems, by abramdemski
An Orthodox Case Against Utility Functions, by abramdemski
Is Success the Enemy of Freedom? (Full), by alkjash
microCOVID.org: A tool to estimate COVID risk from common activities, by catherio
Alignment By Default, by johnswentworth
The Solomonoff Prior is Malign, by Mark Xu
Introduction To The Infra-Bayesianism Sequence, by Diffractor
Radical Probabilism, by abramdemski
Reality-Revealing and Reality-Masking Puzzles, by AnnaSalamon
Why haven't we celebrated any major achievements lately?, by jasoncrawford
Some AI research areas and their relevance to existential safety, by Andrew_Critch
Search versus design, by Alex Flint
Seeing the Smoke, by Jacob Falkovich
Pain is not the unit of Effort, by alkjash
The ground of optimization, by Alex Flint
"Can you keep this confidential? How do you know?", by Raemon
Discontinuous progress in history: an update, by KatjaGrace
The Treacherous Path to Rationality, by Jacob Falkovich
How uniform is the neocortex?, by zhukeepa
To listen well, get curious, by benkuhn
Motive Ambiguity, by Zvi
Simulacra Levels and their Interactions, by Zvi
What Money Cannot Buy, by johnswentworth
AGI safety from first principles: Introduction, by Richard_Ngo
The Felt Sense: What, Why and How, by Kaj_Sotala
The First Sample Gives the Most Information, by Mark Xu
Nuclear war is unlikely to cause human extinction, by Jeffrey Ladish
Can crimes be discussed literally?, by Benquo
Swiss Political System: More than You ever Wanted to Know (I.), by Martin Sustrik
Studies On Slack, by Scott Alexander
Transportation as a Constraint, by johnswentworth
The date of AI Takeover is not the day the AI takes over, by Daniel Kokotajlo
Reply to Eliezer on Biological Anchors, by HoldenKarnofsky
Biology-Inspired AGI Timelines: The Trick That Never Works, by Eliezer Yudkowsky
CFAR Participant Handbook now available to all, by Duncan Sabien

2019

What failure looks like, by paulfchristiano
Risks from Learned Optimization: Introduction, by evhub
The Parable of Predict-O-Matic, by abramdemski
Noticing Frame Differences, by Raemon
Yes Requires the Possibility of No, by Scott Garrabrant
"Other people are wrong" vs "I am right", by Buck
Rest Days vs Recovery Days, by Unreal
Seeking Power is Often Convergently Instrumental in MDPs, by TurnTrout
Chris Olah’s views on AGI safety, by evhub
Being the (Pareto) Best in the World, by johnswentworth
Book Review: The Secret Of Our Success, by Scott Alexander
Rule Thinkers In, Not Out, by Scott Alexander
Reframing Superintelligence: Comprehensive AI Services as General Intelligence, by Rohin Shah
The strategy-stealing assumption, by paulfchristiano
Reframing Impact, by TurnTrout
Understanding “Deep Double Descent”, by evhub
Moloch Hasn’t Won, by Zvi
Integrity and accountability are core parts of rationality, by habryka
Book summary: Unlocking the Emotional Brain, by Kaj_Sotala
Asymmetric Justice, by Zvi
Heads I Win, Tails?—Never Heard of Her; Or, Selective Reporting and the Tragedy of the Green Rationalists, by Zack_M_Davis
Gears-Level Models are Capital Investments, by johnswentworth
In My Culture, by Duncan Sabien
Make more land, by jefftk
Forum participation as a research strategy, by Wei Dai
Unconscious Economics, by jacobjacob
Mistakes with Conservation of Expected Evidence, by abramdemski
Selection vs Control, by abramdemski
You Get About Five Words, by Raemon
The Schelling Choice is "Rabbit", not "Stag", by Raemon
Bioinfohazards, by Spiracular
Excerpts from a larger discussion about simulacra, by Benquo
human psycholinguists: a critical appraisal, by nostalgebraist
AI Safety "Success Stories", by Wei Dai
Do you fear the rock or the hard place?, by Ruby
Propagating Facts into Aesthetics, by Raemon
Gradient hacking, by evhub
The Amish, and Strategic Norms around Technology, by Raemon
Power Buys You Distance From The Crime, by Elizabeth
Paper-Reading for Gears, by johnswentworth
How to Ignore Your Emotions (while also thinking you're awesome at emotions), by Hazard
The Real Rules Have No Exceptions, by Said Achmiz
Coherent decisions imply consistent utilities, by Eliezer Yudkowsky
Alignment Research Field Guide, by abramdemski
Blackmail, by Zvi
The Curse Of The Counterfactual, by pjeby
The Credit Assignment Problem, by abramdemski
Reason isn't magic, by Benquo
Mental Mountains, by Scott Alexander
Simple Rules of Law, by Zvi
Is Rationalist Self-Improvement Real?, by Jacob Falkovich
Literature Review: Distributed Teams, by Elizabeth
Steelmanning Divination, by Vaniver
Book Review: Design Principles of Biological Circuits, by johnswentworth
Building up to an Internal Family Systems model, by Kaj_Sotala
Evolution of Modularity, by johnswentworth
[Answer] Why wasn't science invented in China?, by Ruby
Gears vs Behavior, by johnswentworth

2018

Prediction Markets: When Do They Work?, by Zvi
Coherence arguments do not entail goal-directed behavior, by Rohin Shah
Is Science Slowing Down?, by Scott Alexander
Embedded Agents, by abramdemski
The Rocket Alignment Problem, by Eliezer Yudkowsky
Local Validity as a Key to Sanity and Civilization, by Eliezer Yudkowsky
Robustness to Scale, by Scott Garrabrant
A voting theory primer for rationalists, by Jameson Quinn
Toolbox-thinking and Law-thinking, by Eliezer Yudkowsky
A Sketch of Good Communication, by Ben Pace
Paul's research agenda FAQ, by zhukeepa
An Untrollable Mathematician Illustrated, by abramdemski
Arguments about fast takeoff, by paulfchristiano
The Costly Coordination Mechanism of Common Knowledge, by Ben Pace
Anti-social Punishment, by Martin Sustrik
Varieties Of Argumentative Experience, by Scott Alexander
Specification gaming examples in AI, by Vika
Meta-Honesty: Firming Up Honesty Around Its Edge-Cases, by Eliezer Yudkowsky
My attempt to explain Looking, insight meditation, and enlightenment in non-mysterious terms, by Kaj_Sotala
Naming the Nameless, by sarahconstantin
Inadequate Equilibria vs. Governance of the Commons, by Martin Sustrik
The Tails Coming Apart As Metaphor For Life, by Scott Alexander
Babble, by alkjash
More Babble, by alkjash
Noticing the Taste of Lotus, by Valentine
The Pavlov Strategy, by sarahconstantin
Being a Robust Agent, by Raemon
Spaghetti Towers, by eukaryote
Beyond Astronomical Waste, by Wei Dai
Research: Rescuers during the Holocaust, by Martin Sustrik
Prune, by alkjash
The Loudest Alarm Is Probably False, by orthonormal
The Intelligent Social Web, by Valentine
Open question: are minimal circuits daemon-free?, by paulfchristiano
On the Loss and Preservation of Knowledge, by Samo Burja
Is Clickbait Destroying Our General Intelligence?, by Eliezer Yudkowsky
What makes people intellectually active?, by abramdemski
Why did everything take so long?, by KatjaGrace
Challenges to Christiano’s capability amplification proposal, by Eliezer Yudkowsky
Historical mathematicians exhibit a birth order effect too, by Eli Tyre
Towards a New Impact Measure, by TurnTrout
Birth order effect found in Nobel Laureates in Physics, by Bucky