| # | Title | Author |
|---:|---|---|
| 0 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky |
| 1 | MIRI announces new "Death With Dignity" strategy | Eliezer Yudkowsky |
| 2 | Where I agree and disagree with Eliezer | paulfchristiano |
| 3 | Let’s think about slowing down AI | KatjaGrace |
| 4 | Reward is not the optimization target | TurnTrout |
| 5 | Six Dimensions of Operational Adequacy in AGI Projects | Eliezer Yudkowsky |
| 6 | It Looks Like You're Trying To Take Over The World | gwern |
| 7 | Staring into the abyss as a core life skill | benkuhn |
| 8 | You Are Not Measuring What You Think You Are Measuring | johnswentworth |
| 9 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra |
| 10 | Sazen | [DEACTIVATED] Duncan Sabien |
| 11 | Luck based medicine: my resentful story of becoming a medical miracle | Elizabeth |
| 12 | Inner and outer alignment decompose one hard problem into two extremely hard problems | TurnTrout |
| 13 | On how various plans miss the hard bits of the alignment challenge | So8res |
| 14 | Simulators | janus |
| 15 | Epistemic Legibility | Elizabeth |
| 16 | Tyranny of the Epistemic Majority | Scott Garrabrant |
| 17 | Counterarguments to the basic AI x-risk case | KatjaGrace |
| 18 | What Are You Tracking In Your Head? | johnswentworth |
| 19 | Safetywashing | Adam Scholl |
| 20 | Threat-Resistant Bargaining Megapost: Introducing the ROSE Value | Diffractor |
| 21 | Nonprofit Boards are Weird | HoldenKarnofsky |
| 22 | Optimality is the tiger, and agents are its teeth | Veedrac |
| 23 | chinchilla's wild implications | nostalgebraist |
| 24 | Losing the root for the tree | Adam Zerner |
| 25 | Worlds Where Iterative Design Fails | johnswentworth |
| 26 | Decision theory does not imply that we get to have nice things | So8res |
| 27 | Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality" | AnnaSalamon |
| 28 | What an actually pessimistic containment strategy looks like | lc |
| 29 | Introduction to abstract entropy | Alex_Altair |
| 30 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda |
| 31 | The Redaction Machine | Ben |
| 32 | Butterfly Ideas | Elizabeth |
| 33 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC |
| 34 | Language models seem to be much better than humans at next-token prediction | Buck |
| 35 | Toni Kurz and the Insanity of Climbing Mountains | GeneSmith |
| 36 | Useful Vices for Wicked Problems | HoldenKarnofsky |
| 37 | What should you change in response to an "emergency"? And AI risk | AnnaSalamon |
| 38 | Models Don't "Get Reward" | Sam Ringer |
| 39 | How To Go From Interpretability To Alignment: Just Retarget The Search | johnswentworth |
| 40 | Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment | elspood |
| 41 | Why Agent Foundations? An Overly Abstract Explanation | johnswentworth |
| 42 | A central AI alignment problem: capabilities generalization, and the sharp left turn | So8res |
| 43 | Humans provide an untapped wealth of evidence about alignment | TurnTrout |
| 44 | Learning By Writing | HoldenKarnofsky |
| 45 | Limerence Messes Up Your Rationality Real Bad, Yo | Raemon |
| 46 | The Onion Test for Personal and Institutional Honesty | chanamessinger |
| 47 | Counter-theses on Sleep | Natália Coelho Mendonça |
| 48 | The shard theory of human values | Quintin Pope |
| 49 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin |
| 50 | ProjectLawful.com: Eliezer's latest story, past 1M words | Eliezer Yudkowsky |
| 51 | Intro to Naturalism: Orientation | LoganStrohl |
| 52 | Why I think strong general AI is coming soon | porby |
| 53 | How might we align transformative AI if it’s developed very soon? | HoldenKarnofsky |
| 54 | It’s Probably Not Lithium | Natália Coelho Mendonça |
| 55 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen |
| 56 | Plans Are Predictions, Not Optimization Targets | johnswentworth |
| 57 | Takeoff speeds have a huge effect on what it means to work on AI x-risk | Buck |
| 58 | The Feeling of Idea Scarcity | johnswentworth |
| 59 | Six (and a half) intuitions for KL divergence | CallumMcDougall |
| 60 | Trigger-Action Planning | CFAR!Duncan |
| 61 | Have You Tried Hiring People? | rank-biserial |
| 62 | The Wicked Problem Experience | HoldenKarnofsky |
| 63 | What does it take to defend the world against out-of-control AGIs? | Steven Byrnes |
| 64 | On Bounded Distrust | Zvi |
| 65 | Setting the Zero Point | [DEACTIVATED] Duncan Sabien |
| 66 | [Interim research report] Taking features out of superposition with sparse autoencoders | Lee Sharkey |
| 67 | Limits to Legibility | Jan_Kulveit |
| 68 | Harms and possibilities of schooling | TsviBT |
| 69 | Look For Principles Which Will Carry Over To The Next Paradigm | johnswentworth |
| 70 | Steam | abramdemski |
| 71 | High Reliability Orgs, and AI Companies | Raemon |
| 72 | Toy Models of Superposition | evhub |
| 73 | Editing Advice for LessWrong Users | JustisMills |
| 74 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth |
| 75 | why assume AGIs will optimize for fixed goals? | nostalgebraist |
| 76 | Lies Told To Children | Eliezer Yudkowsky |
| 77 | Revisiting algorithmic progress | Tamay |
| 78 | Things that can kill you quickly: What everyone should know about first aid | jasoncrawford |
| 79 | Postmortem on DIY Recombinant Covid Vaccine | caffemacchiavelli |
| 80 | Reflections on six months of fatherhood | jasoncrawford |
| 81 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang |
| 82 | The Plan - 2022 Update | johnswentworth |
| 83 | 12 interesting things I learned studying the discovery of nature's laws | Ben Pace |
| 84 | Impossibility results for unbounded utilities | paulfchristiano |
| 85 | Searching for outliers | benkuhn |
| 86 | Greyed Out Options | ozymandias |
| 87 | “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments | Andrew_Critch |
| 88 | Do bamboos set themselves on fire? | Malmesbury |
| 89 | Murphyjitsu: an Inner Simulator algorithm | CFAR!Duncan |
| 90 | Deliberate Grieving | Raemon |
| 91 | We Choose To Align AI | johnswentworth |
| 92 | The alignment problem from a deep learning perspective | Richard_Ngo |
| 93 | Slack matters more than any outcome | Valentine |
| 94 | Consider your appetite for disagreements | Adam Zerner |
| 95 | everything is okay | Tamsin Leake |
| 96 | Mysteries of mode collapse | janus |
| 97 | Slow motion videos as AI risk intuition pumps | Andrew_Critch |
| 98 | ITT-passing and civility are good; "charity" is bad; steelmanning is niche | Rob Bensinger |
| 99 | Meadow Theory | [DEACTIVATED] Duncan Sabien |
| 100 | The next decades might be wild | Marius Hobbhahn |
| 101 | Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments | Jeffrey Ladish |
| 102 | Lessons learned from talking to >100 academics about AI safety | Marius Hobbhahn |
| 103 | Activated Charcoal for Hangover Prevention: Way more than you wanted to know | Maxwell Peterson |
| 104 | More Is Different for AI | jsteinhardt |
| 105 | How satisfied should you expect to be with your partner? | Vaniver |
| 106 | How my team at Lightcone sometimes gets stuff done | jacobjacob |
| 107 | The metaphor you want is "color blindness," not "blind spot." | [DEACTIVATED] Duncan Sabien |
| 108 | Logical induction for software engineers | Alex Flint |
| 109 | Call For Distillers | johnswentworth |
| 110 | Fiber arts, mysterious dodecahedrons, and waiting on “Eureka!” | eukaryote |
| 111 | A Longlist of Theories of Impact for Interpretability | Neel Nanda |
| 112 | On A List of Lethalities | Zvi |
| 113 | LOVE in a simbox is all you need | jacob_cannell |
| 114 | A transparency and interpretability tech tree | evhub |
| 115 | DeepMind alignment team opinions on AGI ruin arguments | Vika |
| 116 | Contra shard theory, in the context of the diamond maximizer problem | So8res |
| 117 | On infinite ethics | Joe Carlsmith |
| 118 | Wisdom Cannot Be Unzipped | Sable |
| 119 | Different perspectives on concept extrapolation | Stuart_Armstrong |
| 120 | Utilitarianism Meets Egalitarianism | Scott Garrabrant |
| 121 | The ignorance of normative realism bot | Joe Carlsmith |
| 122 | Shah and Yudkowsky on alignment failures | Rohin Shah |
| 123 | Nuclear Energy - Good but not the silver bullet we were hoping for | Marius Hobbhahn |
| 124 | Patient Observation | LoganStrohl |
| 125 | Monks of Magnitude | [DEACTIVATED] Duncan Sabien |
| 126 | AI coordination needs clear wins | evhub |
| 127 | Actually, All Nuclear Famine Papers are Bunk | Lao Mein |
| 128 | New Frontiers in Mojibake | Adam Scherlis |
| 129 | My take on Jacob Cannell’s take on AGI safety | Steven Byrnes |
| 130 | Introducing Pastcasting: A tool for forecasting practice | Sage Future |
| 131 | K-complexity is silly; use cross-entropy instead | So8res |
| 132 | Beware boasting about non-existent forecasting track records | Jotto999 |
| 133 | Clarifying AI X-risk | zac_kenton |
| 134 | Narrative Syncing | AnnaSalamon |
| 135 | publishing alignment research and exfohazards | Tamsin Leake |
| 136 | Deontology and virtue ethics as "effective theories" of consequentialist ethics | Jan_Kulveit |
| 137 | Range and Forecasting Accuracy | niplav |
| 138 | Trends in GPU price-performance | Marius Hobbhahn |
| 139 | How To Observe Abstract Objects | LoganStrohl |
| 140 | Criticism of EA Criticism Contest | Zvi |
| 141 | Takeaways from our robust injury classifier project [Redwood Research] | dmz |
| 142 | Bad at Arithmetic, Promising at Math | cohenmacaulay |
| 143 | Don't use 'infohazard' for collectively destructive info | Eliezer Yudkowsky |
| 144 | Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection | Oliver Sourbut |
| 145 | Human values & biases are inaccessible to the genome | TurnTrout |
| 146 | I learn better when I frame learning as Vengeance for losses incurred through ignorance, and you might too | chaosmage |
| 147 | Jailbreaking ChatGPT on Release Day | Zvi |
| 148 | Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide | Andrew_Critch |
| 149 | Review: Amusing Ourselves to Death | L Rudolf L |
| 150 | QNR prospects are important for AI alignment research | Eric Drexler |
| 151 | Disagreement with bio anchors that lead to shorter timelines | Marius Hobbhahn |
| 152 | Why all the fuss about recursive self-improvement? | So8res |
| 153 | LessWrong Has Agree/Disagree Voting On All New Comment Threads | Ben Pace |
| 154 | Opening Session Tips & Advice | CFAR!Duncan |
| 155 | Searching for Search | NicholasKees |
| 156 | Refining the Sharp Left Turn threat model, part 1: claims and mechanisms | Vika |
| 157 | Takeaways from a survey on AI alignment resources | DanielFilan |
| 158 | Trying to disambiguate different questions about whether RLHF is “good” | Buck |
| 159 | Benign Boundary Violations | [DEACTIVATED] Duncan Sabien |
| 160 | How To: A Workshop (or anything) | [DEACTIVATED] Duncan Sabien |