| # | Title | Author |
|---|-------|--------|
| 0 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky |
| 1 | MIRI announces new "Death With Dignity" strategy | Eliezer Yudkowsky |
| 2 | Where I agree and disagree with Eliezer | paulfchristiano |
| 3 | Let’s think about slowing down AI | KatjaGrace |
| 4 | Reward is not the optimization target | TurnTrout |
| 5 | Six Dimensions of Operational Adequacy in AGI Projects | Eliezer Yudkowsky |
| 6 | It Looks Like You're Trying To Take Over The World | gwern |
| 7 | Staring into the abyss as a core life skill | benkuhn |
| 8 | You Are Not Measuring What You Think You Are Measuring | johnswentworth |
| 9 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra |
| 10 | Sazen | [DEACTIVATED] Duncan Sabien |
| 11 | Luck based medicine: my resentful story of becoming a medical miracle | Elizabeth |
| 12 | Inner and outer alignment decompose one hard problem into two extremely hard problems | TurnTrout |
| 13 | On how various plans miss the hard bits of the alignment challenge | So8res |
| 14 | Simulators | janus |
| 15 | Epistemic Legibility | Elizabeth |
| 16 | Tyranny of the Epistemic Majority | Scott Garrabrant |
| 17 | Counterarguments to the basic AI x-risk case | KatjaGrace |
| 18 | What Are You Tracking In Your Head? | johnswentworth |
| 19 | Safetywashing | Adam Scholl |
| 20 | Threat-Resistant Bargaining Megapost: Introducing the ROSE Value | Diffractor |
| 21 | Nonprofit Boards are Weird | HoldenKarnofsky |
| 22 | Optimality is the tiger, and agents are its teeth | Veedrac |
| 23 | chinchilla's wild implications | nostalgebraist |
| 24 | Losing the root for the tree | Adam Zerner |
| 25 | Worlds Where Iterative Design Fails | johnswentworth |
| 26 | Decision theory does not imply that we get to have nice things | So8res |
| 27 | Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality" | AnnaSalamon |
| 28 | What an actually pessimistic containment strategy looks like | lc |
| 29 | Introduction to abstract entropy | Alex_Altair |
| 30 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda |
| 31 | The Redaction Machine | Ben |
| 32 | Butterfly Ideas | Elizabeth |
| 33 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC |
| 34 | Language models seem to be much better than humans at next-token prediction | Buck |
| 35 | Toni Kurz and the Insanity of Climbing Mountains | GeneSmith |
| 36 | Useful Vices for Wicked Problems | HoldenKarnofsky |
| 37 | What should you change in response to an "emergency"? And AI risk | AnnaSalamon |
| 38 | Models Don't "Get Reward" | Sam Ringer |
| 39 | How To Go From Interpretability To Alignment: Just Retarget The Search | johnswentworth |
| 40 | Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment | elspood |
| 41 | Why Agent Foundations? An Overly Abstract Explanation | johnswentworth |
| 42 | A central AI alignment problem: capabilities generalization, and the sharp left turn | So8res |
| 43 | Humans provide an untapped wealth of evidence about alignment | TurnTrout |
| 44 | Learning By Writing | HoldenKarnofsky |
| 45 | Limerence Messes Up Your Rationality Real Bad, Yo | Raemon |
| 46 | The Onion Test for Personal and Institutional Honesty | chanamessinger |
| 47 | Counter-theses on Sleep | Natália Coelho Mendonça |
| 48 | The shard theory of human values | Quintin Pope |
| 49 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin |
| 50 | ProjectLawful.com: Eliezer's latest story, past 1M words | Eliezer Yudkowsky |
| 51 | Intro to Naturalism: Orientation | LoganStrohl |
| 52 | Why I think strong general AI is coming soon | porby |
| 53 | How might we align transformative AI if it’s developed very soon? | HoldenKarnofsky |
| 54 | It’s Probably Not Lithium | Natália Coelho Mendonça |
| 55 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen |
| 56 | Plans Are Predictions, Not Optimization Targets | johnswentworth |
| 57 | Takeoff speeds have a huge effect on what it means to work on AI x-risk | Buck |
| 58 | The Feeling of Idea Scarcity | johnswentworth |
| 59 | Six (and a half) intuitions for KL divergence | CallumMcDougall |
| 60 | Trigger-Action Planning | CFAR!Duncan |
| 61 | Have You Tried Hiring People? | rank-biserial |
| 62 | The Wicked Problem Experience | HoldenKarnofsky |
| 63 | What does it take to defend the world against out-of-control AGIs? | Steven Byrnes |
| 64 | On Bounded Distrust | Zvi |
| 65 | Setting the Zero Point | [DEACTIVATED] Duncan Sabien |
| 66 | [Interim research report] Taking features out of superposition with sparse autoencoders | Lee Sharkey |
| 67 | Limits to Legibility | Jan_Kulveit |
| 68 | Harms and possibilities of schooling | TsviBT |
| 69 | Look For Principles Which Will Carry Over To The Next Paradigm | johnswentworth |
| 70 | Steam | abramdemski |
| 71 | High Reliability Orgs, and AI Companies | Raemon |
| 72 | Toy Models of Superposition | evhub |
| 73 | Editing Advice for LessWrong Users | JustisMills |
| 74 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth |
| 75 | why assume AGIs will optimize for fixed goals? | nostalgebraist |
| 76 | Lies Told To Children | Eliezer Yudkowsky |
| 77 | Revisiting algorithmic progress | Tamay |
| 78 | Things that can kill you quickly: What everyone should know about first aid | jasoncrawford |
| 79 | Postmortem on DIY Recombinant Covid Vaccine | caffemacchiavelli |
| 80 | Reflections on six months of fatherhood | jasoncrawford |
| 81 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang |
| 82 | The Plan - 2022 Update | johnswentworth |
| 83 | 12 interesting things I learned studying the discovery of nature's laws | Ben Pace |
| 84 | Impossibility results for unbounded utilities | paulfchristiano |
| 85 | Searching for outliers | benkuhn |
| 86 | Greyed Out Options | ozymandias |
| 87 | “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments | Andrew_Critch |
| 88 | Do bamboos set themselves on fire? | Malmesbury |
| 89 | Murphyjitsu: an Inner Simulator algorithm | CFAR!Duncan |
| 90 | Deliberate Grieving | Raemon |
| 91 | We Choose To Align AI | johnswentworth |
| 92 | The alignment problem from a deep learning perspective | Richard_Ngo |
| 93 | Slack matters more than any outcome | Valentine |
| 94 | Consider your appetite for disagreements | Adam Zerner |
| 95 | everything is okay | Tamsin Leake |
| 96 | Mysteries of mode collapse | janus |
| 97 | Slow motion videos as AI risk intuition pumps | Andrew_Critch |
| 98 | ITT-passing and civility are good; "charity" is bad; steelmanning is niche | Rob Bensinger |
| 99 | Meadow Theory | [DEACTIVATED] Duncan Sabien |
| 100 | The next decades might be wild | Marius Hobbhahn |
| 101 | Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments | Jeffrey Ladish |
| 102 | Lessons learned from talking to >100 academics about AI safety | Marius Hobbhahn |
| 103 | Activated Charcoal for Hangover Prevention: Way more than you wanted to know | Maxwell Peterson |
| 104 | More Is Different for AI | jsteinhardt |
| 105 | How satisfied should you expect to be with your partner? | Vaniver |
| 106 | How my team at Lightcone sometimes gets stuff done | jacobjacob |
| 107 | The metaphor you want is "color blindness," not "blind spot." | [DEACTIVATED] Duncan Sabien |
| 108 | Logical induction for software engineers | Alex Flint |
| 109 | Call For Distillers | johnswentworth |
| 110 | Fiber arts, mysterious dodecahedrons, and waiting on “Eureka!” | eukaryote |
| 111 | A Longlist of Theories of Impact for Interpretability | Neel Nanda |
| 112 | On A List of Lethalities | Zvi |
| 113 | LOVE in a simbox is all you need | jacob_cannell |
| 114 | A transparency and interpretability tech tree | evhub |
| 115 | DeepMind alignment team opinions on AGI ruin arguments | Vika |
| 116 | Contra shard theory, in the context of the diamond maximizer problem | So8res |
| 117 | On infinite ethics | Joe Carlsmith |
| 118 | Wisdom Cannot Be Unzipped | Sable |
| 119 | Different perspectives on concept extrapolation | Stuart_Armstrong |
| 120 | Utilitarianism Meets Egalitarianism | Scott Garrabrant |
| 121 | The ignorance of normative realism bot | Joe Carlsmith |
| 122 | Shah and Yudkowsky on alignment failures | Rohin Shah |
| 123 | Nuclear Energy - Good but not the silver bullet we were hoping for | Marius Hobbhahn |
| 124 | Patient Observation | LoganStrohl |
| 125 | Monks of Magnitude | [DEACTIVATED] Duncan Sabien |
| 126 | AI coordination needs clear wins | evhub |
| 127 | Actually, All Nuclear Famine Papers are Bunk | Lao Mein |
| 128 | New Frontiers in Mojibake | Adam Scherlis |
| 129 | My take on Jacob Cannell’s take on AGI safety | Steven Byrnes |
| 130 | Introducing Pastcasting: A tool for forecasting practice | Sage Future |
| 131 | K-complexity is silly; use cross-entropy instead | So8res |
| 132 | Beware boasting about non-existent forecasting track records | Jotto999 |
| 133 | Clarifying AI X-risk | zac_kenton |
| 134 | Narrative Syncing | AnnaSalamon |
| 135 | publishing alignment research and exfohazards | Tamsin Leake |
| 136 | Deontology and virtue ethics as "effective theories" of consequentialist ethics | Jan_Kulveit |
| 137 | Range and Forecasting Accuracy | niplav |
| 138 | Trends in GPU price-performance | Marius Hobbhahn |
| 139 | How To Observe Abstract Objects | LoganStrohl |
| 140 | Criticism of EA Criticism Contest | Zvi |
| 141 | Takeaways from our robust injury classifier project [Redwood Research] | dmz |
| 142 | Bad at Arithmetic, Promising at Math | cohenmacaulay |
| 143 | Don't use 'infohazard' for collectively destructive info | Eliezer Yudkowsky |
| 144 | Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection | Oliver Sourbut |
| 145 | Human values & biases are inaccessible to the genome | TurnTrout |
| 146 | I learn better when I frame learning as Vengeance for losses incurred through ignorance, and you might too | chaosmage |
| 147 | Jailbreaking ChatGPT on Release Day | Zvi |
| 148 | Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide | Andrew_Critch |
| 149 | Review: Amusing Ourselves to Death | L Rudolf L |
| 150 | QNR prospects are important for AI alignment research | Eric Drexler |
| 151 | Disagreement with bio anchors that lead to shorter timelines | Marius Hobbhahn |
| 152 | Why all the fuss about recursive self-improvement? | So8res |
| 153 | LessWrong Has Agree/Disagree Voting On All New Comment Threads | Ben Pace |
| 154 | Opening Session Tips & Advice | CFAR!Duncan |
| 155 | Searching for Search | NicholasKees |
| 156 | Refining the Sharp Left Turn threat model, part 1: claims and mechanisms | Vika |
| 157 | Takeaways from a survey on AI alignment resources | DanielFilan |
| 158 | Trying to disambiguate different questions about whether RLHF is “good” | Buck |
| 159 | Benign Boundary Violations | [DEACTIVATED] Duncan Sabien |
| 160 | How To: A Workshop (or anything) | [DEACTIVATED] Duncan Sabien |