| # | Title | Author |
|---:|---|---|
| 0 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky |
| 1 | MIRI announces new "Death With Dignity" strategy | Eliezer Yudkowsky |
| 2 | Where I agree and disagree with Eliezer | paulfchristiano |
| 3 | Let’s think about slowing down AI | KatjaGrace |
| 4 | Reward is not the optimization target | TurnTrout |
| 5 | Six Dimensions of Operational Adequacy in AGI Projects | Eliezer Yudkowsky |
| 6 | It Looks Like You're Trying To Take Over The World | gwern |
| 7 | Staring into the abyss as a core life skill | benkuhn |
| 8 | You Are Not Measuring What You Think You Are Measuring | johnswentworth |
| 9 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra |
| 10 | Sazen | Duncan Sabien |
| 11 | Luck based medicine: my resentful story of becoming a medical miracle | Elizabeth |
| 12 | Inner and outer alignment decompose one hard problem into two extremely hard problems | TurnTrout |
| 13 | On how various plans miss the hard bits of the alignment challenge | So8res |
| 14 | Simulators | janus |
| 15 | Epistemic Legibility | Elizabeth |
| 16 | Tyranny of the Epistemic Majority | Scott Garrabrant |
| 17 | Counterarguments to the basic AI x-risk case | KatjaGrace |
| 18 | What Are You Tracking In Your Head? | johnswentworth |
| 19 | Safetywashing | Adam Scholl |
| 20 | Threat-Resistant Bargaining Megapost: Introducing the ROSE Value | Diffractor |
| 21 | Nonprofit Boards are Weird | HoldenKarnofsky |
| 22 | Optimality is the tiger, and agents are its teeth | Veedrac |
| 23 | chinchilla's wild implications | nostalgebraist |
| 24 | Losing the root for the tree | Adam Zerner |
| 25 | Worlds Where Iterative Design Fails | johnswentworth |
| 26 | Decision theory does not imply that we get to have nice things | So8res |
| 27 | Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality" | AnnaSalamon |
| 28 | What an actually pessimistic containment strategy looks like | lc |
| 29 | Introduction to abstract entropy | Alex_Altair |
| 30 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda |
| 31 | The Redaction Machine | Ben |
| 32 | Butterfly Ideas | Elizabeth |
| 33 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC |
| 34 | Language models seem to be much better than humans at next-token prediction | Buck |
| 35 | Toni Kurz and the Insanity of Climbing Mountains | GeneSmith |
| 36 | Useful Vices for Wicked Problems | HoldenKarnofsky |
| 37 | What should you change in response to an "emergency"? And AI risk | AnnaSalamon |
| 38 | Models Don't "Get Reward" | Sam Ringer |
| 39 | How To Go From Interpretability To Alignment: Just Retarget The Search | johnswentworth |
| 40 | Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment | elspood |
| 41 | Why Agent Foundations? An Overly Abstract Explanation | johnswentworth |
| 42 | A central AI alignment problem: capabilities generalization, and the sharp left turn | So8res |
| 43 | Humans provide an untapped wealth of evidence about alignment | TurnTrout |
| 44 | Learning By Writing | HoldenKarnofsky |
| 45 | Limerence Messes Up Your Rationality Real Bad, Yo | Raemon |
| 46 | The Onion Test for Personal and Institutional Honesty | chanamessinger |
| 47 | Counter-theses on Sleep | Natália Coelho Mendonça |
| 48 | The shard theory of human values | Quintin Pope |
| 49 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin |
| 50 | ProjectLawful.com: Eliezer's latest story, past 1M words | Eliezer Yudkowsky |
| 51 | Intro to Naturalism: Orientation | LoganStrohl |
| 52 | Why I think strong general AI is coming soon | porby |
| 53 | How might we align transformative AI if it’s developed very soon? | HoldenKarnofsky |
| 54 | It’s Probably Not Lithium | Natália Coelho Mendonça |
| 55 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen |
| 56 | Plans Are Predictions, Not Optimization Targets | johnswentworth |
| 57 | Takeoff speeds have a huge effect on what it means to work on AI x-risk | Buck |
| 58 | The Feeling of Idea Scarcity | johnswentworth |
| 59 | Six (and a half) intuitions for KL divergence | CallumMcDougall |
| 60 | Trigger-Action Planning | CFAR!Duncan |
| 61 | Have You Tried Hiring People? | rank-biserial |
| 62 | The Wicked Problem Experience | HoldenKarnofsky |
| 63 | What does it take to defend the world against out-of-control AGIs? | Steven Byrnes |
| 64 | On Bounded Distrust | Zvi |
| 65 | Setting the Zero Point | Duncan Sabien |
| 66 | [Interim research report] Taking features out of superposition with sparse autoencoders | Lee Sharkey |
| 67 | Limits to Legibility | Jan_Kulveit |
| 68 | Harms and possibilities of schooling | TsviBT |
| 69 | Look For Principles Which Will Carry Over To The Next Paradigm | johnswentworth |
| 70 | Steam | abramdemski |
| 71 | High Reliability Orgs, and AI Companies | Raemon |
| 72 | Toy Models of Superposition | evhub |
| 73 | Editing Advice for LessWrong Users | JustisMills |
| 74 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth |
| 75 | why assume AGIs will optimize for fixed goals? | nostalgebraist |
| 76 | Lies Told To Children | Eliezer Yudkowsky |
| 77 | Revisiting algorithmic progress | Tamay |
| 78 | Things that can kill you quickly: What everyone should know about first aid | jasoncrawford |
| 79 | Postmortem on DIY Recombinant Covid Vaccine | caffemacchiavelli |
| 80 | Reflections on six months of fatherhood | jasoncrawford |
| 81 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang |
| 82 | The Plan - 2022 Update | johnswentworth |
| 83 | 12 interesting things I learned studying the discovery of nature's laws | Ben Pace |
| 84 | Impossibility results for unbounded utilities | paulfchristiano |
| 85 | Searching for outliers | benkuhn |
| 86 | Greyed Out Options | ozymandias |
| 87 | “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments | Andrew_Critch |
| 88 | Do bamboos set themselves on fire? | Malmesbury |
| 89 | Murphyjitsu: an Inner Simulator algorithm | CFAR!Duncan |
| 90 | Deliberate Grieving | Raemon |
| 91 | We Choose To Align AI | johnswentworth |
| 92 | The alignment problem from a deep learning perspective | Richard_Ngo |
| 93 | Slack matters more than any outcome | Valentine |
| 94 | Consider your appetite for disagreements | Adam Zerner |
| 95 | everything is okay | Tamsin Leake |
| 96 | Mysteries of mode collapse | janus |
| 97 | Slow motion videos as AI risk intuition pumps | Andrew_Critch |
| 98 | ITT-passing and civility are good; "charity" is bad; steelmanning is niche | Rob Bensinger |
| 99 | Meadow Theory | Duncan Sabien |
| 100 | The next decades might be wild | Marius Hobbhahn |
| 101 | Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments | Jeffrey Ladish |
| 102 | Lessons learned from talking to >100 academics about AI safety | Marius Hobbhahn |
| 103 | Activated Charcoal for Hangover Prevention: Way more than you wanted to know | Maxwell Peterson |
| 104 | More Is Different for AI | jsteinhardt |
| 105 | How satisfied should you expect to be with your partner? | Vaniver |
| 106 | How my team at Lightcone sometimes gets stuff done | jacobjacob |
| 107 | The metaphor you want is "color blindness," not "blind spot." | Duncan Sabien |
| 108 | Logical induction for software engineers | Alex Flint |
| 109 | Call For Distillers | johnswentworth |
| 110 | Fiber arts, mysterious dodecahedrons, and waiting on “Eureka!” | eukaryote |
| 111 | A Longlist of Theories of Impact for Interpretability | Neel Nanda |
| 112 | On A List of Lethalities | Zvi |
| 113 | LOVE in a simbox is all you need | jacob_cannell |
| 114 | A transparency and interpretability tech tree | evhub |
| 115 | DeepMind alignment team opinions on AGI ruin arguments | Vika |
| 116 | Contra shard theory, in the context of the diamond maximizer problem | So8res |
| 117 | On infinite ethics | Joe Carlsmith |
| 118 | Wisdom Cannot Be Unzipped | Sable |
| 119 | Different perspectives on concept extrapolation | Stuart_Armstrong |
| 120 | Utilitarianism Meets Egalitarianism | Scott Garrabrant |
| 121 | The ignorance of normative realism bot | Joe Carlsmith |
| 122 | Shah and Yudkowsky on alignment failures | Rohin Shah |
| 123 | Nuclear Energy - Good but not the silver bullet we were hoping for | Marius Hobbhahn |
| 124 | Patient Observation | LoganStrohl |
| 125 | Monks of Magnitude | Duncan Sabien |
| 126 | AI coordination needs clear wins | evhub |
| 127 | Actually, All Nuclear Famine Papers are Bunk | Lao Mein |
| 128 | New Frontiers in Mojibake | Adam Scherlis |
| 129 | My take on Jacob Cannell’s take on AGI safety | Steven Byrnes |
| 130 | Introducing Pastcasting: A tool for forecasting practice | Sage Future |
| 131 | K-complexity is silly; use cross-entropy instead | So8res |
| 132 | Beware boasting about non-existent forecasting track records | Jotto999 |
| 133 | Clarifying AI X-risk | zac_kenton |
| 134 | Narrative Syncing | AnnaSalamon |
| 135 | publishing alignment research and exfohazards | Tamsin Leake |
| 136 | Deontology and virtue ethics as "effective theories" of consequentialist ethics | Jan_Kulveit |
| 137 | Range and Forecasting Accuracy | niplav |
| 138 | Trends in GPU price-performance | Marius Hobbhahn |
| 139 | How To Observe Abstract Objects | LoganStrohl |
| 140 | Criticism of EA Criticism Contest | Zvi |
| 141 | Takeaways from our robust injury classifier project [Redwood Research] | dmz |
| 142 | Bad at Arithmetic, Promising at Math | cohenmacaulay |
| 143 | Don't use 'infohazard' for collectively destructive info | Eliezer Yudkowsky |
| 144 | Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection | Oliver Sourbut |
| 145 | Human values & biases are inaccessible to the genome | TurnTrout |
| 146 | I learn better when I frame learning as Vengeance for losses incurred through ignorance, and you might too | chaosmage |
| 147 | Jailbreaking ChatGPT on Release Day | Zvi |
| 148 | Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide | Andrew_Critch |
| 149 | Review: Amusing Ourselves to Death | L Rudolf L |
| 150 | QNR prospects are important for AI alignment research | Eric Drexler |
| 151 | Disagreement with bio anchors that lead to shorter timelines | Marius Hobbhahn |
| 152 | Why all the fuss about recursive self-improvement? | So8res |
| 153 | LessWrong Has Agree/Disagree Voting On All New Comment Threads | Ben Pace |
| 154 | Opening Session Tips & Advice | CFAR!Duncan |
| 155 | Searching for Search | NicholasKees |
| 156 | Refining the Sharp Left Turn threat model, part 1: claims and mechanisms | Vika |
| 157 | Takeaways from a survey on AI alignment resources | DanielFilan |
| 158 | Trying to disambiguate different questions about whether RLHF is “good” | Buck |
| 159 | Benign Boundary Violations | Duncan Sabien |
| 160 | How To: A Workshop (or anything) | Duncan Sabien |