LESSWRONG
All Posts
Sorted by Magic (New & Upvoted)
Timeframe: All time · Filtered by: All Posts
256 · Thoughts on seed oil · dynomight · 5d · 79
105 · My experience using financial commitments to overcome akrasia · William Howard · 2d · 28
302 · The Best Tacit Knowledge Videos on Every Subject · Parker Conley, hans truman · 12d · 123
257 · On green · Joe Carlsmith · 1mo · 33
248 · My PhD thesis: Algorithmic Bayesian Epistemology · Eric Neyman · 23d · 14
200 · "How could I have thought that faster?" · mesaoptimizer · 1mo · 30
229 · My Clients, The Liars · ymeskhout · 2mo · 85
171 · Toward a Broader Conception of Adverse Selection · Ricki Heicklen · 17d · 61
262 · Scale Was All We Needed, At First · Gabriel Mukobi · 1mo · 31
77 · [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate · trevor · 8d · 22
349 · There is way too much serendipity · Malmesbury · 3mo · 56
140 · Using axis lines for good or evil · dynomight · 1mo · 39
213 · CFAR Takeaways: Andrew Critch · Raemon · 2mo · 62
208 · Believing In · AnnaSalamon · 3mo · 49
404 · Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · GeneSmith, kman · 4mo · 162
291 · Ω · Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · evhub, Carson Denison, Meg, Monte M, David Duvenaud, Nicholas Schiefer, Ethan Perez · 3mo · Ω 94
243 · Ω · The case for ensuring that powerful AIs are controlled · ryan_greenblatt, Buck · 3mo · Ω 66
109 · Social status part 1/2: negotiations over object-level preferences · Steven Byrnes · 1mo · 15
265 · Gentleness and the artificial Other · Joe Carlsmith · 4mo · 33
139 · And All the Shoggoths Merely Players · Zack_M_Davis · 2mo · 56
259 · Constellations are Younger than Continents · Jeffrey Heninger · 4mo · 22
307 · Ω · Shallow review of live agendas in alignment & safety · technicalities, Stag · 5mo · Ω 69
288 · Speaking to Congressional staffers about AI risk · Akash, hath · 2mo · 23
481 · The Talk: a brief explanation of sexual dimorphism · Malmesbury · 7mo · 72
159 · Ω · Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI · Jeremy Gillen, peterbarnett · 3mo · Ω 60
124 · Ω · Updatelessness doesn't solve most problems · Martín Soto · 2mo · Ω 43
282 · Social Dark Matter · [DEACTIVATED] Duncan Sabien · 5mo · 112
106 · Attitudes about Applied Rationality · Camille Berger · 3mo · 18
255 · Ω · AI Timelines · habryka, Daniel Kokotajlo, Ajeya Cotra, Ege Erdil · 6mo · Ω 74
261 · The 6D effect: When companies take risks, one email can be very powerful. · scasper · 5mo · 40
122 · Ω · A Shutdown Problem Proposal · johnswentworth, David Lorell · 3mo · Ω 61
215 · What are the results of more parental supervision and less outdoor play? · juliawise · 5mo · 30
325 · Inside Views, Impostor Syndrome, and the Great LARP · johnswentworth · 7mo · 53
57 · Acting Wholesomely · owencb · 1mo · 64
286 · Ω · Towards Monosemanticity: Decomposing Language Models With Dictionary Learning · Zac Hatfield-Dodds · 6mo · Ω 21
130 · Deep atheism and AI risk · Joe Carlsmith · 2mo · 22
240 · Book Review: Going Infinite · Zvi · 6mo · 109
238 · Ω · Alignment Implications of LLM Successes: a Debate in One Act · Zack_M_Davis · 6mo · Ω 50
147 · Ω · Discussion: Challenges with Unsupervised LLM Knowledge Discovery · Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik, Rohin Shah · 4mo · Ω 21
663 · Ω · SolidGoldMagikarp (plus, prompt generation) · Jessica Rumbelow, mwatkins · 1y · Ω 204
131 · The Dark Arts · lsusr, Lyrongolem · 4mo · 49
416 · The ants and the grasshopper · Richard_Ngo · 1y · 35
155 · How useful is mechanistic interpretability? · ryan_greenblatt, Neel Nanda, Buck, habryka · 3mo · 53
459 · How much do you believe your results? · Eric Neyman · 1y · 14
185 · Thinking By The Clock · Screwtape · 5mo · 27
306 · Ω · Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research · evhub, Nicholas Schiefer, Carson Denison, Ethan Perez · 8mo · Ω 26
418 · Ω · Steering GPT-2-XL by adding an activation vector · TurnTrout, Monte M, David Udell, lisathiergart, Ulisse Mini · 1y · Ω 97
897 · Ω · AGI Ruin: A List of Lethalities · Eliezer Yudkowsky · 2y · Ω 690
874 · Ω · Where I agree and disagree with Eliezer · paulfchristiano · 2y · Ω 219
250 · Dear Self; we need to talk about ambition · Elizabeth · 8mo · 25