Alignment Jam
Contributors (1): Esben Kran
This lists the posts that have come from the Alignment Jam hackathons.
Posts tagged Alignment Jam (sorted by most relevant)
Relevance | Karma | Title | Authors | Age | Comments
1 | 141 | We Found An Neuron in GPT-2 (Ω) | Joseph Miller, Clement Neo | 1y | 22
1 | 119 | Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1 (Ω) | StefanHex, Marius Hobbhahn | 1y | 1
1 | 81 | Results from the interpretability hackathon | Esben Kran, Neel Nanda | 1y | 0
1 | 71 | Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2 (Ω) | StefanHex, Marius Hobbhahn | 1y | 1
1 | 47 | How-to Transformer Mechanistic Interpretability—in 50 lines of code or less! (Ω) | StefanHex | 1y | 5
1 | 44 | Robustness of Model-Graded Evaluations and Automated Interpretability (Ω) | Simon Lermen, viluon | 9mo | 5
1 | 21 | Superposition and Dropout | Edoardo Pona | 1y | 5
1 | 18 | Identifying semantic neurons, mechanistic circuits & interpretability web apps | Esben Kran, Neel Nanda | 1y | 0
1 | 13 | Results from the AI testing hackathon | Esben Kran | 1y | 0
1 | 10 | Towards AI Safety Infrastructure: Talk & Outline | Paul Bricman | 4mo | 0
1 | 5 | Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon | Esben Kran | 7d | 0