Alignment Jam — LessWrong
Edited by Esben Kran, last updated 16th May 2023 (revision 1.0.0)
This lists the posts that have come from the Alignment Jam hackathons.
Posts tagged Alignment Jam (sorted by most relevant)
Computational Mechanics Hackathon (June 1 & 2). Adam Shai. 2y. Relevance 2, karma 34, 5 comments.
We Found An Neuron in GPT-2 (Ω). Joseph Miller, Clement Neo. 3y. Relevance 1, karma 143, 23 comments.
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1 (Ω). StefanHex, Marius Hobbhahn. 3y. Relevance 1, karma 119, 1 comment.
Results from the interpretability hackathon. Esben Kran, Neel Nanda. 3y. Relevance 1, karma 81, 0 comments.
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2 (Ω). StefanHex, Marius Hobbhahn. 3y. Relevance 1, karma 71, 1 comment.
How-to Transformer Mechanistic Interpretability—in 50 lines of code or less! (Ω). StefanHex. 3y. Relevance 1, karma 48, 5 comments.
Robustness of Model-Graded Evaluations and Automated Interpretability (Ω). Simon Lermen, viluon. 3y. Relevance 1, karma 47, 5 comments.
Superposition and Dropout. Edoardo Pona. 3y. Relevance 1, karma 21, 5 comments.
Finding Deception in Language Models (Ω). Esben Kran, Archana Vaidheeswaran. 1y. Relevance 1, karma 20, 4 comments.
Identifying semantic neurons, mechanistic circuits & interpretability web apps. Esben Kran, Neel Nanda. 3y. Relevance 1, karma 18, 0 comments.
Results from the AI testing hackathon. Esben Kran. 3y. Relevance 1, karma 13, 0 comments.
Towards AI Safety Infrastructure: Talk & Outline. Paul Bricman. 2y. Relevance 1, karma 11, 0 comments.
Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon. Esben Kran. 2y. Relevance 1, karma 5, 0 comments.