LESSWRONG
Alignment Jam
Contributors: Esben Kran
This lists the posts that have come from the Alignment Jam hackathons.
Posts tagged Alignment Jam (most relevant first)
Computational Mechanics Hackathon (June 1 & 2), by Adam Shai; 7mo; karma 34; 5 comments; tag relevance 2
We Found An Neuron in GPT-2 [Ω], by Joseph Miller, Clement Neo; 2y; karma 143; 23 comments; tag relevance 1
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1 [Ω], by StefanHex, Marius Hobbhahn; 2y; karma 119; 1 comment; tag relevance 1
Results from the interpretability hackathon, by Esben Kran, Neel Nanda; 2y; karma 81; 0 comments; tag relevance 1
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2 [Ω], by StefanHex, Marius Hobbhahn; 2y; karma 71; 1 comment; tag relevance 1
Robustness of Model-Graded Evaluations and Automated Interpretability [Ω], by Simon Lermen, viluon; 1y; karma 47; 5 comments; tag relevance 1
How-to Transformer Mechanistic Interpretability—in 50 lines of code or less! [Ω], by StefanHex; 2y; karma 47; 5 comments; tag relevance 1
Superposition and Dropout, by Edoardo Pona; 2y; karma 21; 5 comments; tag relevance 1
Finding Deception in Language Models [Ω], by Esben Kran, Archana Vaidheeswaran; 4mo; karma 18; 4 comments; tag relevance 1
Identifying semantic neurons, mechanistic circuits & interpretability web apps, by Esben Kran, Neel Nanda; 2y; karma 18; 0 comments; tag relevance 1
Results from the AI testing hackathon, by Esben Kran; 2y; karma 13; 0 comments; tag relevance 1
Towards AI Safety Infrastructure: Talk & Outline, by Paul Bricman; 1y; karma 11; 0 comments; tag relevance 1
Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon, by Esben Kran; 8mo; karma 5; 0 comments; tag relevance 1