Alignment Jam

Contributors: Esben Kran

This tag lists posts that have come out of the Alignment Jam hackathons.

Posts tagged Alignment Jam
Computational Mechanics Hackathon (June 1 & 2), by Adam Shai (8mo; 34 karma, 5 comments)
We Found An Neuron in GPT-2 [Ω], by Joseph Miller, Clement Neo (2y; 143 karma, 23 comments)
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1 [Ω], by StefanHex, Marius Hobbhahn (2y; 119 karma, 1 comment)
Results from the interpretability hackathon, by Esben Kran, Neel Nanda (2y; 81 karma, 0 comments)
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2 [Ω], by StefanHex, Marius Hobbhahn (2y; 71 karma, 1 comment)
How-to Transformer Mechanistic Interpretability—in 50 lines of code or less! [Ω], by StefanHex (2y; 47 karma, 5 comments)
Robustness of Model-Graded Evaluations and Automated Interpretability [Ω], by Simon Lermen, viluon (1y; 47 karma, 5 comments)
Superposition and Dropout, by Edoardo Pona (2y; 21 karma, 5 comments)
Identifying semantic neurons, mechanistic circuits & interpretability web apps, by Esben Kran, Neel Nanda (2y; 18 karma, 0 comments)
Finding Deception in Language Models [Ω], by Esben Kran, Archana Vaidheeswaran (5mo; 18 karma, 4 comments)
Results from the AI testing hackathon, by Esben Kran (2y; 13 karma, 0 comments)
Towards AI Safety Infrastructure: Talk & Outline, by Paul Bricman (1y; 11 karma, 0 comments)
Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon, by Esben Kran (9mo; 5 karma, 0 comments)