Alignment Jam — LessWrong

Edited by Esben Kran. Last updated 16th May 2023.

This tag collects posts that came out of the Alignment Jam hackathons.

Posts tagged Alignment Jam

2

34 · Computational Mechanics Hackathon (June 1 & 2)

2y

5

1

143 · We Found An Neuron in GPT-2

Joseph Miller, Clement Neo

3y

23

1

119 · Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

StefanHex, Marius Hobbhahn

3y

1

1

81 · Results from the interpretability hackathon

Esben Kran, Neel Nanda

4y

0

1

71 · Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2

StefanHex, Marius Hobbhahn

3y

1

1

48 · How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!

3y

5

1

47 · Robustness of Model-Graded Evaluations and Automated Interpretability

Simon Lermen, viluon

3y

5

1

21 · Superposition and Dropout

3y

5

1

20 · Finding Deception in Language Models

Esben Kran, Archana Vaidheeswaran

2y

4

1

18 · Identifying semantic neurons, mechanistic circuits & interpretability web apps

Esben Kran, Neel Nanda

3y

0

1

13 · Results from the AI testing hackathon

4y

0

1

11 · Towards AI Safety Infrastructure: Talk & Outline

3y

0

1

5 · Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

2y

0
