Alignment Jam — LessWrong

You are viewing version 1.0.0 of this page.

Alignment Jam

Edited by Esben Kran, last updated 16th May 2023

This tag lists the posts that have come out of the Alignment Jam hackathons.

Posts tagged Alignment Jam

34 Computational Mechanics Hackathon (June 1 & 2)

2y

5

1

143 We Found An Neuron in GPT-2

Joseph Miller, Clement Neo

3y

23

1

119 Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

StefanHex, Marius Hobbhahn

3y

1

1

81 Results from the interpretability hackathon

Esben Kran, Neel Nanda

3y

0

1

71 Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2

StefanHex, Marius Hobbhahn

3y

1

1

48 How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!

3y

5

1

47 Robustness of Model-Graded Evaluations and Automated Interpretability

Simon Lermen, viluon

3y

5

1

21 Superposition and Dropout

3y

5

1

20 Finding Deception in Language Models

Esben Kran, Archana Vaidheeswaran

1y

4

1

18 Identifying semantic neurons, mechanistic circuits & interpretability web apps

Esben Kran, Neel Nanda

3y

0

1

13 Results from the AI testing hackathon

3y

0

1

11 Towards AI Safety Infrastructure: Talk & Outline

2y

0

1

5 Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

2y

0
