Redwood Research

Edited by Dakara, last updated 30th Dec 2024

Redwood Research is a nonprofit organization focused on mitigating risks from advanced artificial intelligence.

The initial directions of its research agenda include:

AI control
Evaluations and demonstrations of risk from strategic deception
Consulting on risks from misalignment

Posts tagged Redwood Research

20

493

Alignment Faking in Large Language Models

ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck

2y

87

19

293

The case for ensuring that powerful AIs are controlled

ryan_greenblatt, Buck

3y

74

12

48

Redwood's Technique-Focused Epistemic Strategy

adamShimi

5y

1

11

177

How will we update about scheming?

ryan_greenblatt

1y

21

6

208

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck, Nate Thomas

4y

35

4

241

AI Control: Improving Safety Despite Intentional Subversion

Buck, Fabien Roger, ryan_greenblatt, Kshitij Sachan

3y

26

4

143

Takeaways from our robust injury classifier project [Redwood Research]

dmz

4y

12

4

94

Benchmarks for Detecting Measurement Tampering [Redwood Research]

ryan_greenblatt, Fabien Roger

3y

22

3

145

Redwood Research’s current project

Buck

5y

29

3

135

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau, Xander Davies, Buck, Nate Thomas

4y

14

3

121

Preventing Language Models from hiding their reasoning

Fabien Roger, ryan_greenblatt

3y

15

3

116

Catching AIs red-handed

ryan_greenblatt, Buck

3y

28

3

16

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan

4y

0

2

209

Will alignment-faking Claude accept a deal to reveal its misalignment?

ryan_greenblatt, Kyle Fish

1y

28

2

207

The behavioral selection model for predicting AI motivations

Alex Mallen, Buck

8mo

31