
Redwood Research

Edited by Dakara, last updated 30th Dec 2024

Redwood Research is a nonprofit organization focused on mitigating risks from advanced artificial intelligence.

The initial directions of its research agenda include:

  • AI control
  • Evaluations and demonstrations of risk from strategic deception
  • Consulting on risks from misalignment
Posts tagged Redwood Research
  • Alignment Faking in Large Language Models, by ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck (489 karma, 75 comments, 8mo)
  • The case for ensuring that powerful AIs are controlled, by ryan_greenblatt, Buck (278 karma, 73 comments, 2y)
  • Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research], by LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck, Nate Thomas (206 karma, 35 comments, 3y)
  • AI Control: Improving Safety Despite Intentional Subversion, by Buck, Fabien Roger, ryan_greenblatt, Kshitij Sachan (239 karma, 24 comments, 2y)
  • Takeaways from our robust injury classifier project [Redwood Research], by dmz (143 karma, 12 comments, 3y)
  • Benchmarks for Detecting Measurement Tampering [Redwood Research], by ryan_greenblatt, Fabien Roger (94 karma, 22 comments, 2y)
  • Redwood Research’s current project, by Buck (145 karma, 29 comments, 4y)
  • Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley, by maxnadeau, Xander Davies, Buck, Nate Thomas (135 karma, 14 comments, 3y)
  • Preventing Language Models from hiding their reasoning, by Fabien Roger, ryan_greenblatt (119 karma, 15 comments, 2y)
  • Catching AIs red-handed, by ryan_greenblatt, Buck (113 karma, 27 comments, 2y)
  • Redwood's Technique-Focused Epistemic Strategy, by adamShimi (48 karma, 1 comment, 4y)
  • AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler, by DanielFilan (16 karma, 0 comments, 3y)
  • Will alignment-faking Claude accept a deal to reveal its misalignment?, by ryan_greenblatt, Kyle Fish (208 karma, 28 comments, 7mo)
  • How will we update about scheming?, by ryan_greenblatt (174 karma, 20 comments, 7mo)
  • High-stakes alignment via adversarial training [Redwood Research report], by dmz, LawrenceC, Nate Thomas (142 karma, 29 comments, 3y)