LESSWRONG

Nate Thomas

Redwood Research and Constellation

Posts

Comments

Express interest in an "FHI of the West"
Nate Thomas · 1y · 132

To anyone reading this who wants to work on or discuss FHI-flavored work: Consider applying to Constellation's programs (the deadline for some of them is today!), which include salaried positions for researchers.

Apply to the Constellation Visiting Researcher Program and Astra Fellowship, in Berkeley this Winter
Nate Thomas · 2y · Ω · 120

Thanks, Neel! It should be fixed now.

Takeaways from our robust injury classifier project [Redwood Research]
Nate Thomas · 3y · Ω · 91812

Note that it's unsurprising that a different model categorizes this correctly because the failure was generated from an attack on the particular model we were working with. The relevant question is "given a model, how easy is it to find a failure by attacking that model using our rewriting tools?"

42 · Apply to the Constellation Visiting Researcher Program and Astra Fellowship, in Berkeley this Winter · Ω · 2y · 10
34 · Causal scrubbing: results on induction heads · Ω · 3y · 1
34 · Causal scrubbing: results on a paren balance checker · Ω · 3y · 2
18 · Causal scrubbing: Appendix · Ω · 3y · 4
206 · Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] · Ω · 3y · 35
135 · Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley · Ω · 3y · 14
142 · High-stakes alignment via adversarial training [Redwood Research report] · Ω · 3y · 29
56 · We're Redwood Research, we do applied alignment research, AMA · Ω · 4y · 2