Considerations in diffuse control — LessWrong
16
Methodological considerations in making malign initializations for control research
Alek Westover, Vivek Hebbar, Julian Stastny
5mo
0
5
Three visions for diffuse control
Alek Westover
3mo
0
29
Four Downsides of Training Policies Online
Alek Westover, egan
5mo
4
18
Theoretical predictions on the sample efficiency of training policies and activation monitors
Alek Westover, Vivek Hebbar
4mo
2
32
How will we do SFT on models with opaque reasoning?
Ω
Alek Westover, Vivek Hebbar, egan
3mo
Ω
17
40
Model organisms researchers should check whether high LRs defeat their model organisms
Dylan Xu, SebastianP, Alek Westover, Vivek Hebbar, Julian Stastny
1mo
0
61
How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?
Dylan Xu, Alek Westover, Vivek Hebbar, SebastianP, frisby, Julian Stastny
1mo
5