AlexMeinke — LessWrong
Posts (sorted by new)
126 karma · Stress Testing Deliberative Alignment for Anti-Scheming Training Ω · 3mo · 19 comments
115 karma · Ablations for “Frontier Models are Capable of In-context Scheming” · 1y · 1 comment
210 karma · Frontier Models are Capable of In-context Scheming Ω · 1y · 24 comments
72 karma · Training AI agents to solve hard problems could lead to Scheming Ω · 1y · 12 comments
109 karma · Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs · 1y · 39 comments
93 karma · Apollo Research 1-year update Ω · 2y · 0 comments
58 karma · A starter guide for evals Ω · 2y · 2 comments
45 karma · Paper: Tell, Don't Show- Declarative facts influence how LLMs generalize Ω · 2y · 4 comments