LESSWRONG

AlexMeinke

Posts

124 · Stress Testing Deliberative Alignment for Anti-Scheming Training · Ω · 1mo · 15
115 · Ablations for “Frontier Models are Capable of In-context Scheming” · 10mo · 1
210 · Frontier Models are Capable of In-context Scheming · Ω · 10mo · 24
68 · Training AI agents to solve hard problems could lead to Scheming · Ω · 11mo · 12
109 · Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs · 1y · 37
93 · Apollo Research 1-year update · Ω · 1y · 0
57 · A starter guide for evals · Ω · 2y · 2
45 · Paper: Tell, Don't Show- Declarative facts influence how LLMs generalize · Ω · 2y · 4