LESSWRONG

AlexMeinke
568100

Posts

Ablations for “Frontier Models are Capable of In-context Scheming” (115 karma, 7mo, 1 comment)
Frontier Models are Capable of In-context Scheming (210 karma, Ω, 7mo, 24 comments)
Training AI agents to solve hard problems could lead to Scheming (61 karma, Ω, 8mo, 12 comments)
Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs (109 karma, 1y, 37 comments)
Apollo Research 1-year update (93 karma, Ω, 1y, 0 comments)
A starter guide for evals (55 karma, Ω, 2y, 2 comments)
Paper: Tell, Don't Show- Declarative facts influence how LLMs generalize (45 karma, Ω, 2y, 4 comments)