fidgetsinner — LessWrong
fidgetsinner
Posts
Forecasting Frontier Language Model Agent Capabilities (35 karma, 10mo, 0 comments)
Do models know when they are being evaluated? (57 karma, 10mo, 9 comments)
Current safety training techniques do not fully transfer to the agent setting (162 karma, 1y, 9 comments)
~80 Interesting Questions about Foundation Model Agent Safety (48 karma, 1y, 4 comments)
Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities (69 karma, Ω, 1y, 0 comments)