Govind Pimpale
Posts
35 | Forecasting Frontier Language Model Agent Capabilities | 2mo | 0
54 | Do models know when they are being evaluated? | 2mo | 3
158 | Current safety training techniques do not fully transfer to the agent setting | 6mo | 9
46 | ~80 Interesting Questions about Foundation Model Agent Safety | 6mo | 4
69 | Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities | Ω | 9mo | 0