149
Govind Pimpale
Posts
35
Forecasting Frontier Language Model Agent Capabilities
7mo
0
59
Do models know when they are being evaluated?
7mo
8
158
Current safety training techniques do not fully transfer to the agent setting
10mo
9
48
~80 Interesting Questions about Foundation Model Agent Safety
11mo
4
69
Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Ω
1y
Ω
0