LESSWRONG
Govind Pimpale

Posts

Forecasting Frontier Language Model Agent Capabilities (35 karma, 8mo, 0 comments)
Do models know when they are being evaluated? (57 karma, 8mo, 9 comments)
Current safety training techniques do not fully transfer to the agent setting (160 karma, 1y, 9 comments)
~80 Interesting Questions about Foundation Model Agent Safety (48 karma, 1y, 4 comments)
Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities (69 karma, 1y, 0 comments)