LESSWRONG

149
Govind Pimpale
309200

Posts


Forecasting Frontier Language Model Agent Capabilities (35 karma, 7mo, 0 comments)
Do models know when they are being evaluated? (59 karma, 7mo, 8 comments)
Current safety training techniques do not fully transfer to the agent setting (158 karma, 10mo, 9 comments)
~80 Interesting Questions about Foundation Model Agent Safety (48 karma, 11mo, 4 comments)
Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities (69 karma, 1y, 0 comments; also posted to the AI Alignment Forum)