
Joey KL's Shortform

by Joey KL
9th Oct 2025

This is a special post for quick takes by Joey KL. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
Joey KL, 12d

The evaluation awareness of frontier models has recently increased dramatically, which reduces the safety assurance that evals can provide. I think a good response would be to add an assurance layer based on gradual deployment/beta testing.

It's fundamentally difficult to build evals that increasingly intelligent models can't distinguish from deployment settings, but we can get much more informative results by testing models in real deployment settings. This already happens via internal deployment and, in some cases, early-access external deployment, but it doesn't look like it's being systematically incorporated into safety assurance right now. Beta testing is a standard practice in software development, so this seems pretty tractable.
