Joey KL's Shortform

Joey KL

9th Oct 2025

This is a special post for quick takes by Joey KL. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Joey KL, 5mo

The evaluation awareness of frontier models has recently increased dramatically, which reduces the safety assurance that evals can provide. I think a good response would be to add an assurance layer based on gradual deployment/beta testing.

It's fundamentally difficult to build evals that increasingly intelligent models can't distinguish from deployment settings, but we can get much more informative results by testing models in real deployment settings. This already happens via internal deployment and, in some cases, early-access external deployment, but it doesn't look like it's being systematically incorporated into safety assurance right now. Beta testing is standard practice in software development, so this seems pretty tractable.
