I just published my first paper as an article on LessWrong, on the topic of evaluation awareness.
TL;DR: We found that "evaluation awareness" increases with model size. Evaluation awareness is a model's ability to tell whether it is being evaluated or deployed.
This could become a serious concern if not countered. Your feedback on the article would be highly appreciated.