LESSWRONG

perepelart
Comments

AI companies' eval reports mostly don't support their claims
perepelart5d10

Thank you! I took a look at it back when it was published, but I have only now noticed that my "thanks" emote on your message did not go through.

AI companies' eval reports mostly don't support their claims
perepelart3mo30

Thank you, I understand your point now. In the post, you were referring specifically to one particular part of alignment, namely capability evaluations, while my example relates to more loosely scoped problems. Since I am still new to this terminology, the distinction was not immediately clear to me.

AI companies' eval reports mostly don't support their claims
perepelart3mo10

@Zach Stein-Perlman, if you, as the author of the post, say that I am missing the point, then that may well be the case. My logic was very straightforward: you ended your post by stating "I wish companies would report their eval results and explain how they interpret [...] I also wish for better elicitation and some hard accountability, but that's more costly."

Hence, I provided a particular example that corresponds directly to your statement: a widely reported incident, based on Anthropic's system card, lacks all the important details, which makes it impossible to assess the related risks or the safety measures taken when deploying the model.

I thought it would be useful to mention it here as a comment because it is a rather simple and straightforward instance, one that even a non-specialist can easily understand and interpret in the context of the post.

AI companies' eval reports mostly don't support their claims
perepelart3mo0-2

Indeed, I was quite surprised when I could not find any meaningful details in Anthropic's "System Card: Claude Opus 4 & Claude Sonnet 4" about the situations in which Claude blackmailed the developers who tried to replace him. Given the amount of hype these cases produced, I was sure there would be at least a thorough blog post about it (if not a paper), but all I found was several lines of information in the card.
