LESSWRONG

AI
Frontpage

72

New scorecard evaluating AI companies on safety

by Zach Stein-Perlman
26th May 2025

Zac Hatfield-Dodds (3mo, karma 12, agreement 4)

If you don't feel great about the numbers, why are there so many of them on the website? The presentation seems much more focused on the scores than on a collection of information.

Zach Stein-Perlman (3mo, edited, karma 8, agreement 4)

I think the numbers are much better than nothing and much better than any substitute that currently exists, and I'm not aware of a better design or a great way to deemphasize them while preserving their value.

Edit: like, they convey a lot of real info, and more conservative alternatives would fail to do so.

Caleb Biddulph (3mo, karma 8, agreement 2)

Off-topic: thanks for commenting in the same thread so I can see your names side-by-side. Until now, I thought you were the same person.

Now that I know Zach does not work at Anthropic, it suddenly makes more sense that he runs a website comparing AI labs and crossposts model announcements from various companies to LW.

habryka (3mo, karma 7, agreement 2)

A quick take from me (I did some of the design on the site, though Ray did most of it): I think the numbers are helpful for organizing the content into meaningful categories, and help people figure out where the interesting content is. Otherwise you would be dealing with a huge amount of prose. I currently think the numbers/table are a pretty decent way to get a sense of what the content is, and where it makes sense to pay attention (usually the places where there is the most variance in numbers across a category, or where scores are particularly low or high).

Matt Bamberger (3mo, karma 1, agreement 0)

I definitely find the presentation useful. In particular, the ability to drill down on each block is great (though it took me a moment to figure out how that worked).

Raemon (3mo, karma 2, agreement 0)

If you have any thoughts on what would be more intuitive while accomplishing the goal, let me know.

Anders Lindström (3mo, karma 1, agreement -14)
  • Feedback: I really like the breakdown of each company's stance on safety, but please skip the percentage numbers. It's just silly to give a score based on guesswork.
Anders Lindström (3mo, karma 2, agreement 0)

Since a lot of people disagree with this, please tell me what a score of 100% means, or say 50% or 37%. I am not writing this to provoke; I am genuinely interested to know.


The new scorecard is on my website, AI Lab Watch. This replaces my old scorecard. I redid the content from scratch; it's now up-to-date and higher-quality. I'm also happy with the scorecard's structure: you can click on rows, columns, and cells and zoom in to various things. Check it out! Thanks to Lightcone for designing the site.

While it is a scorecard, I don't feel great about the numbers; I mostly see it as a collection of information.