If you don't feel great about the numbers, why are there so many of them on the website? The presentation seems much more focused on the scores than on being a collection of information.
I think the numbers are much better than nothing and much better than any substitute that currently exists, and I'm not aware of a better design or a great way to deemphasize them while preserving their value.
Edit: like, they convey a lot of real info, and more conservative alternatives would fail to do so.
Off-topic: thanks for commenting in the same thread so I can see your names side-by-side. Until now, I thought you were the same person.
Now that I know Zach does not work at Anthropic, it suddenly makes more sense that he runs a website comparing AI labs and crossposts model announcements from various companies to LW.
A quick take from me (I did some of the design on the site, though Ray did most of it): I think the numbers are helpful for organizing the content into meaningful categories, and they help people figure out where the interesting content is. Otherwise you would be dealing with a huge amount of prose. I currently think the numbers/table is a pretty decent way to get a sense of what the content is, and where it makes sense to pay attention (usually the places where there is the most variance in numbers across a category, or where scores are particularly low or high).
Since a lot of people disagree with this, please tell me what a score of 100% means, or, say, 50% or 37%? I am not writing this to provoke; I am genuinely interested to know.
The new scorecard is on my website, AI Lab Watch. This replaces my old scorecard. I redid the content from scratch; it's now up-to-date and higher-quality. I'm also happy with the scorecard's structure: you can click on rows, columns, and cells and zoom in to various things. Check it out! Thanks to Lightcone for designing the site.
While it is a scorecard, I don't feel great about the numbers; I mostly see it as a collection of information.