Data. Find out the answer.
https://www.wevolver.com/article/tpu-vs-gpu-a-comprehensive-technical-comparison
Looks like they are within 2x of the H200s, albeit with some complexity in the details.
Because it's what they can get. A factor of two or more in compute is plausibly less important than a delay of a year.
This may or may not be the case, but the argument for why it can't be very different fails.
As I mentioned elsewhere, I'm interested in the question of how you plan to re-base the index over time.
The index excludes models from before 2023, which is understandable, since they couldn't use benchmarks released after that date, which are now the critical ones. Still, it seems like a mistake, since I don't get any indication of how adaptable the method will be in the future, once current metrics are saturated. The obvious way to show this seems (to me) to be to include earlier benchmarks that are now saturated, so that the time series can be extended backwards. I understand that this data may be harder to collect, but as noted, it seems important to demonstrate future adaptability.
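To be concrete about what I mean: this is purely illustrative - the benchmark values and the simple ratio-linking below are my own assumptions, not anything from the paper - but something like chain-linking through an overlap period is the kind of backwards extension I have in mind:

```python
# Illustrative sketch of extending a capability index backwards by
# chain-linking a newer benchmark to an older, now-saturated one.
# All numbers and "benchmarks" below are invented for illustration.

old_scores = {  # older benchmark: meaningful pre-2023, saturated afterwards
    2020: 35.0, 2021: 48.0, 2022: 61.0, 2023: 70.0,
}
new_scores = {  # newer benchmark: only defined from 2023 onwards
    2023: 20.0, 2024: 42.0, 2025: 58.0,
}

# Use the overlap year(s) to estimate a linking factor, then express the
# older series in units of the newer index.
overlap = sorted(set(old_scores) & set(new_scores))
link = sum(new_scores[y] / old_scores[y] for y in overlap) / len(overlap)

extended_index = {y: s * link for y, s in old_scores.items() if y not in new_scores}
extended_index.update(new_scores)

for year in sorted(extended_index):
    print(year, round(extended_index[year], 1))
```

The same linking step is what would let the index roll forward onto new benchmarks once today's metrics saturate, which is why showing it working backwards seems like useful evidence.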
I think the space of possible futures is, in fact, almost certainly deeply weird from our current perspective. But that's been true for some time already; imagine trying to explain current political memes to someone from a couple decades ago.
Yes - they made a huge number of mistakes, despite having sophisticated people and tons of funding. It's been used over and over to make the claim that bioweapons are really hard - but I do wonder how much using an LLM for help would avoid these classes of mistake. (How much prosaic utility is there for project planning in general? Some, but at high risk if you need to worry about detection, and it's unclear that most people are willing to offload or even double-check their planning, despite the advantages.)
Sure, and if a machine just slightly smarter than us, deployed by an AI company, solves alignment instead of doing what it's been told to do (which is capabilities research), the argument will evidently have succeeded.
A marginal bioterrorist could probably just brew up a vat of anthrax, which technically counts.
Perhaps worth noting that they've tried in the past, and failed.
On the first point, there's a long way to go to get from the current narrow multimodal models for specific tasks to the type of general multimodal aggregation you seemed to suggest.
On the second point, thank you - I think you are correct that it's a mistake/poorly written, and I'm checking with the coauthor who wrote that section.
Or, phrasing it differently: "read the sequences."
This is a good question, albeit only vaguely adjacent.
My answer would be that the winner's curse only applies if firms aren't actively minimizing the extent to which they overbid. In the current scenario, firms are trying (moderately) hard to prevent disaster, just not reliably enough to succeed indefinitely. However, once they fail, we could easily be far past the overhang point for the AI succeeding.
Assume, to start, that the AI itself is (implausibly) not strategic enough to consider its chances of succeeding, but that AI capabilities nonetheless keep increasing. The firms can detect and prevent it from trying with some probability, but their ability to monitor and stop it is declining. The better AI firms are at stopping the models, and the slower their ability declines relative to model capability, the more likely it is that when they do fail, the AI will succeed. And if the AIs are strategic, they will be much less likely to try when they are likely to either fail or be detected, so they will wait even longer.
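To make the shape of that argument concrete, here's a minimal toy simulation - all of the functional forms and parameters are assumptions I'm making up for illustration, not anything from the post: the lab's chance of catching an attempt declines each period while the AI's chance of succeeding in an uncaught attempt rises, and we look at how strong the AI is by the time an attempt first slips through.

```python
import random

# Toy model: each period a (non-strategic) AI attempts a takeover.
# The lab catches the attempt with probability p_catch(t), which declines,
# while the AI's chance of succeeding in an uncaught attempt, p_win(t), rises.
# All functional forms and parameters here are invented for illustration.

def p_catch(t, decay=0.03):
    return max(0.0, 0.99 - decay * t)

def p_win(t, growth=0.04):
    return min(1.0, 0.05 + growth * t)

def simulate(max_t=200):
    for t in range(max_t):
        if random.random() > p_catch(t):          # lab fails to stop this attempt
            return t, random.random() < p_win(t)  # does the AI succeed?
    return max_t, False

random.seed(0)
runs = [simulate() for _ in range(10_000)]
first_slip = sum(t for t, _ in runs) / len(runs)
success_rate = sum(ok for _, ok in runs) / len(runs)
print(f"avg period of first uncaught attempt: {first_slip:.1f}")
print(f"P(AI succeeds on that attempt): {success_rate:.2f}")
```

Varying `decay` against `growth` shows the point: the longer the lab manages to hold the line relative to capability growth, the higher the success probability on the attempt that finally gets through.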