Davidmanheim


Eight claims about multi-agent AGI safety

My point was that deception will almost certainly outperform honesty/cooperation when AI is interacting with humans, and on reflection, it seems likely to do so by default even when interacting with other AIs, because there is no group selection pressure.

A vastly faster vaccine rollout

One key limitation for vaccines is supply, as others have noted. That certainly doesn't explain everything, but it does explain a lot. 

This obstacle was, of course, completely foreseeable, and by the end of April we had proposed a simple way to deal with the problem, presented it to policymakers, and even posted it on LessWrong.

Thus begins our story.

Unfortunately, we couldn't get UK policymakers on board when we discussed it, and the US was focused on Operation Warp Speed - Congress wasn't going to allocate money for a new idea.

We were told that, in general, policymakers wanted an idea published and peer reviewed before they'd take it more seriously, so we submitted a paper. At this point, as a bonus, Preprints.org refused to put the preprint online. (No, really. And they wouldn't explain why.)

We submitted the paper to Vaccine on May 20th, and they sent it for review. We got it back in mid-June, did revisions, and resubmitted in early July; then the journal changed its mind and said "your paper does not appear to conduct original research, thus it does not fit the criteria." When we emailed to ask what they were doing, they relented and said we could cut the length in half and resubmit it as an opinion piece.

We went elsewhere, to a newer open-access journal with non-blinded review, and the paper was finally online, fully published, in October: https://f1000research.com/articles/9-1154

The Case for a Journal of AI Alignment

In the spirit of open peer review, here are a few thoughts:

First, overall: I was convinced during earlier discussions that this is a bad idea - not because of costs, but because it lacks real benefits and would not itself serve the necessary functions. (Also see this earlier proposal, which got no comments.) There are already outlets that allow robust peer review, and the field is not well served by moving away from the current CS/ML dynamic of arXiv papers and conference presentations, which allow for more rapid iteration and for building on others' work than traditional journals do - journals are often a year or more out of date by the time they appear. However, if this were done, I would strongly suggest doing it as an arXiv overlay journal rather than with a traditional structure.

One key drawback you didn't note: insulating AI safety from mainstream AI work could further isolate it. It would also likely make it harder for AI-safety researchers to have mainstream academic careers, since narrow journals don't help on most academic prestige metrics.

Two more minor disagreements. First, on the claim that "If JAA existed, it would be a great place to send someone who wanted a general overview of the field": I would disagree - in-field journals are rarely as good a source as textbooks or non-technical overviews. Second, the idea that a journal would provide deeper, more specific, and better review than Alignment Forum discussions and current informal feedback seems far-fetched, given my experience publishing in journals specific to a narrow area, like Health Security, compared with my experience getting feedback on AI safety ideas.

Eight claims about multi-agent AGI safety

Honesty, too, arose that way. So I'm not sure whether (say) a system trained to answer questions in such a way that the humans watching it give reward would be more or less likely to be deceptive.

I think this is mistaken. (Or perhaps I don't understand a key claim or assumption.)

Honesty evolved as a group dynamic, where it was beneficial for the group to have ways for individuals to honestly commit, or make lying expensive in some way. That cooperative pressure dynamic does not exist when a single agent is "evolving" on its own in an effectively static environment of humans. It does exist in a co-evolutionary multi-agent dynamic - so there is at least some reason for optimism within a multi-agent group, rather than between computational agents and humans - but the conditions for cooperation versus competition seem at least somewhat fragile.
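
To make that concrete, here is a minimal replicator-dynamics sketch (my own illustration; the payoff numbers are assumptions, not anything from the post) showing deception going to fixation in one-shot interactions unless the group can impose a cost on detected liars:

```python
# Replicator dynamics for honest vs. deceptive strategies.
# Payoff values are illustrative assumptions, not from the post.

def replicator(payoff, x0=0.5, steps=2000, dt=0.01):
    """Return the long-run fraction x of honest agents.

    payoff[i][j] is the payoff to strategy i against strategy j,
    with index 0 = honest, 1 = deceptive.
    """
    x = x0
    for _ in range(steps):
        f_h = payoff[0][0] * x + payoff[0][1] * (1 - x)  # honest fitness
        f_d = payoff[1][0] * x + payoff[1][1] * (1 - x)  # deceptive fitness
        # Standard replicator update: dx/dt = x * (f_h - mean fitness),
        # which simplifies to x * (1 - x) * (f_h - f_d).
        x += dt * x * (1 - x) * (f_h - f_d)
    return x

# One-shot interactions with no group-level enforcement: deceivers
# exploit honest partners, and deception goes to fixation.
no_enforcement = [[3, 0],   # honest vs honest, honest vs deceptive
                  [5, 1]]   # deceptive vs honest, deceptive vs deceptive
print(replicator(no_enforcement))    # ~0.0: honesty dies out

# A group-imposed cost c on deception (punishment, exclusion, making
# lying expensive) flips the dynamic, and honesty can persist.
c = 3
with_enforcement = [[3, 0],
                    [5 - c, 1 - c]]
print(replicator(with_enforcement))  # ~1.0: honesty is stable
```

The point of the toy model is only that the equilibrium depends on the enforcement term, which is supplied by the group - exactly the term that is absent when a single agent trains against an effectively static population of humans.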

Eight claims about multi-agent AGI safety

Strongly agree that it's unclear whether these failures would be detected.
For discussion and examples, see my paper here: https://www.mdpi.com/2504-2289/3/2/21/htm 

Eight claims about multi-agent AGI safety

Another possible argument is that we can't tell when multiple AIs are failing or subverting each other.
Agents pursuing their own goals in a multi-agent environment are intrinsically manipulative, and when agents manipulate one another, it happens in ways that we do not know how to detect or reason about. This is somewhat different from when they manipulate humans, where we have a clear idea of what does and does not qualify as harmful manipulation.

Vanessa Kosoy's Shortform

re: #5, that doesn't seem to claim that we can infer U given the agent's actions, which is what the impossibility of deducing preferences is actually claiming. That is, assuming 5, we still cannot show that there isn't some $U' \neq U$ such that the observed actions are also optimal for $U'$.
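
For concreteness, a minimal rendering of the standard argument (my phrasing, not Vanessa's or Stuart's):

```latex
% Given any observed policy \pi, define a degenerate utility that
% rewards exactly the actions \pi takes:
%   U'(h, a) = 1 if a = \pi(h), and 0 otherwise.
% Then \pi is optimal for U', so observing behavior alone cannot rule out
\[
  \exists\, U' \neq U \quad \text{such that} \quad
  \pi \in \operatorname*{arg\,max}_{\tilde{\pi}}\;
  \mathbb{E}_{\tilde{\pi}}\!\Big[\sum_{t} U'(h_t, a_t)\Big].
\]
```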

(And as pointed out elsewhere, it isn't Stuart's thesis; it's a well-known and basic result in the decision theory / economics / philosophy literature.)

Reason as memetic immune disorder

Those aren't actually how Orthodox Jews interpret the rules or apply them nowadays. Tassels go only on very specific articles of clothing, which are hidden under people's shirts; I'm not even sure what "tying money to yourself" refers to; adulterers are stoned only if the Temple stands, and only under nearly-impossible-to-satisfy conditions; trees less than 5 years old are only considered a biblical problem in Israel, and if you're unsure, the fruit is permitted in the rest of the world; and the ritual purity laws don't apply in general, because everyone is assumed to be contaminated anyway.

What evidence will tell us about the new strain? How are you updating?

'Has been spotted in' isn't much to work with.


Agreed, but note that the US just found its first case, and it is community-acquired; plus, we aren't doing anything to stop importation. So I'm assuming it's everywhere already and just starting the exponential phase.

(Note that I cannot find good public data for spread within the UK, which would be the key way to update about the strain.)
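
For concreteness, this is the sort of arithmetic that data would feed (standard selection-coefficient estimation; the numbers below are hypothetical, not UK data):

```latex
% If the new strain's share of sequenced cases is p_t, its log-odds grow
% roughly linearly, with slope equal to its growth-rate advantage \Delta:
\[
  \log\frac{p_t}{1 - p_t} \;\approx\; \log\frac{p_0}{1 - p_0} + \Delta\, t.
\]
% Hypothetical example: a share rising from 10% to 50% over 28 days gives
% \Delta = \ln(9)/28 \approx 0.078 per day, which with a ~5.5-day
% generation time would imply a transmissibility advantage of roughly
% e^{0.078 \times 5.5} - 1 \approx 54%.
```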
