Oscar — LessWrong

A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring

I only read the LW version not the paper, but this seems like important work to me and I'm glad you're doing it! What did you make of these two recent papers?

I have done some work on the policy side of this (whether we should/how we could enforce CoT monitorability on AI developers, or at least gain transparency into how monitorable SOTA models are). Lmk if ever it would be useful to talk about that, otherwise I will be keen to see where this line of work ends up!

Introducing the Epoch Capabilities Index (ECI)

Oscar22d63

I'd be interested in anyone's thoughts on when to use this vs e.g., METR's time horizon. The latter is of course more coding-focused than this general-purpose compilation, but that might be a feature not a bug for our purposes (predicting takeoff).

The Industrial Explosion

Oscar5mo20

AI direction could make most workers much closer in productivity to the best workers. The difference between the productivity of the average and the best manual workers is perhaps around 2-6X

Based on the derivation, it seems you mean the difference in productivity of workers doing similar tasks in the same industry, which seems important to specify. Otherwise as written, I would say the "difference between the productivity of the average and the best manual workers" is >1000x between e.g. surgeons in rich countries and e.g. farm hands/construction workers/salespeople, etc in poor countries.

But it's not clear to me the relevant multiplier is the one you pick within one country and industry. E.g. if we have abundant cheap AI cognitive labour, couldn't I set up a company producing widgets in e.g. India, employ heaps of low-skill workers for cheap but make them very productive with AI training and direction, and make a killing?

Maybe the bottleneck here is more on political economy and institution quality, such that even with AGI not all poor countries suddenly become rich because they have productive AI-led firms.

Overall I feel a bit confused about how big the one-time boost would be, but if we are counting across countries I would suspect >10x. Perhaps in practice the US (or whoever has the intelligence explosion) would limit access to cognitive abundance to itself and maybe a few allies.

Which AI Safety techniques will be ineffective against diffusion models?

Oscar5mo10

Great question, I don't have deep technical knowledge here, but would also be very curious about this. Intuitively, that seems right that CoT monitoring doesn't transfer over very well to this case.

Evaluating “What 2026 Looks Like” So Far

Oscar8mo50

Nice!

For the 2024 prediction "So, the most compute spent on a single training run is something like 5x10^25 FLOPs." you cite v3 as having been trained on 3.5e24 FLOP, but that is more than an OOM below the prediction. Grok-2, by contrast, was trained in 2024 with 3e25 FLOP, so it seems to be a better model to cite?
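To make the order-of-magnitude comparison concrete, here is a quick check in Python using only the three figures quoted above. The helper name `within_one_oom` and the choice to measure the gap as the absolute base-10 log of the ratio are my own assumptions for illustration, not anything from the post.

```python
import math

def within_one_oom(predicted: float, actual: float) -> bool:
    """Return True if the two values differ by at most a factor of 10.

    Hypothetical helper: the gap is measured as |log10(predicted / actual)|.
    """
    return abs(math.log10(predicted / actual)) <= 1.0

PREDICTED = 5e25    # FLOP, the 2024 prediction quoted above
V3_CITED = 3.5e24   # FLOP, the figure cited for v3
GROK_2 = 3e25       # FLOP, the figure given above for Grok-2

for name, flop in [("v3", V3_CITED), ("Grok-2", GROK_2)]:
    ratio = PREDICTED / flop
    print(f"{name}: prediction is {ratio:.1f}x the actual figure, "
          f"within one OOM: {within_one_oom(PREDICTED, flop)}")
```

On these numbers the prediction is roughly 14x the v3 figure (just over one OOM) but under 2x the Grok-2 figure.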

Orienting to 3 year AGI timelines

Oscar11mo2511

I will note the rationalist and EA communities have committed multiple ideological murders

Substantiate? I down- and disagree-voted because of this un-evidenced very grave accusation.

Should there be just one western AGI project?

Oscar1y10

I think I agree with your original statement now. It still feels slightly misleading though, as while 'keeping up with the competition' won't provide the motivation (as there putatively is no competition), there will still be strong incentives to sell at any capability level. (And as you say this may be overcome by an even stronger incentive to hoard frontier intelligence for their own R&D and strategising use. But this outweighs rather than annuls the direct economic incentive to make a packet of money by selling access to your latest system.)

Should there be just one western AGI project?

Oscar1y10

I agree the '5 projects but no selling AI services' world is moderately unlikely, the toy version of it I have in mind is something like:

It costs $10 million in up-front costs (a misuse monitoring team, API infrastructure, help manuals, a web interface, etc.) to start selling access to your AI model.
If you are the only company to do this, you make $100 million at monopoly prices.
But if multiple companies do this, the price gets driven down to marginal inference costs, and you make ~$0 in profits and just lose the initial $10 million in fixed costs.
So all the companies would prefer to be the only one selling, but second-best is for no-one to sell, and worst is for multiple companies to sell.
Even without explicit collusion, they could all realise it is not worth selling (but worth punishing anyone who defects).
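The toy version above can be sketched as a small payoff function. This is only an illustration under stated assumptions: the $10 million and $100 million figures are from the list, but treating competitive profit as exactly $0 (rather than "~$0"), giving a non-seller a payoff of 0, and ignoring any punishment of defectors are simplifications I have added. The names `payoff`, `FIXED_COST` and `MONOPOLY_REVENUE` are hypothetical.

```python
FIXED_COST = 10_000_000         # up-front cost to start selling (from the list above)
MONOPOLY_REVENUE = 100_000_000  # revenue if you are the only seller (from the list above)
COMPETITIVE_REVENUE = 0         # assumption: price falls to marginal cost, so ~$0 margin

def payoff(i_sell: bool, other_sellers: int) -> int:
    """Profit for one company, given its own choice and how many rivals sell.

    Assumption: a company that does not sell earns 0 from this market.
    """
    if other_sellers < 0:
        raise ValueError("other_sellers must be non-negative")
    if not i_sell:
        return 0
    revenue = MONOPOLY_REVENUE if other_sellers == 0 else COMPETITIVE_REVENUE
    return revenue - FIXED_COST

# The three cases from the list: sole seller, nobody sells, several sell.
sole_seller = payoff(True, 0)
nobody_sells = payoff(False, 0)
crowded_market = payoff(True, 1)
print(sole_seller, nobody_sells, crowded_market)
```

With these numbers the ordering is sole seller > nobody sells > several sell, which matches the preference ranking in the list. Note that when no rival sells, entering is still the individually best move, which is part of why the no-selling outcome looks fragile without some form of punishment.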

This seems unlikely to me because:

Maybe the up-front costs of at least a kind of scrappy version are actually low.
Consumers lack information and aren't fully rational, so the first company to start selling would have an advantage (OpenAI with ChatGPT in this case, even after Claude became as good or better).
Empirically, we don't tend to see an equilibrium of no company offering a service that it would be profitable for one company to offer.

So actually maybe it is sufficiently unlikely that it isn't worth bothering with much. There seems to be some slim theoretical world where it happens though.

Should there be just one western AGI project?

Oscar1y20

There’s no incentive for the project to sell its most advanced systems to keep up with the competition.

I found myself a bit skeptical about the economic picture laid out in this post. Currently, because there are many comparably good AI models, the price for users is driven down to near, or sometimes below (in the case of free-tier access) marginal inference costs. As such, there is somewhat less money to be made in selling access to AI services, and companies not right at the frontier, e.g. Meta, choose to make their models open weight, as probably they couldn't make much money selling access to them when people can just pay for Claude or ChatGPT instead.

However, if there is a single Western AGI project with a big lead over everyone else, they could charge far above their inference costs, given how amazingly helpful having access to the best AIs could be (and is, to some extent).

I could even imagine that if there are e.g. 5 AGI projects all similarly advanced, then maybe none of them would bother to sell their latest models, knowing that if they start charging very high prices someone else will undercut them, so it is not worth the hassle at all.

Whereas if there is one project, and if AGI/ASI turns out to be super expensive to build and USG doesn't want to foot the bill, maybe charging exorbitant monopolistic prices will be important. Relatedly, the wages of AI researchers and engineers could go down, given a monopsony in labour for the one project.

Altogether, this is one reason to think a centralised project would have higher revenue and lower costs and therefore lead to AGI faster.

(That said I am not an economist and am just guessing, maybe we should check with some econ folks.)

Centralising might make the US less likely to pause at the crucial time.

Unrelatedly, I think a contrasting dynamic here is that it is potentially a lot easier to stop a single project than to stop many projects simultaneously. In the former case, there is a smaller set of actors who need to be convinced pausing is a good idea. (Of course, even if there are many projects, if they are all heavily regulated and overseen by USG, it could still be easy for USG to pause them all even without centralisation.)

IAPS: Mapping Technical Safety Research at AI Companies

Oscar1y30

Thanks for that list of papers/posts. For most of the papers you linked, they’re not included because they did not feature in either of our search strategies: (1) titles containing specific keywords that we searched for on arXiv; (2) the paper is linked on the company’s website. I agree this is a limitation of our methodology. We won't add these papers in now as that would be somewhat ad hoc, and inconsistent between the companies.

Re the blog posts from Anthropic and what counts as a paper, I agree this is a tricky demarcation problem. We included the 'Circuit Updates' because it was linked to as a 'paper' on the Anthropic website. Even if GDM has a higher bar for what counts as a 'paper' than Anthropic, I think we don't really want to be adjudicating this, so I feel comfortable just deferring to each company about what counts as a paper for them.
