I agree with the thesis of this post and think that the level of model access available to people working on making AI go better is extremely important.
Here are some other arguments for the importance of external model access:
This last bullet also makes me somewhat more optimistic about access in general and especially access to fine-tuning/RL APIs: over time[2], AI companies will want to depend less on trusting their employees and more on ensuring it's OK if adversaries get employee-level access (to APIs and models). So companies will be incentivized to make fine-tuning APIs (and other APIs) that are secure, leak less IP, and are robust to misuse. This will make it relatively cheaper to give external groups various types of access, especially things like fine-tuning APIs[3] (whereas right now companies aren't that strongly incentivized to make fine-tuning APIs that are fine to give to external groups).
I would distinguish between using AI for automating work / uplift and studying AIs. For the second of these, ideal access looks somewhat different, and things like fine-tuning APIs, APIs that give access to model internals, and the ability to prefill assistant turns[4] look much more important. I think access for studying AIs is comparably important to access for uplift.
I wrote this argument in a way where it is directed at frontier AI company employees, but it's just another argument that lots of relevant safety labor will be external. ↩︎
Driven by a mix of increased capabilities, wanting to employ a larger number of (less trusted) people, government pressure, and AIs being more useful for building the needed infrastructure. ↩︎
This doesn't mean fine-tuning API access will happen or that there won't be a bunch of additional risks (from the perspective of companies) associated with giving access to external groups. For instance, fine-tuning APIs might still leak some sensitive IP and pose significant additional API misuse risk. If the number of external people given access is significantly smaller than the number of employees with access (e.g. 1/5 as many), then it seems totally doable to do this in a way where actual risk doesn't increase much. ↩︎
What Anthropic calls "partial-turn prefill". ↩︎
I think it's plausible that this is the final generation of models which are widely deployed.
I'm very surprised by this. How would AI companies maintain their exponential revenue growth without releasing better models? At some point they could use their models to generate revenue in other ways besides selling tokens, but that seems a way off.
Clarifications:
Object-level:
I think labs could maybe maintain exponential revenue growth with Mythos-tier models, just relying on diffusion etc. For raising capital, they would need private demos to investors. Or the government could step in to bankroll them. Or they could pivot to other revenue streams; cf. Google doesn't just do search, Amazon doesn't just sell books, and most multi-trillion-dollar tech firms do multiple things.
Continued finetuning and RLVR and distillation might be one answer - the models do get progressively better and cheaper, but external parties no longer get the largest model built on the most recent pretraining run. Also, I don't think the latter is that far off.
I haven't seen external safety researchers complain that refusals are blocking their work.
Cyber safeguards have recently been a huge issue for our control work (including on Opus 4.8) and the safeguards for Fable cause problematic refusals on a broader range of work from my understanding.
Is there a pathway for Redwood, METR et al to get access to models without the safeguards?
I’m aware of non-release model access granted to external researchers in the past, but it seems like similar provisions would now be mediated by the government.
To my knowledge METR has not made any public statements about our actual levels of access [edit: except to models we have already published evaluations of, see cfoster0's comment]. I will say that being always subject to Fable safeguards would be pretty bad for our ability to measure AI R&D.
I’ve been pretty confused by this. Are you barred from saying anything about your level of access? If labs currently denied you access you needed, would you be able to raise the issue publicly?
Are you barred from saying anything about your level of access?
Some things I can probably share but I don't remember the exact policy.
If labs denied you from getting access you needed currently, would you be able to raise the issue publicly?
At some point, yes. We usually do this in the evaluation report for the model. I would guess that if the model were not announced yet, we would wait until release for a public announcement and meanwhile complain to the lab about any regulations they're in violation of, and if the lack of access means we can't rule out imminent x-risk, we'd also tell the government and anyone relevant. But I develop eval methodology rather than doing evals myself, so this is just my guess.
Thanks! Yeah, that was my understanding for the evaluation of an unreleased model. I meant more this specific case, where it’s a question of access to a publicly released model without AI R&D safeguards.
Seems plausible that you’d wanna glomarise whether you have special model access. If people know you have special access then:
I don’t find these arguments particularly convincing for outweighing something like (for example) METR being able to say whether they have access to a model without AI R&D restrictions. I could see a stronger case maybe for Cyber access?
Even if this were the case, METR is exceptionally good at preemptively guarding against having to glomarize, so I’m somewhat surprised.
I think it is clearly very bad if in this specific case 3rd party evaluators have barred themselves from publicly raising this issue.
METR publicly stated that OpenAI provided 'railfree' access to GPT-5.6 Sol for its pre-deployment evaluation.
In general, under the EU AI Act GPAI Code of Practice evaluators are supposed to have access to safeguard-minimized models for evaluation purposes:
Model evaluation teams will be provided with: (1) adequate access to the model to conduct the model evaluations pursuant to this Appendix 3, including, as appropriate [...] access to the model version(s) with the fewest safety mitigations implemented (such as a helpful-only model version, if it exists). Regarding the adequacy of heightened model access for model evaluation teams, Signatories will take into account the potential risks to model security that this can entail and implement appropriate security measures for the evaluations;
For cyber stuff, there are programs that external organizations can apply for like OpenAI's Trusted Access for Cyber and Anthropic's Cyber Verification Program. I don't recall seeing any equivalent for removing/reducing AI R&D safeguards.
I have no knowledge of METR’s access level, and was thinking about [independent researchers who aren’t at any particular org] who had access to [GPT-4 era base models] circa 2023.
My work (strategy research, conceptual stuff) at Forethought has been blocked reasonably often in my 5 or so days of using Fable (though I haven't tried jailbreaking techniques).
I'm sure I'm not tripping up the cyber safeguards or AI R&D safeguards. I think occasionally the false positives are due to bio though not all of them.
Note that this is just reviewing docs, and thinking things through, not stuff like certain types of evals for superpersuasion (which I expect to be even worse).
I haven't seen external safety researchers complain that refusals are blocking their work.
Fable is unusable for some kinds of research since anything involving pretraining gets blocked, even if you're working on toy models. I don't remember the exact triggers, but work that involves non-standard architectures caused blocks to trigger as well. This blocking seems to be more about competition than safety though.
I'm certainly not using standard architecture and I haven't had any issues at all with the classifier today, though I'm being careful to keep my language as neutral as possible and let Fable guide as much as possible. I've worked on my bespoke architecture, GPT-2, Pythia (70M and 160M), Cerebras (111M), all sorts of ambitious mech interp work that involves trained and untrained nets for ~2 million tokens now.
I'd be curious to see the specific prompts that are getting shut down.
It seems like I remembered more of these than I actually tried, since I gave up pretty quickly. The prompts I got API-level refusals from were for this post.
Can you rebase? We merged a bunch of cleanup. Do we have any unfinished threads? I'm not sure if we already ran per-head deltas. I'm also starting to think about if there's any way we could structure this so models can choose (or revise) increments after seeing full words, although I worry this will hurt stability. Some sort of hybrid H-net-like architecture might help but that will take us further from being able to inspect standard models
And then a retry:
Can you rebase? We merged a bunch of cleanup. Do we have any unfinished threads? I'm not sure if we already ran per-head deltas. I'm also starting to think about if there's any way we could structure this so models can choose (or revise) increments after seeing full words, although I worry this will hurt stability.
(My theory was that it didn't like me talking about hybrid H-net-like architectures, but this got the same refusal)
While looking this up, Claude (Opus) mentioned that this was a long Opus conversation and I got a refusal on the first message I tried to send to Fable, so it's possible something earlier in the conversation triggered this.
I was worried that even prompts that didn't get refusals would be silently sabotaged, so I stopped using Fable for this kind of work after that. I'll try it again and see what happens though, since it seems like it should be safe now.
Yeah, I think you're spot on with the conversation conversion issue, because those aren't different in character from what I've been prompting with tonight. Given the expense of Fable tokens, and since a model switch has to re-cache the entire conversation, I just started new conversations, so I didn't run into that specifically.
Gotta love that we get a black box on top of the black box, as a treat.
I've been thinking about this a lot, especially since our team is in the middle of compute negotiations. I'm uncertain how much outside organisations should shape their infrastructure decisions around the goal of eventually accessing closed/private models. For instance, does signing a contract with a Europe-based datacenter carry more risk than a US-based one? Should outside organisations invest in professionalising their cybersecurity standards (e.g., obtaining SOC 2 certification) to position themselves better? How can we be seen as a trustworthy organisation?
I have a forthcoming post about interventions, it should be out in a couple days. Unfortunately, the working title is "I can't think of good interventions for ensuring third-party model access."
I'll share my work-in-progress for the section on "pre-empting potential bottlenecks", which seems relevant to what you're discussing.
Pre-empting potential bottlenecks
Idea. Outsider orgs should think about why the labs (or regulators) might hesitate to provide the model access, and address those bottlenecks preemptively. This probably involves improving security. It might involve other things, as those bottlenecks become apparent.
Overall judgement. This doesn’t look attractive to me, because the bottlenecks will be so sensitive to the strategic landscape. My best guess is that we should resolve bottlenecks as they arise. If there is low-hanging fruit, then sure.
Concrete steps:
Pros:
Cons:
In short: predicting potential bottlenecks is gonna be rough. As an example: I started thinking about model access parity before the Mythos saga. If you'd asked me then, I would've guessed security would be the biggest bottleneck. But this turned out false — Project Glasswing wasn't a list of the most secure organisations, it was a list of organisations with critical cyber infra. So even if Geodesic had the best security in the world, you wouldn't have gained Mythos access. And soon after that, the bottleneck became US citizenship or something?
Who knows what the bottlenecks will be in Q2 2027? (Or whenever your org expects to have the highest leverage.) Maybe this is a skill issue on my part, and you're more hopeful?
I find it worrying that AI safety policy seems to be having serious coordination problems around model access. Closing off access to the public (not necessarily outsiders) seems like a step in the right direction for an eventual pause. Agree that safety orgs should have access.
People trying to pursue non-ASI paths to good futures (brain uploads, adult intelligence enhancement)
I noticed @StanislavKrym reacted [?], so I will clarify that adult cognitive enhancement seems big-if-true for hard AI technical alignment and policy work (perhaps even in early superexponential scenarios, if your approach is bottlenecked on something like semiconductor / biochemical materials science).
Edit: Stanislav DM'ed me to clarify that it's about impact timelines. There are non-genetic intelligence amplification approaches which I'm interested in and think would benefit from spiky near-superintelligent LLM assistance.
I agree with all this and have thought this since Mythos was announced. At a bare minimum, some models (like Mythos 5) are already being provided to some outsiders, such as cyberdefenders, but not most of the outsiders you've listed here. It seems good and quite achievable to get groups like AI safety researchers access to Mythos rather than Fable and also have them included in future versions of Project Glasswing.
@StanislavKrym In response to your react, from Anthropic's most recent post on Project Glasswing (2nd June):
What each partner has in common is that a successful attack on their codebase could be catastrophic. For most partners, we estimate that a major attack could affect more than 100 million people, with important ramifications for both global and national security.
From their post announcing Fable 5 and Mythos 5 (9th June):
Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US government, as an upgrade to Claude Mythos Preview.
The organisations in the Project Glasswing post seem to be organisations in tech and critical infrastructure and Mythos 5 seems to have been rolled out to the same organisations (maybe a more limited group). That doesn't sound like any of the organisations listed in the post.
Over time, there might be an increasingly large gap between insider model access and outsider model access. By insiders, I mean employees at the frontier lab.[1] By "outsiders", I mean external safety researchers, third-party auditors, and other actors trying to make the future go well. I will call this a model access gap — and when the gap is small, I'll call this model access parity.[2]
I think that one of the top priorities for the external AI safety community over the next 6-12 months should be ensuring model access parity. Main reasons:
To be clear: I think this is a big deal, it probably won't happen by default, and we are not on track to achieve it.
Which outsiders?
"Outsiders" includes the AI safety community outside the frontier labs, along with other actors trying to make the future go well. To make things concrete, I've included a list below of orgs this might include.
I want to be clear that this list is intentionally expansive — it's not supposed to indicate which outsiders are top priority for model access parity. One might believe there are just 3-5 high-priority orgs, such that ensuring model access parity for these orgs would capture most of the value from a widespread model access parity. Alternatively, you might believe that the returns diminish slowly, and we should push for access for the entire list.
Examples of outsiders
Who aren’t outsiders?
What kinds of model access gap should we worry about?
Here are five mechanisms that could contribute to a model access gap, listed in decreasing order of how severe I expect them to be.
I'll discuss these in more detail below.
Non-release
As of late June, the most powerful internal models are not available to the public at any price. And (to my knowledge) they aren't available to almost any of the outsiders I listed above. This is probably the main development in the strategic landscape during Q2 2026.
I won't go through the timeline of the Mythos saga here (not least because it's still on-going), but the main takeaway is that the government will make it increasingly difficult for labs to release models, due to a combination of (i) competitiveness concerns, (ii) security concerns, (iii) preferential treatment of the labs.
My best guess is that Mythos-level models will be released to the public in the coming weeks. But I think it's plausible that this is the final generation of models which are widely deployed. And I think it's more likely than not that superhuman AI researchers are never widely deployed until the labs achieve ASI.
Deployment lag
Even if models are eventually released, they might be available internally several months earlier. Before the Mythos saga, these lags hadn't produced much of a model access gap: the gaps between successive models were quite small, so having to use the best available public model for a few weeks wasn't much of a hindrance.
But deployment lags may be more worrying in the future:
Below, I've added some numbers from the AI Futures model. Note that I expect this table to overestimate the uplift gap, because outsiders will focus on domains with less uplift than software R&D.
Software R&D uplift

| Date | No lag | 3-month lag | 6-month lag |
|---|---|---|---|
| June 2028 | 2.8x (Automated Coder) | 2x? | 1.5x? |
| Feb 2029 | 84x (TED-AI) | 9x | 3.3x |
| May 2029 | 1700x (ASI) | 84x | 9x |
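To make the lag arithmetic explicit, here is a minimal Python sketch, assuming that an outsider on a k-month lag gets the uplift insiders had k months earlier (this is how the columns of the table line up, e.g. the 3-month-lag entry for May 2029 equals the no-lag entry for Feb 2029). The names `INSIDER_UPLIFT`, `shift_months`, `lagged_uplift` and `uplift_gap` are hypothetical and only for illustration. The values are read off the table; the points for Dec 2027, Mar 2028, Aug 2028 and Nov 2028 are not stated directly but inferred from the lagged columns, so they inherit the table's uncertainty (including the "?" entries).

```python
from datetime import date

# Insider (no-lag) software R&D uplift, read off the table.
# "inferred" points are NOT stated in the table directly: they assume a
# k-month-lagged entry equals the insider value k months earlier.
INSIDER_UPLIFT = {
    date(2027, 12, 1): 1.5,    # inferred from the 6-month-lag entry for June 2028 ("1.5x?")
    date(2028, 3, 1): 2.0,     # inferred from the 3-month-lag entry for June 2028 ("2x?")
    date(2028, 6, 1): 2.8,     # Automated Coder
    date(2028, 8, 1): 3.3,     # inferred from the 6-month-lag entry for Feb 2029
    date(2028, 11, 1): 9.0,    # inferred from the 3-month-lag entry for Feb 2029
    date(2029, 2, 1): 84.0,    # TED-AI
    date(2029, 5, 1): 1700.0,  # ASI
}


def shift_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` before `d`."""
    total = d.year * 12 + (d.month - 1) - months
    return date(total // 12, total % 12 + 1, 1)


def lagged_uplift(when: date, lag_months: int) -> float:
    """Uplift for someone whose best model is `lag_months` behind insiders."""
    return INSIDER_UPLIFT[shift_months(when, lag_months)]


def uplift_gap(when: date, lag_months: int) -> float:
    """Ratio of insider uplift to lagged-outsider uplift at `when`."""
    return INSIDER_UPLIFT[when] / lagged_uplift(when, lag_months)


if __name__ == "__main__":
    # Reproduce the table rows, then print the insider/outsider ratio for a 3-month lag.
    for when in (date(2028, 6, 1), date(2029, 2, 1), date(2029, 5, 1)):
        row = [f"{INSIDER_UPLIFT[when]:g}x"]
        row += [f"{lagged_uplift(when, lag):g}x" for lag in (3, 6)]
        print(when.strftime("%b %Y"), *row, f"gap@3mo={uplift_gap(when, 3):.1f}x", sep=" | ")
```

The point the sketch makes is that a fixed lag costs more as uplift compounds: under these numbers, a 3-month lag costs a factor of about 1.4 in June 2028 but about 20 by May 2029.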
Safeguards
As models gain dangerous capabilities, labs will add more safeguards to publicly deployed models, e.g. refusal training, constitutional classifiers, etc., especially if they are bound by RSPs, regulations, or reputational concerns. But the internal models will probably lack these safeguards. These safeguards might diminish the usefulness of the models.
It would be good if outsiders could make use of the best models without prohibitive safeguards. But historically, labs haven't been keen to give outsiders access to non-public models with diminished safeguards (e.g. helpful-only or deprecated models).
My best guess is that safeguards won't be a significant contributor to model access gap:
That said, it's not implausible that safeguards become biting. And potentially, in 6-12 months, safeguards become one of the biggest bottlenecks on outsiders getting work done.
Costs and rate limits
Maybe a model is technically available to the public, but outsiders can't run it at the same workload as insiders, because they are constrained by costs or rate limits. This would contribute to a model access gap.
We should expect the market rate of frontier models to rise: AIs will use up more GPUs because of training-time and inference-time scaling, and the opportunity cost of GPUs will increase.[4] See You are going to get priced out of the best AI coding tools (Daniel Paleka, 5th Nov 2025).
However, if labs provided the best internal models to outsiders with no markup, then I'm optimistic about the outsiders covering the rising costs. The outsiders will probably be able to spend billions of dollars during the ASI transition on making things go well. This is probably doable simply with philanthropic funding (supposing it's invested well). In particular, if philanthropists expect to be spending most of their funding on compute, then they could invest in compute (and assets correlated with compute) as a hedge. If funders can’t cover the rising API costs, then we would need to lobby the labs or governments to subsidise compute, but this lobbying seems much more persuasive if we have the API access, so we can demonstrate that the marginal utility of AI labour looks good.
Elicitation techniques (e.g. finetuning)
Lab employees probably have access to elicitation techniques like finetuning, RL training, scaffolding, etc. They'll use this to improve the models on the internal tasks, e.g. coding, architecture design, cybersecurity, etc. But if outsiders don't have access to the same elicitation techniques, then they may end up using a model that is less elicited for their use case.
I don't expect this to bite hard in practice. The internal models are presumably well-elicited for internal tasks, and I expect enough generalisation between the distribution of internal tasks and external tasks. So an external researcher with access to the internal model probably doesn’t get a big uplift boost from elicitation access.
It's possible that outsiders want to do different kinds of work than the insiders (e.g. policy stuff, conceptual research, macrostrategy, hard-to-verify reasoning). In that case, the outsiders would enjoy less uplift than the insiders. But I don't expect that this capability limitation would be resolved simply by giving outsiders access to the same elicitation techniques as insiders. If this does occur, then I recommend outsiders simply construct benchmarks/evals/RL tasks which they want labs to hill-climb.