There is also another benefit of working in a lab, related to the difference between "off-policy" and "on-policy" reinforcement learning. Even if you had passive access to all the internal information in a lab, you would not gain as much from that as you do from being able to run your own experiments and learn from them. (Or from making your own queries to people who have run experiments and learning from them.)
I think there are a few other effects worth taking into account. I don't make a strong claim as to whether you should change your mind about the overall importance.
There's also the much more interesting category of "things being an AI lab employee makes you less likely to understand". For example, if you're sure you and your fellow employees have the secrets to <X capability>, you may disregard external research in that direction. Many incumbent companies were not ignorant of startup challengers; they just thought the startups were irrelevant.
Notably, open weight models aren't that far behind closed weight models, despite the disadvantage of being trained with less compute and worse data.
I think this sentence is basically false, even though the preceding sentence is true. The reasons are: (1) as Lisan al Gelb explains, closed models are better at using inference compute; (2) open source focuses much harder on coding/computer use and agentic tasks than closed source does, so outside those areas open models have a much bigger gap to traverse; and (3) open source has much less capability and incentive to avoid training directly on benchmark data, so it has larger data contamination problems. (That last point is mostly a cached belief based on a paper from 3-4 years ago, plus Gavin Leech's paper on benchmarks being inflated because AI companies and open source fail to remove semantic duplicates of test data; my suspicion is that frontier labs have much more incentive to address these sorts of problems than open source does.)
I agree with the view that AI companies don't have much secret sauce right now, and that the frontier labs' large lead over open source comes down mostly to compute plus stronger incentives to actually not train on the test set.
Is there a reason why alignment techniques (and specifically character training) are so closely guarded? A priori I would have thought they wouldn't be particularly guarded since 1) they seem less commercially sensitive than techniques which directly increase capabilities 2) there is probably a stronger moral reason to publish alignment work 3) I would have expected safety researchers to be more closely linked with external orgs than capabilities researchers.
If people want to push labs to be more transparent, this seems like the obvious place to start, and it has the potential for significant upside.
My understanding is that character training is already fairly open. OpenAI writes about their model spec; Anthropic has recently written about personas/characters, and worked on an open implementation.
This post was drafted by Buck, and substantially edited by Anders. "I" refers to Buck. Thanks to Alex Mallen for comments.
People who work inside AI companies get access to information that I only get later or never. Quantitatively, how big a deal is this access?
Here’s an operationalization of this. Consider the following two ways my knowledge could be augmented:

1. I gain access to all the private, internal information available to employees at a frontier AI company today.
2. I gain access to all the semi-public information about AI (publications, news, and the rumor mill) from n months in the future.

How big would n have to be for me to be indifferent between these two options, from the perspective of learning things that are helpful for making AI go well?
The answer is presumably different for me than for many readers, because I’m a reasonably well-connected researcher; I see published information and news from the rumor mill and I talk to researchers at frontier AI companies all the time. (Researchers I know through AI safety usually only tell me information that their employer would approve of, but other researchers occasionally spontaneously tell me things that seem like leaks of important proprietary information.)
My overall guess is that access to private information from an AI company would currently be about as helpful as access to all semi-public information (including information from the rumor mill) related to AI from 2.5 months in the future. This is similar to the median view of AI company staff I've asked about this. I'd enjoy it if someone did a proper survey on this.
In general, information can be relevant to me for improving my understanding of things like:
I’ll assess this by thinking about three areas of knowledge that might be relevant to safety: safety research and its application, model capabilities, and algorithmic and architectural advancement. For each of these, I’ll estimate how much extra information AI company insiders get.
It’s worth noting that as things move faster (during an intelligence explosion), two months might contain far more information, so it’s not a constant-sized yardstick. The time delay itself might also grow or shrink over time, as I’ll discuss at the end.
Of course, there are other advantages of working at frontier companies, like access to the newest models to accelerate research. In this post I'll just discuss the information advantages.
I'm not very sure about my bottom line here. I'd love to hear people's thoughts on whether I'm missing major considerations.
What do insiders know?
AI company employees don't have access to all risk-relevant proprietary information that exists.
As I noted above, I talk a lot to AI company staff who want me to be well informed without violating their employer’s trust, so I hear a reasonable amount of stuff that is not widely known but also not an important commercial secret.
So, the main informational advantage of employees is information that is commercially relevant or otherwise sensitive. The core question is how important that stuff is.
Safety work and corporate attitudes
AI company employees know much more about techniques for alignment training on current models, the alignment issues that tend to come up, and how those issues are addressed. They also know more about how misuse is prevented. I think this is the most important type of information that employees have much better access to.
One central example is character training, where very little is publicly known about the implementation at frontier companies. The particular details of how safety training is combined with capabilities RL might be pretty important for ensuring AI goes well (for example, in modeling threats from scheming AIs).
Another important thing is how organizations work internally and how trustworthy different people are. It would be useful to know how people react to evidence of misalignment and if they’re likely to make good decisions under pressure. This is pretty hard to assess as an outsider, but it's not actually secret. (This is also particularly difficult to generalize across companies.)
Some company employees I know have moderately changed their views on misalignment risk based on insider information about how models are trained and how the companies address alignment issues, though other researchers at frontier AI companies report that the proprietary information isn't that big an update.
Model capabilities
An important, but secondary, type of insider knowledge is detailed knowledge of model capabilities. Insiders can use models or get information about them before they’re externally announced. This is something where I usually don't get private info.
There are a few big examples of this:
These days, though, AI companies tend to publicly deploy their models fairly quickly after they finish training and evaluations, so I don't think AI company staff actually get information on capabilities much faster than external people.[2] I think this is largely caused by mounting competitive pressures between AI companies.
And there are examples of people within AI companies being surprised about model capabilities or the public reaction to them. The ChatGPT moment seems to have surprised everyone, including the people who'd worked on it, and people like me who had already talked to LLM-powered chatbots. DeepSeek V3 was also much better than most people (including employees of US AI companies) seem to have expected.
Overall, it seems like employees might be tipped off early about some capabilities, but there’s lots of things they don’t know, and many things are public anyway.
Algorithms and architecture
I think technical advances in architecture and algorithms have historically been the least important area of insider knowledge.
To start with, I'll note that I don't think that AI companies have that much secret sauce. Notably, open weight models aren't that far behind closed weight models, despite the disadvantage of being trained with less compute and worse data.[3] Open-weight models tell us about many of the aspects of training that we might care about, like RL. (Open-weight model developers still keep some things secret, like data mixes, but I don’t think this is very important.)
Even if AI companies have big algorithmic/architectural secrets, it's not clear that these are safety-relevant. For example, I think none of the publicly known architecture changes since GPT-1 are important for understanding AI alignment, though some are mildly important for forecasting questions about the economics of AI in the future. More generally, it seems like most of what we know about misalignment risk doesn’t depend on details of AI training like hyperparameters, RL algorithms, or architectures (at least among the current distribution of transformer-like architectures).
There are a few cases of algorithmic innovations that do matter for safety:
All three of these are now pretty publicly visible, and we probably know most of their safety-relevant features. There might be a similarly important algorithmic secret in the future, but I don’t expect algorithmic and architectural advances to be a big source of safety-relevant inside information.
I do wish that I knew more about how reasoning models were trained. In particular, knowing more about the training of state-of-the-art models would tell me more about the extent to which there is pressure on the CoT because of spillover from text in the output field.
How will this change over time?
In the short term, employees' information advantage is likely to shrink, but once we get near an intelligence explosion, it might become much larger.
Right now, employees enjoy an information advantage from being the first to know about new models. In the near term, this advantage is likely to shrink as competitive pressures push AI companies to deploy faster. OpenAI's lead from the ChatGPT moment has shrunk considerably, and they probably can't afford to sit on a leading model for seven months anymore.
However, this dynamic might reverse if one company pulls ahead because, for example, their AIs are speeding up their R&D. If they’re confident that they’re solidly ahead, the leading company would be less pressured to release publicly, leading to more divergence between the externally released products and the products for internal use on AI R&D.
Even if companies quickly release products to the public, outsiders might not get safety-relevant information. If AI companies release increasingly high-level products (like DeepResearch, where you aren't quite sure what the model is doing), we would learn less about how the underlying models work. AI companies are probably incentivized to do this because applications have higher margins; they are already doing it by releasing Codex, Claude Code, and browser agents, and Sam Altman has made statements about developing entire applications rather than just APIs.
There might be countervailing pressures for transparency, like regulation and employee demands, especially after a visible incident. Daniel Kokotajlo proposes radical transparency as one example of what this might look like. Extreme versions of this require political will that doesn’t currently exist. Weaker versions already exist, e.g. SB-53's mandate that companies report every three months to the California Office of Emergency Services on risks from internally deployed models; and there are plausible intermediate levels of transparency where I would learn some of the proprietary information I'd like to know.
Overall, it seems like the leading AI company will be increasingly opaque. During an intelligence explosion, model releases might be delayed and gossip might dry up. This is particularly concerning because information will probably matter more during the intelligence explosion than it does now. As AI progress accelerates, being two months behind might make much of your research irrelevant. It might also mean public input comes after safety or governance decisions are already irreversible.
Conclusion
In the introduction, I estimated that being in an AI company today is roughly equivalent to knowing what I'll know in 2.5 months. This is pretty small because I think companies don’t actually have that much privileged information that’s safety-relevant.
As far as we know, there is relatively little information that is widely known across frontier companies without being public. So employees may only enjoy an information advantage regarding their own company, and the value of this depends greatly on the particular company.
Even though there are a few instances of important secrets that one company had, like Chinchilla scaling laws and early CoT RL, I suspect there are fewer such secrets currently, which makes the estimated information advantage smaller.
At the moment I think I get enough knowledge as an outsider to make reasonable decisions. But this is likely to change. When information matters most, AI companies might also be at their most secretive.
[1] “Q*” existed at least since November 2023, and o1 wasn’t released until September 2024.
[2] Although Anthropic hasn’t released Mythos (and doesn’t plan to), it released evaluation results one month after the model was internally deployed. So external researchers still have some idea of Mythos’ capabilities.
[3] Open-weight models were probably trained with substantial amounts of distillation, which means that looking at just the capability gap underestimates how far behind the algorithms are.
[4] For example, Josh Clymer’s scenario partially depends on large capabilities gains from increased RL.