Let's have more partial insiders.

Cleo Nardo

A lot of decisions in the AI safety ecosystem — e.g. which projects get funded, who works where, etc. — are shaped by a distinction between "inside the lab" and "outside the lab."

I think this is a useful distinction, but we may be led astray if we rely on it too strongly. It ignores partial insiders — non-employees who hold some, but not all, attributes of lab employees.

This matters because a lot of impactful work is done by partial insiders! So maybe your next project should be as a partial insider. And maybe we should push labs to make it easier to become one.

Attributes of insiders

Access to artefacts
1. Internals. Do you have weight access? Activation access? Gradient access? At which checkpoints?
2. Training data. Do you have the pre-training data? The reward models? The RL environments?
3. Metrics. Do you have access to benchmarks? Safety evaluations? Incident reports?
4. User data. Do you have access to the user conversations? What about high-level anonymised statistics?
5. Codebase. Do you have access to the code for experimenting, training, evaluating, and deploying?
Access to resources
1. Compute. Do you have free API credits or dedicated GPUs?
2. API limits. What are the API rate limits? Is usage logged or monitored? How stringent are the safeguards? Are there constitutional classifiers?
3. API surface. What kinds of requests can you make, and what kinds of responses can you get? Continuations? Prefill on the assistant turn? Log-probs? White-box techniques like deception probes? Finetuning?
4. Uplift. How much is your research accelerated? Can you assign tasks to the most powerful AI agents? Can you elicit performance on your specific domain?
5. Infrastructure. Do you have access to internal research tooling?
Access to information
1. Techniques. Do you know what pretraining, post-training, and scaffolding the lab is using?
2. Roadmap. Do you know what's being trained next, what capabilities are being targeted, what the deployment plan is? Do you know how resources are allocated internally?
3. Conspiracies. If there were internal criminal activity, would you know?
Influence and trust
1. Bandwidth with leadership. How easy is it to talk to lab leadership? Can you contact the CEO directly? Are your views taken as informed and aligned, or as adversarial and uninformed?
2. Influence over decisions. Can you own the decision? Veto it? Argue for it? Write a memo someone might read? And for which kinds of decisions — training, release timing, deployment, public and government communications, hiring, team composition?
3. Write access. Can you modify the artefacts above — land PRs on the codebase, edit training data, change RL environments?
Constraints
1. Formal. Are you under NDAs or publication restrictions? Are you allowed to talk candidly to researchers at rival labs?
2. Financial. Do you hold equity? How tightly are your wealth, reputation, status, and prestige tied to the lab's commercial success?
3. Social. Would you feel awkward criticising the lab? Would you be invited to fewer parties?
4. Identity. Does the lab form part of your identity — do you call yourself "an ANT"?

(1) A lot of impactful work is done by partial insiders

Apollo <> OpenAI — deliberative alignment. This required finetuning of o3 and o4-mini, access to chain-of-thought.
Ryan Greenblatt <> Anthropic — alignment faking. This required helpful-only Claude 3 Opus, RL training.
David Rein <> Anthropic — red-teaming their internal monitoring. This required knowledge of internal systems, including security.
And of course, there are third-party auditors like METR or UK AISI.

If you rely too heavily on the insider/outsider distinction, you might not pursue projects like this. You would follow, too easily, reasoning like: "This project seems best suited to insiders, so I won't do it". Instead, I would encourage reasoning like: "This project seems best suited to insiders because it requires attributes X, Y, and Z. How can I acquire those attributes as a non-employee?"

(2) The framing concedes too much to lab employees

I think the insider/outsider distinction implicitly concedes that the attributes I listed above are naturally restricted to lab employees. It concedes "only lab employees have access to the best internal models" and "only lab employees know information X."

But this coupling isn't a fact of nature. It's a choice made by specific people, responding to specific incentives. It has upsides and downsides. It can be questioned, negotiated, rearranged. My best guess is that these attributes should be more decoupled, if we want things to go well.

I'll concede that decoupling these attributes would make security trickier to implement, because the boundary between insiders and outsiders would be more complicated. However: (i) many secure systems already distinguish many kinds of access, not just two; (ii) if I understand correctly, labs already operate with compartmentalised access among employees.

(3) You should maybe backchain from being a partial insider

It's plausible that a priority of your organisation should be making it easier to become a partial insider in the future. Sorted from highest returns to lowest:

Understand how these collaborations work. What access did partial insiders have, how did they get it, what did they have to concede, and where do negotiations typically stall? Most of this isn't written down; ask people who've done it.
Build connections. Figure out who'd be involved for the access you want, and make sure they have a good impression of you. Ask your friends at the labs to vouch for you.
Small-scale collaborations. These can work as proofs-of-concept, especially if they exercise the affordances you'll want later. For example, if you have promising results on open-weights models, ask a lab if you can replicate on their closed models.
Become the legible expert on a specific topic. You want labs to think "if we're doing X, we should loop in Y" — which requires X to be small enough that your org can own that entire concept.
Build and maintain a reputation for integrity. Distinct from expertise: this is whether labs trust you with sensitive access and to keep your commitments. More generally, be a good person to work with on a project.
Resolve predictable obstacles preemptively. Your org should have a healthy legal structure and governance. If you're an independent researcher, join or build an organisation. You should also invest preemptively in good security practices.

I wish this list could be more action-guiding, but the details will depend on your specific situation.

Notably absent from this list: "Stop criticising the labs." You shouldn't compromise on open criticism, and you should set a high bar for compromising on publication rights. Firstly, it's easy to overestimate how much open criticism actually blocks partial-insider access. Secondly, the central advantage of being a partial insider, rather than an employee, is perhaps the freedom to openly criticise the labs.
