This is a special post for quick takes by RobertM. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

EDIT: I believe I've found the "plan" that Politico (and other news sources) managed to fail to link to, maybe because it doesn't seem to contain any affirmative commitments by the named companies to submit future models to pre-deployment testing by UK AISI.

I've seen a lot of takes (on Twitter) recently suggesting that OpenAI and Anthropic (and maybe some other companies) violated commitments they made to the UK's AISI about granting them access for e.g. predeployment testing of frontier models.  Is there any concrete evidence about what commitment was made, if any?  The only thing I've seen so far is a pretty ambiguous statement by Rishi Sunak, who might have had some incentive to claim more success than was warranted at the time.  If people are going to breathe down the necks of AGI labs about keeping to their commitments, they should be careful to only do it for commitments they've actually made, lest they weaken the relevant incentives.  (This is not meant to endorse AGI labs behaving in ways which cause strategic ambiguity about what commitments they've made; that is also bad.)

I haven't followed this in great detail, but I do remember hearing from many AI policy people (including people at the UKAISI) that such commitments had been made.

It's plausible to me that this was an example of "miscommunication" rather than "explicit lying." I hope someone who has followed this more closely provides details.

But note that I personally think that AGI labs have a responsibility to dispel widely-believed myths. It would shock me if OpenAI/Anthropic/Google DeepMind were not aware that people (including people in government) believed that they had made this commitment. If you know that a bunch of people think you committed to sending them your models, and your response is "well technically we never said that but let's just leave it ambiguous and then if we defect later we can just say we never committed", I still think it's fair for people to be disappointed in the labs.

(I do think this form of disappointment should not be conflated with "you explicitly said X and went back on it", though.)

I agree in principle that labs have the responsibility to dispel myths about what they're committed to. OTOH, in defense of the labs I imagine that this can be hard to do while you're in the middle of negotiations with various AISIs about what those commitments should look like.

I agree in principle that labs have the responsibility to dispel myths about what they're committed to

I don't know, this sounds weird. If people make stuff up about someone else and do so continually, in what sense is it that someone's "responsibility" to rebut such things? I would agree with a weaker claim, something like: don't be ambiguous about your commitments with the objective of making it seem like you are committing to something, and then walk it back at the time you should make the commitment.

Yeah, fair point. I do think labs have some nonzero amount of responsibility to be proactive about what others believe about their commitments. I agree it doesn't extend to 'rebut every random rumor'.

More discussion of this here. Really not sure what happened here, would love to see more reporting on it. 

Ah, does look like Zach beat me to the punch :)

I'm also still moderately confused, though I'm not that confused about labs not speaking up - if you're playing politics, then not throwing the PM under the bus seems like a reasonable thing to do.  Maybe there's a way to thread the needle of truthfully rebutting the accusations without calling the PM out, but idk.  Seems like it'd be difficult if you weren't either writing your own press release or working with a very friendly journalist.

Adding to the confusion: I've nonpublicly heard from people at UK AISI and [OpenAI or Anthropic] that the Politico piece is very wrong and DeepMind isn't the only lab doing pre-deployment sharing (and that it's hard to say more because info about not-yet-deployed models is secret). But no clarification on commitments.

Have you read this? https://www.politico.eu/article/rishi-sunak-ai-testing-tech-ai-safety-institute/

"“You can’t have these AI companies jumping through hoops in each and every single different jurisdiction, and from our point of view of course our principal relationship is with the U.S. AI Safety Institute,” Meta’s president of global affairs Nick Clegg — a former British deputy prime minister — told POLITICO on the sidelines of an event in London this month."

"OpenAI and Meta are set to roll out their next batch of AI models imminently. Yet neither has granted access to the U.K.’s AI Safety Institute to do pre-release testing, according to four people close to the matter."

"Leading AI firm Anthropic, which rolled out its latest batch of models in March, has yet to allow the U.K. institute to test its models pre-release, though co-founder Jack Clark told POLITICO it is working with the body on how pre-deployment testing by governments might work.

“Pre-deployment testing is a nice idea but very difficult to implement,” said Clark."


I hadn't, but I just did and nothing in the article seems to be responsive to what I wrote.

Amusingly, not a single news source I found reporting on the subject has managed to link to the "plan" that the involved parties (countries, companies, etc) agreed to.

Nothing in that summary affirmatively indicates that companies agreed to submit their future models to pre-deployment testing by the UK AISI.  One might even say that it seems carefully worded to avoid explicitly pinning the companies down like that.


Vaguely feeling like OpenAI might be moving away from the GPT-N+1 release model, for some combination of "political/frog-boiling" reasons and "scaling actually hitting a wall" reasons.  Seems relevant to note, since in the worlds where they hadn't been drip-feeding people incremental releases of slight improvements over the original GPT-4 capabilities, and instead just dropped GPT-5 (and it was as much of an improvement over 4 as 4 was over 3, or close), that might have prompted people to do an explicit orientation step.  As it is, I expect less of that kind of orientation to happen.  (Though maybe I'm speaking too soon and they will drop GPT-5 on us at some point, and it'll still manage to be a step-function improvement over whatever the latest GPT-4* model is at that point.)

Eh, I think they'll drop GPT-4.5/5 at some point. It's just relatively natural for them to incrementally improve their existing model to ensure that users aren't tempted to switch to competitors.

It also allows them to avoid people being underwhelmed.

I would wait another year or so before getting much evidence on "scaling actually hitting a wall" (or until we have models that are known to have training runs with >30x GPT-4 effective compute), since training and deploying massive models isn't that fast.

Yeah, I agree that it's too early to call it re: hitting a wall.  I also just realized that releasing 4o for free might be some evidence in favor of 4.5/5 dropping soon-ish.


Yeah. This prompts me to make a brief version of a post I'd had on my TODO list for a while:

"In the 21st century, being quick and competent at 'orienting' is one of the most important skills." 

(in the OODA Loop sense, i.e. observe -> orient -> decide -> act)

We don't know exactly what's coming with AI or other technologies. We can make plans informed by our best guesses, but we should be on the lookout for things that should prompt some kind of strategic orientation. @jacobjacob has helped prioritize noticing things like "LLMs are pretty soon going to affect the strategic landscape, we should be ready to take advantage of the technology and/or respond to a world where other people are doing that."

I like Robert's comment here because it feels skillful at noticing a subtle thing that is happening, and promoting it to strategic attention. The object-level observation seems important and I hope people in the AI landscape get good at this sort of noticing.

It also feels kinda related to the original context of OODA-looping, which was about fighter pilots dogfighting. One of the skills was "get inside of the enemy's OODA loop and disrupt their ability to orient." If this were intentional on OpenAI's part (or part of subconscious strategy), it'd be a kinda clever attempt to disrupt our observation step.

Sam Altman and OpenAI have both said they are aiming for incremental releases/deployment for the primary purpose of allowing society to prepare and adapt, as opposed to, say, dropping large capability jumps out of the blue, which would surprise people.

I think "They believe incremental release is safer because it promotes societal preparation" should certainly be in the hypothesis space for the reasons behind these actions, along with scaling slowing and frog-boiling. My guess is that it is more likely than both of those reasons (they have stated it as their reasoning multiple times; I don't think scaling is hitting a wall).

Yeah, "they're following their stated release strategy for the reasons they said motivated that strategy" also seems likely to share some responsibility.  (I might not think those reasons justify that release strategy, but that's a different argument.)


I wonder if that is actually a sound view though. I just started reading Like War (interesting and seems correct/on target so far, but I'm really just starting it). Given its subject area (the impact of, reaction to, and use of social media and networking technologies, and the general social results), it seems like society generally is not yet prepared for or adapted to even that innovation. If all the fears about AI are even close to getting things right, I suspect "allowing society to prepare and adapt" suggests putting everything on hold, freezing in place, for at least a decade and probably longer.

Altman's and OpenAI's intentions might be towards that stated goal, but I think they are basing that approach on how "the smartest people in the room" react to AI and not the general public, or the most opportunistic people in the room.

I'm not sure if you'd categorize this under "scaling actually hitting a wall", but the main possibility that feels relevant in my mind is that progress simply is incremental in this case, as a fact about the world, rather than being a strategic choice on behalf of OpenAI. When underlying progress is itself incremental, it makes sense to release frequent small updates. This is common in the software industry, and it would not be at all surprising if what's often true for most software development holds for OpenAI as well.

(Though I also expect GPT-5 to be a medium-sized jump, once it comes out.)

AI capabilities orgs and researchers are not undifferentiated frictionless spheres that will immediately relocate to e.g. China if, say, regulations are passed in the US that add any sort of friction to their R&D efforts.

The LessWrong editor has just been upgraded several major versions.  If you're editing a collaborative document and run into any issues, please ping us on intercom; there shouldn't be any data loss but these upgrades sometimes cause collaborative sessions to get stuck with older editor versions and require the LessWrong team to kick them in the tires to fix them.

I am pretty concerned that most of the public discussion about risk from e.g. the practice of open sourcing frontier models is focused on misuse risk (particularly biorisk).  Misuse risk seems like it could be a real thing, but it's not where I see most of the negative EV, when it comes to open sourcing frontier models.  I also suspect that many people doing comms work which focuses on misuse risk are focusing on misuse risk in ways that are strongly disproportionate to how much of the negative EV they see coming from it, relative to all sources.

I think someone should write a summary post covering "why open-sourcing frontier models and AI capabilities more generally is -EV".  Key points to hit:

  • (1st order) directly accelerating capabilities research progress
  • (1st order) we haven't totally ruled out the possibility of hitting "sufficiently capable systems" which are at least possible in principle to use in +EV ways, but which if made public would immediately have someone point them at improving themselves and then we die.  (In fact, this is very approximately the mainline alignment plan of all 3 major AGI orgs.)
  • (2nd order) generic "draws in more money, more attention, more skilled talent, etc" which seems like it burns timelines

And, sure, misuse risks (which in practice might end up being a subset of the second bullet point, but not necessarily so).  But in reality, LLM-based misuse risks probably don't end up being x-risks, unless biology turns out to be so shockingly easy that a (relatively) dumb system can come up with something that gets ~everyone in one go.

Headline claim: time delay safes are probably much too expensive in human time costs to justify their benefits.

The largest pharmacy chains in the US, accounting for more than 50% of the prescription drug market[1][2], have been rolling out time delay safes (to prevent theft)[3].  Although I haven't confirmed that this is true across all chains and individual pharmacy locations, I believe these safes are used for all controlled substances.  These safes open ~5-10 minutes after being prompted.

There were >41 million prescriptions dispensed for Adderall in the US in 2021[4].  (Note that this likely means ~12x fewer people were prescribed Adderall that year.)   Multiply that by 5 minutes and you get >200 million minutes, or >390 person-years, wasted.  Now, surely some of that time is partially recaptured by e.g. people doing their shopping while waiting, or by various other substitution effects.  But that's also just Adderall!
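
For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes (as the estimate does) that every dispensed prescription incurs one full delay at the low end of the 5-10 minute range, and it uses a 365-day year; the variable names are just illustrative.

```python
# Back-of-the-envelope check of the waiting-time estimate.
# Assumption: one full 5-minute delay per dispensed prescription.
PRESCRIPTIONS_2021 = 41_000_000   # lower bound quoted in the post (">41 million")
DELAY_MINUTES = 5                 # low end of the ~5-10 minute safe delay
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600; ignores leap years

total_minutes = PRESCRIPTIONS_2021 * DELAY_MINUTES
person_years = total_minutes / MINUTES_PER_YEAR

print(f"{total_minutes:,} minutes")        # 205,000,000 minutes
print(f"{person_years:.0f} person-years")  # 390 person-years
```

Using the 10-minute end of the range would simply double both figures; the recapture effects mentioned above would pull them back down by an amount I don't know how to estimate.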

Seems quite unlikely that this is on the efficient frontier of crime-prevention mechanisms, but alas, the stores aren't the ones (mostly) paying the costs imposed by their choices, here.

  1. ^


  2. ^


  3. ^


  4. ^


It seems like the technology you would want is one where you can get one Adderall box immediately but not all the Adderall boxes that the store has on the premises.

Essentially, a big vending machine that might take 10 minutes to unlock for restocking, but that can only give up one Adderall box per five minutes in its vending machine mode.

Now, surely some of that time is partially recaptured by e.g. people doing their shopping while waiting

That sounds like the technique might encourage customers to buy non-prescription medication in the pharmacy along with the prescription medicine they want to buy.

I think there might be many local improvements, but I'm pretty uncertain about important factors like elasticity of "demand" (for robbery) with respect to how much of a medication is available on demand.  i.e. how many fewer robberies do you get if you can get at most a single prescription's worth of some kind of controlled substance (and not necessarily any specific one), compared to "none" (the current situation) or "whatever the pharmacy has in stock" (not actually sure if this was the previous situation - maybe they had time delay safes for storing medication that wasn't filling a prescription, and just didn't store the filled prescriptions in the safes as well)?

It's not obvious to me why training LLMs on synthetic data produced by other LLMs wouldn't work (up to a point).  Under the model where LLMs are gradient-descending their way into learning algorithms that predict tokens that are generated by various expressions of causal structure in the universe, tokens produced by other LLMs don't seem redundant with respect to the data used to train those LLMs.  LLMs seem pretty different from most other things in the universe, including the data used to train them!  It would surprise me if the algorithms that LLMs developed to predict non-LLM tokens were perfectly suited for predicting other LLM tokens "for free".

Unfortunately, it looks like non-disparagement clauses aren't unheard of in general releases:


Release Agreements commonly include a “non-disparagement” clause – in which the employee agrees not to disparage “the Company.”


The release had a very broad definition of the company (including officers, directors, shareholders, etc.), but a fairly reasonable scope of the claims I was releasing. So far, so good. But then it included a general non-disparagement provision, which basically said I couldn’t say anything bad about the company, which, by itself, is also fairly typical and reasonable.

Given the way the contract is worded it might be worth checking whether executing your own "general release" (without a non-disparagement agreement in it) would be sufficient, but I'm not a lawyer and maybe you need the counterparty to agree to it for it to count.

And as a matter of industry practice, this is of course an extremely non-standard requirement for retaining vested equity (or equity-like instruments), whereas it's pretty common when receiving an additional severance package.  (Though even in those cases I haven't heard of any such non-disparagement agreement that was itself covered by a non-disclosure agreement... but would I have?)

If your model says that LLMs are unlikely to scale up to ASI, this is not sufficient for low p(doom).  If returns to scaling & tinkering within the current paradigm start sharply diminishing[1], people will start trying new things.  Some of them will eventually work.

  1. ^

    Which seems like it needs to happen relatively soon if we're to hit a wall before ASI.

Such a world could even be more dangerous. LLMs are steerable and relatively weak at consequentialist planning. There is AFAICT no fundamental reason why the next paradigm couldn't be even less interpretable, less steerable, and more capable of dangerous optimization at a given level of economic utility.

I have a pretty huge amount of uncertainty about the distribution of how hypothetical future paradigms score on those (and other) dimensions, but there does seem room for it to be worse, yeah.

ETA: (To be clear, something that looks relevantly like today's LLMs while still having superhuman scientific R&D capabilities seems quite scary and I think if we find ourselves there in, say, 5 years, then we're pretty fucked.  I don't want anyone to think that I'm particularly optimistic about the current paradigm's safety properties.)

Ah, yep, I read it at the time; this has just been on my mind lately, and sometimes the obvious bears repeating.

Hypothetical autonomous researcher LLMs are 100x faster than humans, so such LLMs quickly improve over LLMs. That is, non-ASI LLMs may be the ones trying new things, as soon as they reach autonomous research capability.

The crux is then whether LLMs scale up to autonomous researchers (through mostly emergent ability, not requiring significantly novel scaffolding or post-training), not whether they scale up directly to ASI.

NDAs sure do seem extremely costly.  My current sense is that it's almost never worth signing one, or binding oneself to confidentiality in any similar way, for anything except narrowly-scoped technical domains (such as capabilities research).

Say more please.

As a recent example, from this article on the recent OpenAI kerfuffle:

Two people familiar with the board’s thinking say that the members felt bound to silence by confidentiality constraints.

If you don't have more examples, I think:

  1. it is too early to draw conclusions from OpenAI
  2. one special case doesn't invalidate the concept

Not saying your point is wrong, just that this is not convincing me.

I have more examples, but unfortunately some of them I can't talk about.  A few random things that come to mind:

  • OpenPhil routinely requests that grantees not disclose that they've received an OpenPhil grant until OpenPhil publishes it themselves, which usually happens many months after the grant is disbursed.
  • Nearly every instance that I know of where EA leadership refused to comment on anything publicly post-FTX due to advice from legal counsel.
  • So many things about the Nonlinear situation.
  • Coordination Forum requiring attendees agree to confidentiality re: attendance and content of any conversations with people who wanted to attend but not have their attendance known to the wider world, like SBF, and also people in the AI policy space.

That explains why the NDAs are costly. But if you don't sign one, you can't e.g. get the OpenPhil grant. So the examples don't explain how "it's almost never worth signing one".

Not all of these are NDAs; my understanding is that the OpenPhil request comes along with the news of the grant (and isn't a contract).  Really my original shortform should've been a broader point about confidentiality/secrecy norms, but...

Reducing costs equally across the board in some domain is bad news in any situation where offense is favored. Reducing costs equally-in-expectation (but unpredictably, with high variance) can be bad even if offense isn't favored, since you might get unlucky and the payoffs aren't symmetrical.

(re: recent discourse on bio risks from misuse of future AI systems.  I don't know that I think those risks are particularly likely to materialize, and most of my expected disutility from AI progress doesn't come from that direction, but I've seen a bunch of arguments that seem to be skipping some steps when trying to argue that progress on ability to do generic biotech is positive EV.  To be fair, the arguments for why we should expect it to be negative EV are often also skipping those steps.  My point is that a convincing argument in either direction needs to justify its conclusion in more depth; the heuristics I reference above aren't strong enough to carry the argument.)

We have models that demonstrate superhuman performance in some domains without then taking over the world to optimize anything further. "When and why does this stop being safe" might be an interesting frame if you find yourself stuck.