yams

MIRI, formerly MATS, sometimes Palisade

Comments

Some of the ways the IABIED plan can backfire
yams · 15m

For most, LLMs are the salient threat vector at this time, and the practical recommendations in the book are oriented toward that. You did not say in your post ‘I believe that brain-in-a-box is the true concern; the book’s recommendations don’t work for this, because Chapter 13 is mostly about LLMs.’ That would be a different post (and redundant with a bunch of Steve Byrnes stuff).

Instead, you completely buried the lede and made a post inviting people to talk in circles with you unless they magically divined your true objection (which is only distantly related to the topic of the post). That does not look like a good faith attempt to get people on the same page.

Some of the ways the IABIED plan can backfire
yams · 1h

I think targeting specific policy points rather than highlighting your actual crux makes this worse, not better.

Some of the ways the IABIED plan can backfire
yams · 4h

I think positions here fall into four quadrants (though each axis is of course a spectrum), based on how likely you think Doom is and how easy or hard (that is: how resource-intensive) you expect ASI development to be.

ASI Easy/Doom Very Unlikely: Plan obviously backfires; you could have had nice things, but were too cautious!

ASI Hard/Doom Very Unlikely: Unlikely to backfire, but you might have been better off pressing ahead, because there was nothing to worry about anyway.

ASI Easy/Doom Very Likely: We're kinda fucked anyway in this world, so I'd want to have pretty high confidence it's the world we're in before attempting any plan optimized for it. But yes, here it looks like the plan backfires (in that we're selecting even harder than in default-world for power-seeking, willingness to break norms, and non-transparency in coordinating around who gets to build it). My guess is this is the world you think we're in. I think this is irresponsibly fatalistic and also unlikely, but I don't think it matters to get into it here.

ASI Hard/Doom Very Likely: Plan plausibly works.

I expect near-term ASI development to be resource-intensive, or to rely on not-yet-complete resource-intensive research. I remain concerned about the brain-in-a-box scenarios, but it's not obvious to me that they're much more pressing in 2025 than they were in 2020, except in ways that are downstream of LLM development (I haven't looked super closely). LLM development is more tractable to coordinate action around anyway, and that action plausibly leads to decreased risk on the margin even if the principal threat is a brain-in-a-box. I assume you disagree with all of this.

I think your post is just aimed at a completely different threat model than the book even attempts to address, and I think you would be having more of the discussion you want to have if you had opened by talking explicitly about your actual crux (which you seemed to know was the real crux ahead of time), rather than inciting an object-level discussion colored by such a powerful background disagreement. As-is, it feels like you just disagree with the way the book is scoped, and would rather talk to Eliezer about brain-in-a-box than talk to William about the tentative-proposals-to-solve-a-very-different-problem.

Some of the ways the IABIED plan can backfire
yams · 8h

I take 'backfire' to mean 'get more of the thing you don't want than you would otherwise, as a direct result of your attempt to get less of it.' If you mean it some other way, then the rest of my comment isn't really useful.

  1. Change of the winner
    1. Secret projects under the moratorium are definitely on the list of things to watch out for, and the tech gov team at MIRI has a huge suite of countermeasures they're considering for this, some of which are sketched out or gestured toward here.
    2. It actually seems to me that an underground project is more likely under the current regime, because there aren't really any meaningful controls in place (you might even consider DeepSeek just such a project, given that there's some evidence [I really don't know what I believe here, and it doesn't seem useful to argue; just using them as an example] that they stole IP and smuggled chips).
    3. The better your moratorium is, the less likely you are to get wrecked by a secret project (because the fewer resources they'll be able to gather) before you can satisfy your exit conditions.
    4. So p(undergroundProjectExists) goes down as a result of moratorium legislation, but p(undergroundProjectWins) may go up if your moratorium sucks. (I actually think this is still pretty unclear, owing to the shape of the classified research ecosystem, which I talk more about below.) A toy numerical sketch of how these two probabilities can move in opposite directions follows after this list.
    5. This is, imo, your strongest point, and is a principal difference between the various MIRI plans and the plans of other people we talk to ("Once you get the moratorium, do you suppose there must be a secret project and resolve to race against them?" MIRI answers no; some others answer yes.)
  2. Intensified race...
    1. You say: "a number of AI orgs would view the threat of prohibition on par with the threat of a competitor winning". I don't think this effect is stronger in the moratorium case than in the 'we are losing the race and believe the finish line is near' case, and this kind of behavior happening sooner is better (if we don't expect the safety situation to improve), because the systems themselves are less powerful, the risks aren't as big, the financial stakes aren't as palpable, etc. I agree with something like "looming prohibition will cause some reckless action to happen sooner than it would otherwise", but not with the stronger claim that this action would be created by the prohibition.
    2. I also think the threat of a moratorium could cause companies to behave more sanely in various ways, so that they're not caught on the wrong side of the law in the future worlds where some 'moderate' position wins the political debate. I don't think voluntary commitments are trustworthy/sufficient, but I could absolutely see RSPs growing teeth as a way for companies to generate evidence of effective self-regulation, then deploy that evidence to argue against the necessity of a moratorium.
    3. It's just really not clear to me how this set of interrelated effects would net out, much less that it's an obvious way pushing through a moratorium might backfire. My best guess is that these pre-moratorium cooling effects basically win out and compress the period of greatest concern, while also reducing its intensity.
  3. Various impairments for AI safety research
    1. Huge amounts of classified research exist. There are entire parallel academic ecosystems for folks working on military and military-adjacent technologies. These include work on game theory, category theory, Conway's Game of Life, genetics, corporate governance structures, and other relatively-esoteric things beyond 'how make bomb go boom'. Scientists in ~every field, and mathematicians in any branch of the discipline, can become DARPA PMs, and access to (some portion of) this separate academic canon is considered a central perk of the job. I expect gaining the ability to work on the classified parts of AI safety under a moratorium will be similarly difficult to qualifying for work at Los Alamos and the like.
    2. As others have pointed out, not all kinds of research would need to be classified-by-default, under the plan. Mostly this would be stuff regarding architecture, algorithms, and hardware.
    3. There are scarier worlds where you would want to classify more of the research, and there are reasonable disagreements about what should/shouldn't be classified, but even then, you're in a Los Alamos situation, and not in a Butlerian Jihad.
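A toy sketch of the point in 1.4 above (the function name and every number here are hypothetical, purely to illustrate how a moratorium could lower the chance that a secret project exists while raising the chance that it wins, since the unconditional win probability is the product of the two factors):

```python
# Toy sketch (all numbers hypothetical): a moratorium can lower the chance that
# a secret project exists while raising the chance that it wins conditional on
# existing, because p(wins) = p(exists) * p(wins | exists).

def p_secret_project_wins(p_exists: float, p_wins_given_exists: float) -> float:
    """Unconditional probability that a secret project exists AND wins the race."""
    return p_exists * p_wins_given_exists

# No moratorium: more incentive to go underground, but any secret project
# competes against well-resourced, legal frontier labs.
no_moratorium = p_secret_project_wins(p_exists=0.30, p_wins_given_exists=0.20)

# Weak moratorium: fewer secret projects get started, but any that do exist face
# no legal competition, so they are more likely to win conditional on existing.
weak_moratorium = p_secret_project_wins(p_exists=0.15, p_wins_given_exists=0.60)

print(f"no moratorium:   p(exists and wins) = {no_moratorium:.3f}")    # 0.060
print(f"weak moratorium: p(exists and wins) = {weak_moratorium:.3f}")  # 0.090
```

Whether the second number actually comes out higher than the first is exactly what seems unclear to me; the sketch only shows the direction each factor can move.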
IABIED Review - An Unfortunate Miss
yams · 1d

Most of these people claim to be speaking from their impression of how the public will respond, which is not yet knowable and will be known in the (near-ish) future.

My meta point remains that these are all marginal calls, that there are arguments the other direction, and that only Nate is equipped to argue them on the margin (because, in many cases, I disagree with Nate’s calls, but don’t think I’m right about literally all the things we disagree on; the same is true for everyone else at MIRI who’s been involved with the project, afaict). Eg I did not like the scenario, and felt Part 3 could have been improved by additional input from the technical governance team (and more detailed plans, which ended up in the online resources instead). It is unreasonable that I have been dragged into arguing against claims I basically agree with on account of introducing a single fact to the discussion (that length DOES matter, even among ‘elite’ audiences, and that thresholds for this may be low). My locally valid point and differing conclusions do not indicate that I disagree with you on your many other points.

That people wishing the book well are also releasing essays (based on guesses and, much less so in your case than others, misrepresentations) to talk others in the ecosystem out of promoting it could, in fact, be a big problem, mostly in that it could bring about a lukewarm overall reception (eg random normie-adjacent CEA employees don’t read it and don’t recommend it to their parents, because they believe the misrepresentations from Zach’s tweet thread here: https://x.com/Zach_y_robinson/status/1968810665973530781). Once that happens, Zach can say “well, nobody else at my workplace thought it was good,” when none of them read it, and HE didn’t read it, AND they just took his word for it.

I could agree with every one of your object-level points, still think the book was net positive, and therefore think it was overconfident and self-fulfillingly nihilistic of you to authoritatively predict how the public would respond.

I, of course, wouldn’t stand by the book if I didn’t think it was net positive, and hadn’t spent tens of hours hearing the other side out in advance of the release. Part 1 shines VERY bright in my eyes, and the other sections are, at least, better than similarly high-profile works (to the extent that those exist at all) tackling the same topics (exception for AI2027 vs Part 2).

IABIED Review - An Unfortunate Miss
yams · 1d

I am not arguing about the optimal balance and see no value in doing so. I am adding anecdata to the pile that there are strong effects once you near particular thresholds, and it’s easy to underrate these.

In general I don’t understand why you continue to think such a large number of calls are obvious, or why you imagine that the entire MIRI team, and ~100 people outside of it, thinking, reading, and drafting for many months, might not have weighed such thoughts as ‘perhaps the scenario ought to be shorter.’ Obviously these are all just marginal calls; we don’t have many heuristic disagreements, and nothing you’ve said is the dunk you seem to think it is.

Ultimately Nate mostly made the calls once considerations were surfaced; if you’re talking to anyone other than him about the length of the scenario, you’re just barking up the wrong tree.

More on how I’m feeling in general here (some redundancies with our previous exchanges, but some new):

https://www.lesswrong.com/posts/3GbM9hmyJqn4LNXrG/yams-s-shortform?commentId=yjnTtbyotTbEnXqa9

IABIED Review - An Unfortunate Miss
yams · 2d

I’ve met a large number of people who read books professionally (humanities researchers) who outright refuse to read any book >300 pages in length.

IABIED Review - An Unfortunate Miss
yams · 3d

Can’t discuss too much about current sales numbers, mostly because nobody really has numbers that are very up to date, but I was starting with a similar baseline for community sales, and then subtracting that from our current floor estimate to suggest there’s a chance it’s getting traction. A second wave will be more telling, and the conversation will be more telling, but the first filter is ‘get it in people’s hands’, so we at least have a chance to see how those other steps will go.

In both this and other reviews, people have their theory of What Will Work. Darren McKee writing a book (unfortunately) does not appear to have worked (for reasons that don’t necessarily have anything to do with the book’s quality, or even with Darren’s sense of what works for the public; I haven’t read it). Nate and Eliezer wrote a book, and we will get feedback on how well that works in the near future (independent of anyone’s subjective sense of what the public responds to, which seems to be a crux for many of the negative reviews on LW).

I’m just highlighting that we all have guesses about what works here, but they are in fact guesses, and most of what this review tells me is ‘Darren’s guess is different from Nate’s’, and not ‘Nate was wrong.’ That some people agree with you would be some evidence, if we didn’t already strongly predict that a bunch of people would have takes like this.

IABIED Review - An Unfortunate Miss
yams · 3d

I think the text is meaningfully more general-audience friendly than much of the authors’ previous writing.

It could still be true that it doesn’t go far enough in that direction, but I’m excited to watch the experiment play out (eg it looks like we’re competitive for the Times list rn, and that requires some 4-figure number of sales beyond the bounds of the community, which isn’t enough that I’m over the moon, given the importance of the issue, but is some sign that it may be too early in the game to say definitively whether or not general audiences are taking to the work).

yams's Shortform
yams · 4d

Following up to say that the thing that maps most closely to what I was thinking about (or satisfied my curiosity) is GWT (Global Workspace Theory).

GWT is usually intended to approach the hard problem, but the principal critique of it is that it isn't doing that at all (I ~agree). Unfortunately, I had dozens of frustrating conversations with people telling me 'don't spend any time thinking about consciousness; it's a dead end; you're talking about the hard problem; that triggers me; STOP' before someone actually pointed me in the right direction here, or seemed open to the question at all.

Posts (score · title · age · comment count)

311 · The Problem · 1mo · 217
113 · If Anyone Builds It, Everyone Dies: Call for Translators (for Supplementary Materials) · 2mo · 10
85 · If Anyone Builds It, Everyone Dies: Advertisement design competition · 3mo · 37
46 · Existing Safety Frameworks Imply Unreasonable Confidence · 5mo · 3
10 · [Job Ad] MATS is hiring! · 1y · 0
62 · MATS Alumni Impact Analysis · 1y · 7
2 · yams's Shortform · 1y · 49
121 · Talent Needs of Technical AI Safety Teams · 1y · 65