yams

MIRI, formerly MATS, sometimes Palisade

Comments

IABIED Review - An Unfortunate Miss
yams · 7h · 20

I’ve met a large number of people who read books professionally (humanities researchers) who outright refuse to read any book >300 pages in length.

IABIED Review - An Unfortunate Miss
yams · 2d · 40

I can’t say too much about current sales numbers, mostly because nobody really has numbers that are very up to date. But I was starting with a similar baseline for community sales, then subtracting that from our current floor estimate to suggest there’s a chance it’s getting traction. A second wave will be more telling, and the conversation will be more telling, but the first filter is ‘get it in people’s hands’, so we at least have a chance to see how those other steps will go.

In both this and other reviews, people have their theory of What Will Work. Darren McKee writing a book (unfortunately) does not appear to have worked (for reasons that don’t necessarily have anything to do with the book’s quality, or even with Darren’s sense of what works for the public; I haven’t read it). Nate and Eliezer wrote a book, and we will get feedback on how well that works in the near future (independent of anyone’s subjective sense of what the public responds to, which seems to be a crux for many of the negative reviews on LW).

I’m just highlighting that we all have guesses about what works here, but they are in fact guesses, and most of what this review tells me is ‘Darren’s guess is different from Nate’s’, and not ‘Nate was wrong.’ That some people agree with you would be some evidence, if we didn’t already strongly predict that a bunch of people would have takes like this.

IABIED Review - An Unfortunate Miss
yams · 2d · 30

I think the text is meaningfully more general-audience friendly than much of the authors’ previous writing.

It could still be true that it doesn’t go far enough in that direction, but I’m excited to watch the experiment play out (e.g. it looks like we’re competitive for the Times list right now, and that requires a four-figure number of sales beyond the bounds of the community; that isn’t enough that I’m over the moon, given the importance of the issue, but it is some sign that it may be too early in the game to say definitively whether or not general audiences are taking to the work).

yams's Shortform
yams · 3d · 20

Following up to say that the thing that maps most closely to what I was thinking about (or satisfied my curiosity) is GWT (Global Workspace Theory).

GWT is usually intended to approach the hard problem, but the principal critique of it is that it isn't doing that at all (I ~agree). Unfortunately, I had dozens of frustrating conversations with people telling me 'don't spend any time thinking about consciousness; it's a dead end; you're talking about the hard problem; that triggers me; STOP' before someone actually pointed me in the right direction here, or seemed open to the question at all.

yams's Shortform
yams · 3d · 30

Reading so many reviews/responses to IABIED, I wish more people had registered how they expected to feel about the book, or how they thought a book on x-risk ought to look, prior to the book's release.

Finalizing any Real Actual Object requires making tradeoffs. I think it's pretty easy to critique the book on a level of abstraction that respects what it is Trying To Be in only the broadest possible terms, rather than acknowledging various sub-goals (e.g. providing an updated version of Nate + Eliezer's now very old 'canonical' arguments), modulations of the broader goal (e.g. avoiding making strong claims about timelines, knowing this might hamstring the urgency of the message), and constraints (e.g. going through an accelerated version of the traditional publishing timeline, which means the text predates Anthropic's Agentic Misalignment and, I'm sure, various other important recent findings).

A lot of the takes I see seem to come from a place of defending the ideal version of such a text by the lights of the reviewer, but it's actually unclear to me whether many of these reviewers would have made the opposite critiques if the book had made the opposite call on the various tradeoffs. I don't mean to say I think these reviewers are acting in bad faith; I just think it's easy to avoid confronting how your ideal version couldn't possibly be realized, and make post-hoc adjustments to that ideal thing in service of some (genuine, worthwhile) critique of the Real Thing.

Previously, it annoyed me that people had pre-judged the book's contents. Now, I'm grateful to folks who wrote about it, or talked to me about it, before they read it (Buck Shlegeris, Nina Panickssery, a few others), because I can judge the consistency of their rubric myself, rather than just feeling:

Yes, this came up during drafting, but there was a reasonable tradeoff. We won't know if that was a good call until later. If I had more energy I'd go 20 comments deep with you, and you'd probably agree it was a reasonable call by the end, but still think it was incorrect, and we'd agree to let time tell.

Which is the feeling that's overtaken me as I read the various reviews from folks throughout the community.

I should say: I'm grateful for all the conversation, including the dissent, because it's all data, but it is worse data than it would have been if you'd taken it upon yourself to cause a fuss in one of the many LW posts made in the lead-up to the book (and in the future I will be less rude to people who do this, because actually, it's a kindness!).

A Review of Nina Panickssery’s Review of Scott Alexander’s Review of “If Anyone Builds It, Everyone Dies”
yams · 4d · 120

Since there's speculation about advance copies in this thread, and I was involved in a fair deal of that, I'll say some words on the general criteria for advance copies:

  1. Some double-digit number of people were solicited for comments during the drafting stage.
  2. Some low three-digit number of people (my guess is '200') were solicited for blurbs, often receiving an advance copy. Most of these people simply did not respond. These were, broadly:
    1. People who have blurbed similar books (e.g. Life 3.0, The Precipice, etc.)
    2. Random influential people we already knew (both within AI safety and without; Grimes goes in this category, for those wondering)
    3. Random influential people we thought we could get a line to through some other route (friends of friends, colleagues of colleagues, people whose email the publicist had, people whose contact information is ~public), who seemed ~able to be convinced
  3. Journalists, content creators, podcasters, and other people in a position to amplify the book by talking about it often received advance copies, since you have to get all your press lined up well ahead of release, and they usually want to read the book (or pay someone else to read it, in many cases), before agreeing to have you on. My guess is this was about 100 copies.

We didn't want to empower people who seemed to be at some risk of taking action to deflate the project ahead of release, and so had a pretty high bar for sharing there. We especially wouldn't share a copy with someone if we thought there was a significant chance the principal effect of doing so was early and authoritative deflation to a deferential audience who could not yet read the book themselves. This is because we wanted people to read the book, rather than relying on others for their views.

I agree with the person Eli's quoting that this introduces some selection bias in the early stages of the discourse. However, I will say that the vast majority of parties we shared advance copies with were, by default, neutral toward us, either having never heard of us before, or having googled Eliezer and realized it might be worth writing/talking about. There was, to my knowledge, no deliberate campaign to seed the discourse, and many of our friends and allies with whom we had the opportunity to score cheap social points by passing on advance copies did not receive them. Journalists do not agree in advance to cover you positively, and we've seen that several who were given access to early copies indeed covered the book negatively (e.g. NYT and Wired — two of the highest profile things that will happen around the book at all).

[the thing is happening where I put a number of words into this that is disproportionate to my feelings or the importance; I don't take anyone in this thread to be making accusations or being aggressive/unfair. I just see an opportunity to add value through a little bit of transparency.]

yams's Shortform
yams · 9d · 704

MIRI is potentially interested in supporting reading groups for If Anyone Builds It, Everyone Dies by offering study questions, facilitation, and / or copies of the book, at our discretion. If you lead a pre-existing reading group of some kind (or meetup group that occasionally reads things together), please fill out this form.

The deadline to submit is September 22, but sooner is much better.

Mikhail Samin's Shortform
yams · 12d · 86

As Mikhail said, I feel great empathy and respect for these people. My first instinct was similar to yours, though: if you’re not willing to die, it won’t work, and you probably shouldn’t be willing to die (because that also won’t work / there are more reliable ways to contribute / timelines uncertainty).

I think ‘I’m doing this to get others to join in’ is a pretty weak response to this rebuttal. If they’re also not willing to die, then it still won’t work, and if they are, you’ve wrangled them in at more risk than you’re willing to take on yourself, which is pretty bad (and again, it probably still won’t work even if a dozen people are willing to die on the steps of the DeepMind office, because the government will intervene, or they’ll be painted as loons, or the attention will never materialize and their ardor will wane).

I’m pretty confused about how, under any reasonable analysis, this could come out looking positive EV. Most of these extreme forms of protest just don’t work in America (e.g. the soldier who self-immolated a few years ago). And if it’s not intended to be extreme, they’ve (I presume accidentally) misbranded their actions. 

The Problem
yams · 1mo · 82

[low-confidence appraisal of ancestral dispute, stretching myself to try to locate the upstream thing in accordance with my own intuitions, not looking to forward one position or the other]

I think the disagreement may be whether or not these things can be responsibly decomposed. 

A: "There is some future system that can take over the world/kill us all; that is the kind of system we're worried about."

B: "We can decompose the properties of that system, and then talk about different times at which those capabilities will arrive."

A: "The system that can take over the world, by virtue of being able to take over the world, is a different class of object from systems that have some reagents necessary for taking over the world. It's the confluence of the properties of scheming and capabilities, definitionally, that we find concerning, and we expect super-scheming to be a separate phenomenon from the mundane scheming we may be able to gather evidence about."

B: "That seems tautological; you're saying that the important property of a system that can kill you is that it can kill you, which dismisses, a priori, any causal analysis."

A: "There are still any-handles-at-all here, just not ones that rely on decomposing kill-you-ness into component parts which we expect to be mutually transformative at scale."

I feel strongly enough about engagement on this one that I'll explicitly request it from @Buck and/or @ryan_greenblatt. Thank y'all a ton for your participation so far!

The Problem
yams · 1mo · 30

This rhymes with what Paul Christiano and his various interlocutors (e.g. Buck and Ryan above) think, but I think you've put forward a much weaker version of it than they do.

This deployment of the word 'unproven' feels like a selective call for rigor, in line with the sort of thing Casper, Krueger, and Hadfield-Menell critique here. Nothing is 'proven' with respect to future systems; one merely presents arguments, and this post is a series of arguments toward the conclusion that alignment is a real, unsolved problem that does not go well by default.

"Lay low until you are incredibly sure you can destroy humanity" is definitionally not a risky plan (because you're incredibly sure you can destroy humanity, and you're a superintelligence!). You have to weaken incredibly sure, or be talking about non-superintelligent systems, for this to go through.

The open question for me is not whether it at some point could, but how likely it is that it will want to.

What does that mean? Consistently behaving such that you achieve a given end is our operationalization of 'wanting' that end. If future AIs consistently behave such that "significant power goes away from humans to ASI at some point", this is consistent with our operationalization of 'want'.

Posts

311 · The Problem · 1mo · 217
113 · If Anyone Builds It, Everyone Dies: Call for Translators (for Supplementary Materials) · 2mo · 10
85 · If Anyone Builds It, Everyone Dies: Advertisement design competition · 3mo · 37
46 · Existing Safety Frameworks Imply Unreasonable Confidence · 5mo · 3
10 · [Job Ad] MATS is hiring! · 1y · 0
62 · MATS Alumni Impact Analysis · 1y · 7
2 · yams's Shortform · 1y · 49
121 · Talent Needs of Technical AI Safety Teams · 1y · 65