Edit 2 - PARTIAL RESULT:

Yudkowsky's TED talk seems to already integrate what I was hoping to figure out how to convey with this discussion! Yay! Looking forward to when it's posted on YouTube.

Edits:

  1. This is a bad post/question. It should not be highly upvoted. It was never intended or expected to be otherwise.
  2. This is a request for comments and debate. I don't feel we went back and forth usefully in figuring out what it is I'm concerned about in his approach; there are lots of object-level parts of yud's views which I agree with, but those aren't the problem I'm worried about. The problem is that he's participating in causing <?social autoimmune inflammation or something?>. Perhaps I'm looking for https://www.lesswrong.com/posts/KYzHzqtfnTKmJXNXg/the-toxoplasma-of-agi-doom-and-capabilities?
  3. I found another post that makes a similar point and reposted it here; I think it's a much better post than this request.
  4. I added the [] to the title, also surrounded a couple of confused concepts with <??>
  5. The reply post that calls this post out for being badly written is totally right. Yup, this post is bad! Don't upvote it a lot! I have a good point to make, and I haven't made it, and don't yet totally know what it is.
  6. Thanks for reading, sorry about all the conflictons in my thinking and writing here. Help? :[
  7. bonus, maybe for a later post, maybe for this one: I'm pretty sure the "guess the definition" meaning of conflicton is already insightful, but I'm not quite sure how to formalize what I'm trying to say with the word. I spent some time trying out words and talking to language models to find one that already mostly means the right thing, the quanta of conflict, but I don't yet know exactly what the quanta of conflict actually is in a mechanistic or type-signature sense.

original blurb:

https://mobile.twitter.com/QuintinPope5/status/1642100668126355456

this lesswrong post is not a high quality post, and if its score is far from zero (positive or negative) two days after posting, I'll be sad. yudkowsky is digging a hole and just won't stop digging. I don't have a clue how to explain to him what the problem is if it's not obvious on the surface, so this is a call for input: can anyone explain why yudkowsky is being a fool in a way he'll understand?

-7


4 Answers

i think yudkowsky is trying to convey the fact that reality is the line on the right, not the line on the left:

see also my favorite part from AGI Ruin:

Trolley problems are not an interesting subproblem in all of this; if there are any survivors, you solved alignment. At this point, I no longer care how it works, I don't care how you got there, I am cause-agnostic about whatever methodology you used, all I am looking at is prospective results, all I want is that we have justifiable cause to believe of a pivotally useful AGI 'this will not kill literally everyone'. Anybody telling you I'm asking for stricter 'alignment' than this has failed at reading comprehension. The big ask from AGI alignment, the basic challenge I am saying is too difficult, is to obtain by any strategy whatsoever a significant chance of there being any survivors.

he sees some people say "oh no what if AI misalignment causes some people to die but not others" (typically "what if some group in control of the AI survives but everyone else dies or becomes subservient") and he's trying to get across the information that unaligned AI isn't selective, it does kill actually literally everyone, and if you have even just a few survivors you have already solved almost all of alignment and you're not far from being able to save actually everyone. (don't remember where he said that, but he definitely did say somewhere something along the lines of "if you did get a few survivors, you solved so much of the problem that your solution can be modified to get everyone to survive")

Strongly agreed with this model. (For others - I mentioned this next part to tammy/@carado and she edited the image a little to clarify already by adding the true/false and "how much of population" label, but still,) the image still seems like it contains the same problem I'm trying to figure out how to specify. Like, it is a claim in a model, and if the model is true, then this is simply an explanation of the truth. But someone who is highly uncertain about this claim, or even who currently has a lot of confidence pointing away from this claim, won't be m... (read more)

Fucking hell, the fact that something truthful was said, and LW downvoted it, really solidifies my impressions that the pessimists on LW are both wrong and irrational, which is worrying since LW has been dominated by pessimists for so long.

I agree, and bluntly I don't think Eliezer realizes how bad his epistemics are on AI safety.

I agree, sort of. But, after reflection induced by people criticizing this post, I don't think this is the key thing I wanted to figure out. See my other comments from the past day here. (I've removed my strong upvote to change the sort order, which, as a habitual partial-to-low-decoupler myself, I feel habit-pushed to mention isn't meant as a personal criticism.)

Strong upvoted to counter some of the downvotes.

for what it's worth, I think the karma I would reflectively prefer this post to have is exactly zero, and it is in fact hovering around there. It's an important point argued awfully that I don't even know for sure I understand myself and I knew that when I posted it.

Note for posterity: THIS DOES NOT MEAN I THINK YUDKOWSKY'S VIEWS ARE MOSTLY WRONG. quite the opposite. he's mostly right, and yet he seems to be cursed with a High Rate Of Misunderstanding, and I'm not exactly sure why. He also (maybe this is the key point?) is doing stuff that is increasing the rate of conflicton emission into the AI safety landscape, which seems to me to be, on average, decreasing the rationality of the capabilities people I'd want to influence.

15 comments

Thank you for inspiring me to write this!

😅 yup this isn't a good post, tried to clarify that I very much know that and am asking for help, not saying something I already know

But he's right?

to be clear I think he's not implementing good strategy in the face of the technical strategic landscape. some of the things he's suggesting that sound awful to me may be good strategy if he phrased them in less misleading ways. but he's doing the thing where you're only required to be honest with the high detail interpretation of your sentence but allowed to mislead structurally, which is a thing that shows up in public communications from high skill technical people who don't consider the vibes interpretation to be valid at all.

Being right and being good at convincing people you're right are not orthogonal, but they're closer than we'd like to think.

you say that, but this post is terrible for convincing people, and I knew it would be as I wrote it; hopefully that's quite obvious. I continue to not be sure what part of my brain's model of how the world works is relevant to why I think his approach doesn't work; I don't even seem to be able to express a first-order approximation. It just seems so obvious - which might mean I'm wrong, it might mean I understand something so deeply that I no longer know what the beginner explanation is, it might mean I'm bouncing off thinking about this in detail because the relevant models have an abort() in the paths I need to reason about this. actually, that last one sounds likely... hmm. eg, an approximate model that has validity guards so I don't crash my social thinking? I guess?

like, yud is going around pissing off the acc folks in unnecessary ways. I think it's possible for him to be more focused about which ways he irritates them - he's not going to stop irritating most of them while making his points, but. but. idk.

Part of the problem might be twitter. If you're on twitter, you are subject to the agency of the twitter recommender, which wants to upvote you when you say things that generate conflict. if you as a human do RL on twitter, you will be RL trained by the twitter algo to do ... <the bad thing he's doing>. but he did it long before twitter, too, it's just particularly important now.

See my post AI scares and changing public beliefs for one theory of exactly why what Yudkowsky is doing is a bad idea. I was of course primarily thinking of his approach when writing about polarization.

The other post I've been contemplating writing is "An unrecognized goddamn principle of fucking rational discourse: be fucking nice". Yudkowsky talks down to people. That's not nice, and it makes them emotionally want to prove him wrong instead of want to find ways to agree with him.

I should clarify that being right and convincing people you're right are NOT orthogonal here on LessWrong. If you can explain why you're sure you're right here, it will convince people you're right. Writing posts like this one is a way to draw people to a worthy project here.

I think you're right and I think talking about this here is the right way to make sure that's true and figure out what to collectively do about this issue.

No, he's not right at all. That extends to a lot of pessimists on AI, but he is not, in fact, right, even if he uses strong language and is confident in an outcome.

Why is it not okay? Is it because he should be signaling more that he knows that most other people wouldn't justifiedly have enough confidence (yet) to make the same tradeoffs he's advocating for? I think it makes sense to advocate for making tradeoffs even if others wouldn't yet agree; convincing them would be much of the point of advocating.

he's burning respectability that those who are actually making progress on his worries need. he has catastrophically broken models of social communication and is saying sentences that don't mean the same thing when parsed even a little bit inaccurately. he is blaming others for misinterpreting him when he said something confusing. etc.

https://mobile.twitter.com/jachiam0/status/1641867859751239681

https://mobile.twitter.com/lovetheusers/status/1641989542092713987

in contrast, good safety communication:

https://mobile.twitter.com/soundboy/status/1641789276445630465

https://mobile.twitter.com/liron/status/1641928889072238592

https://mobile.twitter.com/anthrupad/status/1641997798131265536

Hm. You may be right. Maybe picking a few sentences or a paragraph or two from the TIME article or his tweets, and rewriting them, would help clarify.

Oh nice, yudkowsky's TED talk seems to not have the problem I was trying to figure out. hooray!

like, it's not just prosaic alignment's side that I'm worried about; I am also worried he's miscommunicating the PR of agent foundations too

like he has an important point and I feel like there's something in the way he's making it that is interfering with its reception

maybe including to me? tammy just pointed out that maybe he's making one of his old core points: you either align completely or align not at all. which, like, yep, agreed! how to encode it better? what's going on with how he explains it that makes it break? Why didn't I know that instantly when writing this post? (not that that's an easy question for others, or even for me now, since the me who wrote this post is in the past)

like, something is going on here and it feels like it's crashing a bunch of us. can someone who understands how these crashes work help debug?

or maybe this post is useless to keep working on, because other posts have been made since that make much more direct and coherent contributions. If so, I'd appreciate a comment saying so explicitly; if I am convinced of that, I will 1. invert my own strong upvote into a strong downvote of OP, and 2. edit saying I no longer think it's reasonable for this post to have zero karma and people are quite welcome to downvote it.
