Strong +1s to many of the points here. Some things I'd highlight:
I think if he understood these ideas well enough to justify the confidence of his claims, then he wouldn't have found that as difficult.
But what makes you so confident that it's not possible for subject-matter experts to have correct intuitions that outpace their ability to articulate legible explanations to others?
Of course, it makes sense for other people who don't trust the (purported) expert to require an explanation, and not just take the (purported) expert's word for it. (So, I agree that fleshing out detailed examples is important for advancing our collective state of knowledge.) But the (purported) expert's own confidence should track correctness, not how easy it is to convince people using words.
But what makes you so confident that it's not possible for subject-matter experts to have correct intuitions that outpace their ability to articulate legible explanations to others?
Yepp, this is a judgement call. I don't have any hard and fast rules for how much you should expect experts' intuitions to plausibly outpace their ability to explain things. A few things which inform my opinion here:
I don't think Eliezer is doing particularly well on any of these criteria. In particular, the last one was why I pressed Eliezer to make predictions rather than postdictions in my debate with him. The extent to which Eliezer seemed confused that I cared about this was a noticeable update for me in the direction of believing that Eliezer's intuitions are less solid than he thinks.
It may be the case that Eliezer has strong object-level intuitions about the details of how intelligence works which he's not willing to share publicly, but which significantly increase his confidence in his public claims. If so, I think the onus is on him to highlight that so people can make a meta-level update on it.
I agree that intuitions might get you to high confidence without the ability to explain ideas legibly.
That said, I think expert intuitions still need to usually (always?) be grounded out in predictions about something (potentially including the many implicit predictions that are often required to do stuff). It seems to me like Eliezer is probably relying on a combination of:
Fantastic post! I agree with most of it, but I notice that Eliezer's post has a strong tone of "this is really actually important, the modal scenario is that we literally all die, people aren't taking this seriously and I need more help". More measured or academic writing, even when it agrees in principle, doesn't have the same tone or feeling of urgency. This has good effects (shaking people awake) and bad effects (panic/despair), but it's a critical difference and my guess is the effects are net positive right now.
I definitely agree that Eliezer's list of lethalities hits many rhetorical and pedagogical beats that other people are not hitting and I'm definitely not hitting. I also agree that it's worth having a sense of urgency given that there's a good chance of all of us dying (though quantitatively my risk of losing control of the universe through this channel is more like 20% than 99.99%, and I think extinction is a bit less likely still).
I'm not totally sure about the net effects of the more extreme tone; I empathize with both the case in favor and the case against. Here I'm mostly just trying to contribute to the project of "get to the bottom of what's likely to happen and what should be done."
I did start the post with a list of 19 agreements with Eliezer, including many of the claims that are most relevant to the urgency, in part so that I wouldn't be misconstrued as arguing that everything is fine.
I really appreciate your including a number here, that's useful info. Would love to see more from everyone in the future - I know it takes more time/energy and operationalizations are hard, but I'd vastly prefer to see the easier versions over no versions or norms in favor of only writing up airtight probabilities.
(I also feel much better on an emotional level hearing 20% from you, I would've guessed anywhere between 30 and 90%. Others in the community may be similar: I've talked to multiple people who were pretty down after reading Eliezer's last few posts.)
The problem with Eliezer's recent posts (IMO) is not in how pessimistic they are, but in how they are actively insulting to the reader. EY might not realize that his writing is insulting, but in that case he should have an editor who just elides those insulting points. (And also s/Eliezer/I/g please.)
When "List of Lethalities" was posted, I privately wrote a list of where I disagreed with Eliezer, and I'm quite happy to see that there's a lot of convergence between my private list and Paul's list here.
I thought it would be a useful exercise to diff my list with Paul's; I'll record the result in the rest of this comment without the expectation that it's useful to anyone else.
Points on both lists:
When "List of Lethalities" was posted, I privately wrote a list of where I disagreed with Eliezer
Why privately?! Is there a phenomenon where other people feel concerned about the social reception of expressing disagreement until Paul does? This is a phenomenon common in many other fields - and I'd invoke it to explain how the 'tone' of talk about AI safety shifted so quickly once I came right out and was first to say everybody's dead - and if it's also happening on the other side then people need to start talking there too. Especially if people think they have solutions. They should talk.
It seems to me like you have a blind spot regarding how your position as a community leader functions. If you, very well respected high status rationalist, write a long, angry post dedicated to showing everyone else that they can't do original work and that their earnest attempts at solving the problem are, at best, ineffective & distracting and you're tired of having to personally go critique all of their action plans... They stop proposing action plans. They don't want to dilute the field with their "noise", and they don't want you and others to think they're stupid for not understanding why their actions are ineffective or not serious attempts in the first place. I don't care what you think you're saying - the primary operative takeaway for a large proportion of people, maybe everybody except recurring characters like Paul Christiano, is that even if their internal models say they have a solution, they should just shut up because they're not you and can't think correctly about these sorts of issues.
[Redacted rant/vent for being mean-spirited and unhelpful]
I don't care what you think you're saying - the primary operative takeaway for a large proportion of people, maybe everybody except recurring characters like Paul Christiano, is that even if their internal models say they have a solution, they should just shut up because they're not you and can't think correctly about these sorts of issues.
I think this is, unfortunately, true. One reason people might feel this way is because they view LessWrong posts through a social lens. Eliezer posts about how doomed alignment is and how stupid everyone else's solution attempts are, that feels bad, you feel sheepish about disagreeing, etc.
But despite understandably having this reaction to the social dynamics, the important part of the situation is not the social dynamics. It is about finding technical solutions to prevent utter ruination. When I notice the status-calculators in my brain starting to crunch and chew on Eliezer's posts, I tell them to be quiet, that's not important, who cares whether he thinks I'm a fool. I enter a frame in which Eliezer is a generator of claims and statements, and often those claims and statements are interesting and even true, so I do pay attention to...
Sounds like, in the same way we had a dumb questions post, we need somewhere explicitly for posting dumb potential solutions that will totally never work, or something, maybe?
I have now posted a "Half-baked AI safety ideas thread" (LW version, EA Forum version) - let me know if that's more or less what you had in mind.
I think it's unwise to internally label good-faith thinking as "dumb." If I did that, I feel that I would not be taking my own reasoning seriously. If I say a quick take, or an uninformed take, I can flag it as such. But "dumb potential solutions that will totally never work"? Not to my taste.
That said, if a person is only comfortable posting under the "dumb thoughts incoming" disclaimer—then perhaps that's the right move for them.
Saying that people should not care about social dynamics and only about object level arguments is a failure at world modelling. People do care about social dynamics; if you want to win, you need to take that into account. If you think that people should act differently, well, you are right, but the people who count are the real ones, not those who live in your head.
Incentives matter. In today's LessWrong, the threshold of quality for having your ideas heard (rather than everybody ganging up on you to explain how wrong you are) is much higher for people who disagree with Eliezer than for people who agree with him. Unsurprisingly, that means that people filter what they say at a higher rate if they disagree with Eliezer (or any other famous user, honestly, including you).
I wondered whether people would take away the message that "The social dynamics aren't important." I should have edited to clarify, so thanks for bringing this up.
Here was my intended message: The social dynamics are important, and it's important to not let yourself be bullied around, and it's important to make spaces where people aren't pressured into conformity. But I find it productive to approach this situation with a mindset of "OK, whatever, this Eliezer guy made these claims, who cares what he thinks of me, are his claims actually correct?" This tactic doesn't solve the social dynamics issues on LessWrong. This tactic just helps me think for myself.
So, to be clear, I agree that incentives matter, I agree that incentives are, in one way or another, bad around disagreeing with Eliezer (and, to lesser extents, with other prominent users). I infer that these bad incentives spring both from Eliezer's condescension and rudeness, and also a range of other failures.
For example, if many people aren't just doing their best to explain why they best-guess-of-the-facts agree with Eliezer—if those people are "ganging up" and rederiving the bottom line of "Eliezer has to be right"—th...
Seems to be sort of an inconsistent mental state to be thinking like that and writing up a bullet-point list of disagreements with me, and somebody not publishing the latter is, I'm worried, anticipating social pushback that isn't just from me.
somebody not publishing the latter is, I'm worried, anticipating social pushback that isn't just from me.
Respectfully, no shit Sherlock, that's what happens when a community leader establishes a norm of condescending to inquirers.
I feel much the same way as Citizen in that I want to understand the state of alignment and participate in conversations as a layperson. I too, have spent time pondering your model of reality to the detriment of my mental health. I will never post these questions and criticisms to LW because even if you yourself don't show up to hit me with the classic:
Answer by Eliezer Yudkowsky, Apr 10, 2022
As a minor token of how much you're missing:
then someone else will, having learned from your example. The site culture has become noticeably more hostile in my opinion ever since Death with Dignity, and I lay that at least in part at your feet.
Yup, I've been disappointed with how unkindly Eliezer treats people sometimes. Bad example to set.
EDIT: Although I note your comment's first sentence is also hostile, which I think is also bad.
Let me make it clear that I'm not against venting, being angry, even saying to some people "dude, we're going to die", all that. Eliezer has put his whole life into this field and I don't think it's fair to say he shouldn't be angry from time to time. It's also not a good idea to pretend things are better than they actually are, and that includes regulating your emotional state to the point that you can't accurately convey things. But if the linchpin of LessWrong says that the field is being drowned by idiots pushing low-quality ideas (in so many words), then we shouldn't be surprised when even people who might have something to contribute decide to withhold those contributions, because they don't know whether or not they're the people doing the thing he's explicitly critiquing.
You (and probably I) are doing the same thing that you're criticizing Eliezer for. You're right, but don't do that. Be the change you wish to see in the world.
That sort of thinking is why we're where we are right now.
Be the change you wish to see in the world.
I have no idea how that cashes out game theoretically. There is a difference between moving from the mutual cooperation square to one of the exploitation squares, and moving from an exploitation square to mutual defection. The first defection is worse because it breaks the equilibrium, while the defection in response is a defensive play.
swarriner's post, including the tone, is True and Necessary.
Power makes you dumb, stay humble.
Tell everyone in the organization that safety is their responsibility, everyone's views are important.
Try to be accessible and not intimidating, admit that you make mistakes.
Schedule regular chats with underlings so they don't have to take initiative to flag potential problems. (If you think such chats aren't a good use of your time, another idea is to contract someone outside of the organization to do periodic informal safety chats. Chapter 9 is about how organizational outsiders are uniquely well-positioned to spot safety problems. Among other things, it seems workers are sometimes more willing to share concerns frankly with an outsider than they are with their boss.)
Accept that not all of the critical feedback you get will be good quality.
The book disrecommends anonymous surveys on the grounds that they communicate the subtext that sharing your views openly is unsafe. I think anonymous surveys might be a good idea in the EA community though -- retaliation against critics seems fairly common here (i.e. the culture of fear didn't come about by chance). Anyone who's been around here long enough will have figured out that shari...
I think it is very true that the pushback is not just from you, and that nothing you could do would drive it to zero, but also that different actions from you would lead to a lot less fear of bad reactions from both you and others.
To be honest, the fact that Eliezer is being his blunt unfiltered self is why I'd like to go to him first if he offered to evaluate my impact plan re AI. Because he's so obviously not optimising for professionalism, impressiveness, status, etc. he's deconfounding his signal and I'm much better able to evaluate what he's optimising for.[1] Hence why I'm much more confident that he's actually just optimising for roughly the thing I'm also optimising for. I don't trust anyone who isn't optimising purely to be able to look at my plan and think "oh ok, despite being a nobody this guy has some good ideas" if that were true.
And then there's the Graham's Design Paradox thing. I think I'm unusually good at optimising purely, and I don't think people who aren't around my level or above would be able to recognise that. Obviously, he's not the only one, but I've read his output the most, so I'm more confident that he's at least one of them.
Yes, perhaps a consequentialist would be instrumentally motivated to try to optimise more for these things, but the fact that Eliezer doesn't do that (as much) just makes it easier to understand and evaluate him.
Why privately?!
(Treating this as non-rhetorical, and making an effort here to say my true reasons rather than reasons which I endorse or which make me look good...)
In order of importance, starting from the most important:
OK, sure. First, I updated down on alignment difficulty after reading your lethalities post, because I had already baked in the expected-EY-quality doompost into my expectations. I was seriously relieved that you hadn't found any qualitatively new obstacles which might present deep challenges to my new view on alignment.
Here's one stab[1] at my disagreement with your list: Human beings exist, and our high-level reasoning about alignment has to account for the high-level alignment properties[2] of the only general intelligences we have ever found to exist ever. If ontological failure is such a nasty problem in AI alignment, how come very few people do terrible things because they forgot how to bind their "love" value to configurations of atoms? If it's really hard to get intelligences to care about reality, how does the genome do it millions of times each day?
Taking an item from your lethalities post:
...19... More generally, there is no known way to use the paradigm of loss functions, sensory inputs, and/or reward inputs, to optimize anything within a cognitive system to point at particular things within the environment - to point to latent events and objects
Yes, human beings exist and build world models beyond their local sensory data, and have values over those world models not just over the senses.
But this is not addressing all of the problem in Lethality 19. What's missing is how we point at something specific (not just at anything external).
The important disanalogy between AGI alignment and humans as already-existing (N)GIs is:
I addressed this distinction previously, in one of the links in OP. AFAIK we did not know how to reliably ensure the AI is pointed towards anything external, as long as it's external. But also, humans are reliably pointed to particular kinds of external things. See the linked thread for more detail.
The important disanalogy
I am not attempting to make an analogy. Genome->human values is, mechanistically, an instance of value formation within a generally intelligent mind. For all of our thought experiments, genome->human values is the only instance we have ever empirically observed.
for humans there is no principal - our values can be whatever
Huh? I think I misunderstand you. I perceive you as saying: "There is not a predictable mapping from whatever-is-in-the-genome+environmental-factors to learned-values."
If so, I strongly disagree. Like, in the world where that is true, wouldn't parents be extremely uncertain whether their children will care about hills or dogs or paperclips or door hinges? Our values are not "whatever", human values are generally formed over predictable kinds of real-world objects like dogs and people and tasty food.
...Or if you take evolution as the princ
I basically agree with you. I think you go too far in saying Lethality 19 is solved, though. Using the 3 feats from your linked comment, which I'll summarise as "produce a mind that...":
(clearly each one is strictly harder than the previous) I recognise that Lethality 19 concerns feat 3, though it is worded as if being about both feat 2 and feat 3.
I think I need to distinguish two versions of feat 3:
Humans show that feat 2 at least has been accomplished, but also 3a, as I take you to be pointing out. I maintain that 3b is not demonstrated by humans and is probably something we need.
One reason you might do something like "writing up a list but not publishing it" is if you perceive yourself to be in a mostly-learning mode rather than a mostly-contributing one. You don't want to dilute the discussion with your thoughts that don't have a particularly good chance of adding anything, and you don't want to be written off as someone not worth listening to in a sticky way, but you want to write something down to develop your understanding / check against future developments / record anything that might turn out to have value later after all once you understand better.
Of course, this isn't necessarily an optimal or good strategy, and people might still do it when it isn't - I've written down plenty of thoughts on alignment over the years, I think many of the actual-causal-reasons I'm a chronic lurker are pretty dumb and non-agentic - but I think people do reason like this, explicitly or implicitly.
There's a connection here to concernedcitizen64's point about your role as a community leader, inasmuch as your claims about the quality of the field can significantly influence people's probabilities that their ideas are useful / that they should be in a contributing mode, but IMO it's more generally about people's confidence in their contributions.
Overall I'd personally guess "all the usual reasons people don't publish their thoughts" over "fear of the reception of disagreement with high-status people" as the bigger factor here; I think the culture of LW is pretty good at conveying that high-quality criticism is appreciated.
(Partially in response to AGI Ruin: A list of Lethalities. Written in the same rambling style. Not exhaustive.)
Agreements
Disagreements
(Mostly stated without argument.)
and obsolete human contributions to alignment [retracted]) well before they need to develop superhuman understanding of much of the world or tricks about how to think, and so even if they have a very different profile of abilities to humans they may still be subhuman in many important ways.
My take on Eliezer's takes
Ten examples off the top of my head, that I think are about half overlapping and where I think the discussions in the ELK doc are if anything more thorough than the discussions in the list of lethalities: