Richard_Ngo

Formerly alignment and governance researcher at DeepMind and OpenAI. Now independent.

Sequences

Twitter threads
Understanding systematization
Stories
Meta-rationality
Replacing fear
Shaping safer goals
AGI safety from first principles

Comments
Wei Dai's Shortform
Richard_Ngo · 8h

Can you explain how someone who is virtuous, but missing the crucial consideration of "legible vs. illegible AI safety problems", can still benefit the world? I.e., why would they not be working on some highly legible safety problem that is actually negative EV to work on?

If a person is courageous enough to actually try to solve a problem (like AI safety), and high-integrity enough to avoid distorting their research due to social incentives (like incentives towards getting more citations), and honest enough to avoid self-deception about how to interpret their research, then I expect that they will tend towards doing "illegible" research even if they're not explicitly aware of the legible/illegible distinction. One basic mechanism is that they start pursuing lines of thinking that don't immediately make much sense to other people, and the more cutting-edge research they do, the more their ontology will diverge from the mainstream ontology.

Wei Dai's Shortform
Richard_Ngo · 2d

I'm taking the dialogue seriously but not literally. I don't think the actual phrases are anywhere near realistic. But the emotional tenor you capture, of people doing safety-related work that they were told was very important and then feeling frustrated by arguments that it might actually be bad, seems pretty real. Mostly I think people in B's position stop dialoguing with people in A's position, though, because it's hard for them to continue while B resents A (especially because A often resents B too).

Some examples that feel like B-A pairs to me include: people interested in "ML safety" vs people interested in agent foundations (especially back around 2018-2022); people who support Anthropic vs people who don't; OpenPhil vs Habryka; and "mainstream" rationalists vs Vassar, Taylor, etc.

Wei Dai's Shortform
Richard_Ngo · 3d

This observation should make us notice confusion about whether AI safety recruiting pipelines are actually doing the right type of thing.

In particular, the key problem here is that people are acting on a kind of top-down partly-social motivation (towards doing stuff that the AI safety community approves of)—a motivation which then behaves coercively towards their other motivations. But as per this dialogue, such a system is pretty fragile.

A healthier approach is to prioritize cultivating traits that are robustly good—e.g. virtue, emotional health, and fundamental knowledge. I expect that people with such traits will typically benefit the world even if they're missing crucial high-level considerations like the ones described above.

For example, an "AI capabilities" researcher from a decade ago who cared much more about fundamental knowledge than about citations might well have invented mechanistic interpretability without any thought of safety or alignment. Similarly, an AI capabilities researcher at OpenAI who was sufficiently high-integrity might have whistleblown on the non-disparagement agreements even if they didn't have any "safety-aligned" motivations.

Also, AI safety researchers who have those traits won't have an attitude of "What?! Ok, fine" or "WTF! Alright you win" towards people who convince them that they're failing to achieve their goals, but rather an attitude more like "thanks for helping me". (To be clear, I'm not encouraging people to directly try to adopt a "thanks for helping me" mentality, since that's liable to create suppressed resentment, but it's still a pointer to a kind of mentality that's possible for people with sufficiently little internal conflict.) And in the ideal case, they will notice that there's something broken about their process for choosing what to work on, and rethink that in a more fundamental way (which may well lead them to conclusions similar to mine above).

The Memetics of AI Successionism
Richard_Ngo · 10d

In general, yes. But in this case the thing I wanted an example of was "a very distracting example", and the US left-right divide is a central example of a very distracting example.

The Memetics of AI Successionism
Richard_Ngo · 11d*

Some agreements and disagreements:

  1. I think that memetic forces are extremely powerful and underrated. In particular, previous discussions of memetics have focused too much on individual memes rather than larger-scale memeplexes like AI successionism. I expect that there's a lot of important scientific thinking to be done about the dynamics of memeplexes.
  2. I think this post is probably a small step backwards for our collective understanding of large-scale memeplexes (and have downvoted accordingly) because it deeply entangles discussion of memetic forces in general with the specific memeplex of AI successionism. It's kinda like if Eliezer's original sequences had constantly referred back to Republicans as central examples of cognitive biases. (Indeed, he says he regrets even using religion so much as an example of cognitive bias.) It's also bad form to psychologize one's political opponents before actually responding to their object-level arguments. So I wish this had been three separate posts, one about the mechanics of memeplexes (neutral enough that both sides could agree with it), a second debunking AI successionism, and a third making claims about the memetics of AI successionism. Obviously that's significantly more work but I think that even roughly the same material would be better as three posts, or at least as one post with that three-part ordering.
  3. You might argue that this is justified because AI successionism is driven by unusually strong memetic forces. But I think you could write a pretty similar post with pretty similar arguments except replacing "AI accelerationism" with "AI safety". Indeed, you could think of this post as an example of the "AI safety" memeplex developing a new weapon (meta-level discussions of the memetic basis of the views of its opponents) to defeat its enemy, the "AI successionism" memeplex. Of course, AI accelerationists have been psychologizing safetyists for a while (and vice versa), so this is not an unprecedented weapon, but it's significantly more sophisticated than e.g. calling doomers neurotic.
  4. I'm guilty of a similar thing myself with this post, which introduces an important concept (consenses of power) from the frame of trying to understand wokeness. Doing so has made me noticeably more reluctant to send the post to people, because it'll probably bounce off them if they don't share my political views. I think if I'd been a better writer or thinker I would have made it much more neutral—if I were rewriting it today, for example, I'd structure it around discussions of both a left-wing consensus (wokeness) and a right-wing consensus (physical beauty).
Noah Birnbaum's Shortform
Richard_Ngo · 16d

You should probably link some posts; it's hard to discuss this so abstractly. And popular rationalist thinkers should be able to handle their posts being called mediocre (especially highly-upvoted ones).

Generalized Coming Out Of The Closet
Richard_Ngo · 17d

I think there are other dynamics that are probably as important as 'renouncing antisocial desires' — in particular, something like 'blocks to perceiving aspects of vanilla sex/sexuality' (which can contribute to a desire for kink as nearest-unblocked-strategy)

This seems insightful and important!

21st Century Civilization curriculum
Richard_Ngo · 18d

Fixed, ty!

21st Century Civilization curriculum
Richard_Ngo · 18d*

Good question. I learned from my last curriculum (the AGI safety fundamentals one) that I should make my curricula harder than I instinctively want to. So I included a bunch of readings that I personally took a long time to appreciate as much as I do now (e.g. Hoffman on the debtor's revolt, Yudkowsky on local validity, Sotala on beliefs as emotional strategies, Moses on The Germans in week 1). Overall I think there's at least one reading per week that would reward very deep thought. Also I'm very near (and plausibly literally on) the global Pareto frontier in how much I appreciate all of MAGA-type politics, rationalist-type analysis, and hippie-type discussion of trauma, embodied emotions, etc. I've tried to include enough of all of these in there that very few people will consistently think "okay, I get it".

Having said that, people kept recommending that I include books, and I kept telling them I couldn't because I only want to give people 20k words max of main readings per week. Given a word budget it seems like people will learn more from reading many short essays than a few books. But maybe that's an artifact of how I personally think (basically, I like to start as broad as possible and then triangulate my way down to specific truths), whereas other people might get more out of going deeper into fewer topics.

I do think that there's not enough depth to be really persuasive to people who go in strongly disagreeing with me on some/all of these topics. My hope is that I can at least convey that there's some shape of coherent worldview here, which people will find valuable to engage with even if they don't buy it wholesale.

21st Century Civilization curriculum
Richard_Ngo · 18d*

The threats of losing one’s job or getting evicted are not actually very scary when you’re in healthy labor and property markets. And we’ve produced so much technological abundance over the last century that our labor and property markets should be flourishing. So insofar as those things are still scary for people today, the deeper explanation lies in why our labor and property markets aren’t very healthy, which comes back to our inability to build and our overregulation.

But also: yes, there’s a bunch of stuff in this curriculum about exploitation by elites. Somehow there’s a strange pattern, though, where a lot of the elite exploitation is extremely negative-sum: e.g. so much money is burned in the US healthcare system without even being transferred to elites (e.g. there are many ways in which being a doctor is miserable which you would expect a healthy system to get rid of). So I focused on paradigm examples of negative-sum problems in the intro to highlight that there’s definitely something very Pareto suboptimal going on here.

Posts

Richard Ngo's Shortform · 6 karma · Ω · 6y · 457 comments
21st Century Civilization curriculum · 38 karma · 18d · 10 comments
Underdog bias rules everything around me · 166 karma · 3mo · 53 comments
On Pessimization · 61 karma · 3mo · 3 comments
Applying right-wing frames to AGI (geo)politics · 64 karma · 4mo · 25 comments
Well-foundedness as an organizing principle of healthy minds and societies · 35 karma · 7mo · 7 comments
Third-wave AI safety needs sociopolitical thinking · 99 karma · 7mo · 23 comments
Towards a scale-free theory of intelligent agency · 96 karma · Ω · 8mo · 46 comments
Elite Coordination via the Consensus of Power · 92 karma · 8mo · 15 comments
Trojan Sky · 253 karma · 8mo · 39 comments
Power Lies Trembling: a three-book review · 214 karma · 7mo · 29 comments