Wei Dai

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general purpose open source cryptography programming library (Crypto++, 1995), motivated by AI risk and what's now called "defensive acceleration".
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals.
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.

My Home Page

Posts (sorted by new)

  • Wei Dai's Shortform (2y, 10 karma, 296 comments)
  • Problems I've Tried to Legibilize (3d, 114 karma, 16 comments)
  • Legible vs. Illegible AI Safety Problems (3d, 315 karma, 92 comments)
  • Trying to understand my own cognitive edge (9d, 65 karma, 17 comments)
  • Managing risks while trying to do good (2y, 66 karma, 28 comments)
  • AI doing philosophy = AI generating hands? (2y, 49 karma, 24 comments)
  • UDT shows that decision theory is more puzzling than ever (2y, 228 karma, 56 comments)
  • Meta Questions about Metaphilosophy (2y, 163 karma, 80 comments)
  • Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? [Question] (3y, 34 karma, 15 comments)
  • How to bet against civilizational adequacy? [Question] (3y, 55 karma, 20 comments)
Comments (sorted by newest)

The problem of graceful deference
Wei Dai · 10m

> suggesting that other readers found Dai's engagement valuable

This may not be a valid inference, or your update may be too strong, given that my comment got a strong upvote early or immediately, which caused it to land in the Popular Comments section of the front page, where others may have further upvoted it in a decontextualized way.

It looks like I'm not actually banned yet, but I will disengage for now to respect Tsvi's wishes/feelings. I thought I should correct the record on the above first, as I'm probably the only person who could (due to seeing the strong upvote and the resulting position in Popular Comments).

The problem of graceful deference
Wei Dai · 6h

> Since I've described how deferring is really bad in several other places, here in THIS post I'm asking, given that we're going to defer despite its costs, and given that to some extent at the end of the day we do have to defer on many things, what can we do to alleviate some of those problems?

Ok, it looks like part of my motivation for going down this line of thought was based on a misunderstanding. But to be fair, in this post, after you asked "What should we have done instead?" with regard to deferring to Eliezer, you didn't clearly say "we should have not deferred, or deferred less", but instead wrote "We don't have to stop deferring, to avoid this correlated failure. We just have to say that we're deferring." Given that this is a case where many people could have, and should have, not deferred, it just seems like a bad example for illustrating "given that to some extent at the end of the day we do have to defer on many things, what can we do to alleviate some of those problems?", leading to the kind of confusion I had.

Also, another part of my motivation is still valid: I think it would be interesting to try to answer why you (and others) didn't just not defer. Not in a rhetorical sense, but what actually caused this? Was it age, as you hinted earlier? Was it just human nature to want to defer to someone? Was it that you were being paid by an organization that Eliezer founded and had very strong influence over? Etc.? And also, why didn't you (and others) notice Eliezer's strategic mistakes, if that has a different or additional answer?

The problem of graceful deference
Wei Dai · 8h

By saying that he was the best strategic thinker, it seems like you're trying to justify deferring to him on strategy (why not do that if he is actually the best), while also trying to figure out how to defer "gracefully", whereas I'm questioning whether it made sense to defer to him at all, when you could have taken into account his (and other people's) writings about strategic background, and then looked for other important considerations and formed your own judgments.

Another thing that interests me is that several of his high-level strategic judgments seemed wrong or questionable to me at the time (as listed in my OP, and I can look up my old posts/comments if that would help), and if it didn't seem that way to others, I want to understand why. Was Eliezer actually right, given what we knew at the time? Did it require a rare strategic mind to notice his mistakes? Or was it a halo effect, or the effect of Eliezer writing too confidently, or something else, that caused others to have a cognitive blind spot about this?

The problem of graceful deference
Wei Dai · 9h

> a huge amount of strategic background; as a consequence of being good strategic background, they shifted many people to working on this

Maybe we should distinguish between being good at thinking about / explaining strategic background, versus being actually good at strategy per se, e.g. picking high-level directions or judging overall approaches? I think he's good at the former, but people mistakenly deferred to him too much on the latter.

It would make sense that one could be good at one of these and less good at the other, as they require somewhat different skills. In particular I think the former does not require one to be able to think of all of the crucial considerations, or have overall good judgment after taking them all into consideration.

> No? They're all really difficult questions. Even being an expert in one of these would be at least a career. I mean, maybe YOU can, but I can't, and I definitely can't do so when I'm just a kid starting to think about how to help with X-derisking.

So Eliezer could become an expert in all of them starting from scratch, but you couldn't, even though you could build upon his writings and other people's? What was/is your theory of why he is so far above you in this regard? ("Being a kid" seems like a red herring, since Eliezer was pretty young when he did much of his strategic thinking.)

The problem of graceful deference
Wei Dai · 10h (edited)

> Yudkowsky, being the best strategic thinker on the topic of existential risk from AGI

This seems strange to say, given that he:

  1. decided to aim for "technological victory", without acknowledging or being sufficiently concerned that it would inspire others to do the same
  2. decided it was feasible to win the AI race with a small team, while burdened by Friendliness/alignment/x-safety concerns
  3. overestimated the likely pace of progress relative to the difficulty of the problems, even on narrow problems that he personally focused on, like decision theory (still far from solved today, ~16 years later. Edit: see UDT shows that decision theory is more puzzling than ever)
  4. bore a large share of the responsibility for others being overly deferential to him, by writing/talking in a highly confident style and not explicitly pushing back on the over-deference
  5. is still overly focused on one particular AI x-risk (takeover due to misalignment), while underemphasizing or ignoring many other disjunctive risks

These seemed like obvious mistakes even at the time (I wrote posts/comments arguing against them), so I feel like the over-deference to Eliezer is a completely different phenomenon from "But you can’t become a simultaneous expert on most of the questions that you care about." or has very different causes. In other words, if you were going to spend your career on AI x-safety, of course you could have become an expert on these questions first.

Human Values ≠ Goodness
Wei Dai · 12h (edited)

I've now read your linked posts, but can't derive from them how you would answer my questions. Do you want to take a direct shot at answering them? And also the following question/counter-argument?

> Think about the consequences, what will actually happen down the line and how well your Values will actually be satisfied long-term, not just about what feels yummy in the moment.

Suppose I'm a sadist who derives a lot of pleasure/reward from torturing animals, but also my parents and everyone else in society taught me that torturing animals is wrong. According to your posts, this implies that my Values = "torturing animals has high value", and Goodness = "don't torture animals", and I shouldn't follow Goodness unless it actually lets me satisfy my Values better long-term, in other words allows me to torture more animals in the long run. Am I understanding your ideas correctly?

(Edit: It looks like @Johannes C. Mayer made a similar point under one of your previous posts.)

Assuming I am understanding you correctly, this would be a controversial position to say the least, and counter to many people's intuitions or metaethical beliefs. I think metaethics is a hard problem, and I probably can't easily convince you that you're wrong. But maybe I can at least convince you that you shouldn't be as confident in these ideas as you appear to be, nor present them to "lower-level readers" without indicating how controversial / counterintuitive-to-many the implications of your ideas are.

Wei Dai's Shortform
Wei Dai · 1d

An update on this 2010 position of mine, which seems to have become conventional wisdom on LW:

> In my posts, I've argued that indexical uncertainty like this shouldn't be represented using probabilities. Instead, I suggest that you consider yourself to be all of the many copies of you, i.e., both the ones in the ancestor simulations and the one in 2010, making decisions for all of them. Depending on your preferences, you might consider the consequences of the decisions of the copy in 2010 to be the most important and far-reaching, and therefore act mostly as if that was the only copy. [Emphasis added]

In the subsequent 15 years, I've upweighted influencing the multiverse through my copies in simulations, relative to base universes, to the point where they're about equally important in my mind, for four reasons:

  1. Realizing that there are probably universes with vastly greater computational resources than ours, implying there are more simulations containing me than I had thought.
  2. Realizing that there may well be many possible motivations for running simulations, besides interest in ancestors, which was the main motivation talked about back in 2010. These include acausal trade, curiosity about aliens, science on the distribution of superintelligences in the multiverse, gathering philosophical ideas, entertainment, and altruism.
  3. Being fortunate in various ways, on a personal level, that seem more likely if I'm in a simulation.
  4. Doing badly on a civilizational level, in a way that devalues trying to influence base reality. I.e., relative to 2010, it looks more likely that we'll undergo the AI transition blindly, without key decision makers paying attention to important philosophical and strategic ideas.

Concretely, this mostly cashes out to me thinking and writing with potential simulators in mind as an additional audience, hoping my ideas might benefit or interest some of them even if they end up largely ignored in this reality.

Problems I've Tried to Legibilize
Wei Dai · 2d

> normally when I think about this problem I resolve it as "what narrow capabilities can we build that are helpful 'to the workflow' of people solving illegible problems, that aren't particularly bad from a capabilities standpoint".

Do you have any writings about this, e.g., examples of what this line of thought led to?

Problems I've Tried to Legibilize
Wei Dai · 2d

I agree this is a major risk. (Another one is that it's just infeasible to significantly increase AI philosophical competence in the relevant time frame. Another one is that it's much easier to make it appear like the AI is more philosophically competent, giving us false security.) So I continue to think that pausing/stopping AI should be plan A (which legibilizing the problem of AI philosophical competence can contribute to), with actually improving AI philosophical competence as (part of) plan B. Having said that, here are two reasons this risk might not bear out:

  1. Empirically, the best capabilities people (e.g., STEM/finance workers, managers, politicians) tend to be distinct from the best philosophers. And there are whole cultures (e.g., China) getting very good at STEM but still far behind at making philosophical progress.
    1. But the opportunity cost of learning additional skills for AIs appears much lower than for humans, so this pattern might not carry forward to future AIs.
  2. If I'm right about "philosophy reasoning" being some kind of (currently opaque) general but slow problem solving method, and we already have more legible, specialized, and faster methods for specific areas, such as math, science, and engineering, with "philosophical problems" being left-over problems that lack such faster methods, then making AIs better at philosophical reasoning ought to help with philosophical problems more than other types of problems.
    1. But philosophical reasoning can still help with "non-philosophical" problems, if those problems have some parts that are "more philosophical" that can be sped up by applying good philosophical reasoning. 

To conclude, I'm quite worried about the risks/downsides of trying to increase AI philosophical competence, but it seems to be a problem that has to be solved eventually. "The only way out is through", but we can certainly choose to do it at a more opportune time, when humans are much smarter on average and have made a lot more progress in metaphilosophy (understanding the nature of philosophy and philosophical reasoning).

Problems I've Tried to Legibilize
Wei Dai · 2d

> even on alignment

I see a disagreement vote on this, but I think it does make sense. Alignment work at the AI labs will almost by definition be work on legible problems, but we should make exceptions for people who can give reasons why their work is not legible (or is otherwise still positive EV), or who are trying to make illegible problems more legible for others at the labs.

> Think more seriously about building organizations that will make AI power more spread out.

I start to disagree from here, as this approach would make almost all of the items on my list worse, and I'm not sure which ones it would make better. You started this thread by saying "Even if we solved metaethics and metaphilosophy tomorrow, and gave them the solution on a plate, they wouldn't take it.", which I'm definitely very worried about, but how does making AI power more spread out help with this? Is the average human (or humanity collectively) more likely to be concerned about metaethics and metaphilosophy than a typical AI lab leader, or easier to make concerned? I think the opposite is more likely to be true?

Wikitag Contributions

  • Carl Shulman (2 years ago)
  • Carl Shulman (2 years ago, -35)
  • Human-AI Safety (2 years ago)
  • Roko's Basilisk (7 years ago, +3/-3)
  • Carl Shulman (8 years ago, +2/-2)
  • Updateless Decision Theory (12 years ago, +62)
  • The Hanson-Yudkowsky AI-Foom Debate (13 years ago, +23/-12)
  • Updateless Decision Theory (13 years ago, +172)
  • Signaling (13 years ago, +35)
  • Updateless Decision Theory (14 years ago, +22)