Wei Dai

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general purpose open source cryptography programming library (Crypto++, 1995), motivated by AI risk and what's now called "defensive acceleration".
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals.
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.

My Home Page

Comments (sorted by newest)

Human Values ≠ Goodness
Wei Dai · 2h · 20

One way you could apply it is by not endorsing so completely/confidently the kind of "rolling your own metaethics" that I argued against (and that I see John as doing here), i.e., by saying "the distinction John is making here is correct, plus his advice on how to approach it." (Of course you wrote that before I posted, but I'm hoping this is one takeaway people get from my post.)

Human Values ≠ Goodness
Wei Dai · 2h · 20

Have you also seen https://www.lesswrong.com/posts/KCSmZsQzwvBxYNNaT/please-don-t-roll-your-own-metaethics, which was partly in response to that thread? BTW, why is my post still in "personal blog"?

Human Values ≠ Goodness
Wei Dai · 3h · 40

the distinction John is making here is correct, plus his advice on how to approach it

Really? Did you see this comment of mine? Do you endorse John's reply to it (specifically the part about the sadist)?

Please, Don't Roll Your Own Metaethics
Wei Dai · 3h · Ω220

I hinted at it with "prior efforts/history", but to spell it out more: metaethics has had a lot more effort put into it in the past, so there's less likely to be some kind of low-hanging fruit in idea space that, once picked, everyone will agree is the right solution.

Please, Don't Roll Your Own Metaethics
Wei Dai · 5h · Ω231

The problem is that we can't. The closest thing we have is instead a collection of mutually exclusive ideas where at most one (possibly none) is correct, and we have no consensus as to which.

Human Values ≠ Goodness
Wei Dai · 7h · 20

Maybe something like "This post presents a simplified version of my ideas, intended as an introduction. For more details and advanced considerations, please see such and such posts."

Please, Don't Roll Your Own Metaethics
Wei Dai · 7h · Ω220

#2 feels like it's injecting some frame that's a bit weird to inject here (don't roll your own metaethics... but rolling your own metaphilosophy is okay?)

Maybe you missed my footnote?

To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way". Similarly "don't roll your own crypto" doesn't mean never try to invent new cryptography, but rather don't deploy it unless there has been extensive review, and consensus that it is likely to be secure.

and/or this part of my answer (emphasis added):

Try to solve metaphilosophy, where potentially someone could make a breakthrough that everyone can agree is correct (after extensive review)

 

But also, I'm suddenly confused about who this post is trying to warn. Is it more like labs, or more like EA-ish people doing a wider variety of meta-work?

I think I mostly had alignment researchers (in and out of labs) as the target audience in mind, but it does seem relevant to others so perhaps I should expand the target audience?

Please, Don't Roll Your Own Metaethics
Wei Dai · 8h · 64

The analogy is that in both fields people are by default very prone to being overconfident. In cryptography this can be seen in the phenomenon of people (especially newcomers who haven't learned the lesson) confidently proposing new cryptographic algorithms, which end up being way easier to break than they expect. In philosophy this is a bit trickier to demonstrate, but I think it can be seen via a combination of:

  1. people confidently holding positions that are incompatible with other people's confident positions
  2. a tendency to "bite bullets", i.e., to accept implications that are highly counterintuitive to others or even to themselves, instead of adopting more uncertainty
  3. the total idea/argument space being exponentially vast and underexplored due to human limitations, which makes high confidence unjustified

Please, Don't Roll Your Own Metaethics
Wei Dai · 8h · Ω41412

"More research needed" but here are some ideas to start with:

  1. Try to design alignment/safety schemes that are agnostic about, or don't depend on, controversial philosophical ideas. For certain areas that seem highly relevant and where there could potentially be hidden dependencies (such as metaethics), explicitly understand and explain why, under each plausible position that people currently hold, the alignment/safety scheme will result in a good or OK outcome. (E.g., why it leads to a good outcome regardless of whether moral realism or anti-realism is true, or any one of the other positions.)
  2. Try to solve metaphilosophy, where potentially someone could make a breakthrough that everyone can agree is correct (after extensive review), which can then be used to speed up progress in all other philosophical fields. (This could also happen in another philosophical field, but seems a lot less likely due to prior efforts/history. I don't think it's very likely in metaphilosophy either, but perhaps worth a try, for those who may have very strong comparative advantage in this.)
  3. If 1 and 2 look hard or impossible, make this clear to non-experts (your boss, company leaders/board, government officials, the public), and don't let them accept a "roll your own metaethics" solution, or a solution with implicit/hidden philosophical assumptions.
  4. Support AI pause/stop.

The problem of graceful deference
Wei Dai · 15h · 60

suggesting that other readers found Dai's engagement valuable

This may not be a valid inference, or your update may be too strong, given that my comment got a strong upvote early or immediately, which caused it to land in the Popular Comments section of the front page, where others may have further upvoted it in a decontextualized way.

It looks like I'm not actually banned yet, but will disengage for now to respect Tsvi's wishes/feelings. Thought I should correct the record on the above first, as I'm probably the only person who could (due to seeing the strong upvote and the resulting position in Popular Comments).

Posts (sorted by new)

  • Wei Dai's Shortform (Ω) · 10 karma · 2y · 296 comments
  • Please, Don't Roll Your Own Metaethics (Ω) · 77 karma · 11h · 14 comments
  • Problems I've Tried to Legibilize (Ω) · 114 karma · 4d · 16 comments
  • Legible vs. Illegible AI Safety Problems (Ω) · 317 karma · 4d · 92 comments
  • Trying to understand my own cognitive edge · 71 karma · 10d · 17 comments
  • Managing risks while trying to do good · 66 karma · 2y · 28 comments
  • AI doing philosophy = AI generating hands? (Ω) · 49 karma · 2y · 24 comments
  • UDT shows that decision theory is more puzzling than ever (Ω) · 228 karma · 2y · 56 comments
  • Meta Questions about Metaphilosophy (Ω) · 163 karma · 2y · 80 comments
  • Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? (Q) · 34 karma · 3y · 15 comments

Wikitag Contributions

  • Carl Shulman · 2 years ago
  • Carl Shulman · 2 years ago · (-35)
  • Human-AI Safety · 2 years ago
  • Roko's Basilisk · 7 years ago · (+3/-3)
  • Carl Shulman · 8 years ago · (+2/-2)
  • Updateless Decision Theory · 12 years ago · (+62)
  • The Hanson-Yudkowsky AI-Foom Debate · 13 years ago · (+23/-12)
  • Updateless Decision Theory · 13 years ago · (+172)
  • Signaling · 13 years ago · (+35)
  • Updateless Decision Theory · 14 years ago · (+22)