Wei Dai
Karma: 43,580 · Ω 3,172

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general purpose open source cryptography programming library (Crypto++, 1995), motivated by AI risk and what's now called "defensive acceleration".
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals.
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.

My Home Page

Comments

The Charge of the Hobby Horse
Wei Dai · 9h

> I'm also surprised and a bit disappointed that it got so many upvotes.

I explained what probably caused this here. I think the current "Popular Comments" feature might often cause this kind of decontextualized voting, and there should perhaps be a way to mitigate it, such as letting the author of the post or of the comment remove a comment from Popular Comments.

The Charge of the Hobby Horse
Wei Dai · 9h

Bowing out for now because I strongly suspect Tsvi has been strongly downvoting all or most of my comments in this thread. Maybe I'll pick it up later, in a different venue.

Please, Don't Roll Your Own Metaethics
Wei Dai · 9h

You may have missed my footnote, where I addressed this?

> To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way". Similarly "don't roll your own crypto" doesn't mean never try to invent new cryptography, but rather don't deploy it unless there has been extensive review, and consensus that it is likely to be secure.

Please, Don't Roll Your Own Metaethics
Wei Dai · 1d

By "metaethics," do you mean something like "a theory of how humans should think about their values"?

I feel like I've seen that kind of usage on LW a bunch, but it's atypical. In philosophy, "metaethics" has a thinner, less ambitious interpretation of answering something like, "What even are values, are they stance-independent, yes/no?"

By "metaethics" I mean "the nature of values/morality", which I think is how it's used in academic philosophy. Of course the nature of values/morality has a strong influence on "how humans should think about their values" so these are pretty closely connected, but definitionally I do try to use it the same way as in philosophy, to minimize confusion. This post can give you a better idea of how I typically use it. (But as you'll see below, this is actually not crucial for understanding my post.)

> Anyway, I'm asking about this because I found the following paragraph hard to understand:

So in the paragraph that you quoted (and the rest of the post), I was actually talking about philosophical fields/ideas in general, not just metaethics. While my title has "metaethics" in it, the text of the post talks generically about any "philosophical questions" that are relevant for AI x-safety. If we substitute metaethics (in my or the academic sense) into my post, then you can derive that I mean something like this:

> Different metaethics (ideas/theories about the nature of values/morality) have different implications for what AI designs or alignment approaches are safe, and if you design an AI assuming that one metaethical theory is true, it could be disastrous if a different metaethical theory actually turns out to be true.

For example, if moral realism is true, then aligning the AI to human values would be pointless. What you really need to do is design the AI to be able to determine and follow objective moral truths. But this approach would be disastrous if moral realism is actually false. Similarly, if moral noncognitivism is true, then humans can't be wrong about their values, which implies that "how humans should think about their values" is of no importance. If you design AI under this assumption, that would be disastrous if humans actually can be wrong about their values and really need AIs to help them think about their values and avoid moral errors.

I think in practice a lot of alignment researchers may not even have explicit metaethical theories in mind, but are implicitly making certain metaethical assumptions in their AI design or alignment approach. For example they may largely ignore the question of how humans should think about their values or how AIs should help humans think about their values, thus essentially baking in an assumption of noncognitivism.

> You're conceding that morality/values might be (to some degree) subjective, but you're cautioning people from having strong views about "metaethics," which you take to be the question of not just what morality/values even are, but also a bit more ambitiously: how to best reason about them and how to (e.g.) have AI help us think about what we'd want for ourselves and others.

If we substitute "how humans/AIs should reason about values" (which I'm not sure has a name in academic philosophy but I think does fall under metaphilosophy, which covers all philosophical reasoning) into the post, then your conclusion here falls out, so yes, it's also a valid interpretation of what I'm trying to convey.

I hope that makes everything a bit clearer?

"But You'd Like To Feel Companionate Love, Right? ... Right?"
Wei Dai · 1d

Conditional on True Convergent Goodness being a thing, companionate love would not be one of my top candidates for being part of it, as it seems too parochial to (a subset of) humans. My current top candidate would be something like "maximization of hedonic experiences" with a lot of uncertainty around:

  1. Problems with consciousness/qualia.
  2. How to measure/define/compare how hedonic an experience is?
  3. Selfish vs. altruistic, and a lot of subproblems around these, including identity and population ethics.
  4. Does it need to be real in some sense (e.g., does being in an Experience Machine satisfy True Convergent Goodness)?
  5. Does there need to be diversity/variety or is it best to tile the universe with the same maxed out hedonic experience? (I guess if variety is part of True Convergent Goodness, then companionate love may make it in after all, indirectly.)

Other top candidates include negative or negative-leaning utilitarianism, and preference utilitarianism (although this is a distant 3rd). And a lot of credence on "something we haven't thought of yet."

Problems I've Tried to Legibilize
Wei Dai · 1d

> A lab leader who’s concerned enough to slow down will be pressured by investors to speed back up, or get replaced, or get outcompeted. Really you need to convince the whole lab and its investors. And you need to be more convincing than the magic of the market!

This seems to imply that lab leaders would be easier to convince if there were no investors and no markets, in other words if they had more concentrated power.

If you spread out the power of AI more, won't all those decentralized nodes of spread out AI power still have to compete with each other in markets? If market pressures are the core problem, how does decentralization solve that?

I'm concerned that your proposed solution attacks "concentration of power" when the real problem you've identified is more like market dynamics. If so, it could fail to solve the problem or make it even worse.

My own perspective is that markets are a definite problem, and concentration of power per se is more ambiguous (I'm not sure if it's good or bad). To solve AI x-safety we basically have to bypass or override markets somehow, e.g., through international agreements and government regulations/bans.

The Charge of the Hobby Horse
Wei Dai · 1d

A difference is that Tsvi is still plenty motivated to talk on a meta level (about why he banned me), as evidenced by this post. So he could have easily said "I no longer want to talk about the object level. I think you're doing a bad thing, [explanation ...], please change your behavior if you agree, or let me know why you don't (on the meta level)." Or "I'm writing up an explanation of what you're doing wrong in this thread. Let's pause this discussion until I finish it."

Or if he actually doesn't want to talk at all, he could have said "I'm getting really annoyed so I'm disengaging." or "I think you're doing a bad thing here, here's a short explanation but I don't want to discuss it further. Please stop it or I'll ban you."

Note that I'm not endorsing banning or the threat of banning in an absolute sense, just suggesting that all of these are more "pro-social" than banning someone out of the blue with no warning. None of these involve asking him to "suck it up and keep talking to me" or otherwise impose a large cost on him.

Wei Dai's Shortform
Wei Dai · 1d

Need: A way to load all comments and posts of a user. Right now it only loads the top N by karma.

Want: A "download" button, for some users who have up to hundreds of MB of content, too unwieldy to copy/paste. Ability to collate/sort in various ways, especially as flat list of mixed posts and comments, sorted by posting date from oldest to newest.

Posts

  • Wei Dai's Shortform (Ω, 10 points, 2y, 311 comments)
  • Please, Don't Roll Your Own Metaethics (Ω, 146 points, 4d, 57 comments)
  • Problems I've Tried to Legibilize (Ω, 119 points, 7d, 19 comments)
  • Legible vs. Illegible AI Safety Problems (Ω, 346 points, 7d, 93 comments)
  • Trying to understand my own cognitive edge (71 points, 13d, 17 comments)
  • Managing risks while trying to do good (66 points, 2y, 28 comments)
  • AI doing philosophy = AI generating hands? (Ω, 49 points, 2y, 24 comments)
  • UDT shows that decision theory is more puzzling than ever (Ω, 228 points, 2y, 56 comments)
  • Meta Questions about Metaphilosophy (Ω, 163 points, 2y, 80 comments)
  • Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? (Q, 34 points, 3y, 15 comments)
Wikitag Contributions

  • Carl Shulman (2 years ago)
  • Carl Shulman (2 years ago, -35)
  • Human-AI Safety (2 years ago)
  • Roko's Basilisk (7 years ago, +3/-3)
  • Carl Shulman (8 years ago, +2/-2)
  • Updateless Decision Theory (12 years ago, +62)
  • The Hanson-Yudkowsky AI-Foom Debate (13 years ago, +23/-12)
  • Updateless Decision Theory (13 years ago, +172)
  • Signaling (13 years ago, +35)
  • Updateless Decision Theory (14 years ago, +22)