Wei Dai

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general-purpose open-source cryptography programming library (Crypto++, 1995).
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals.
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.

My Home Page

Posts

Wikitag Contributions

Comments

Wei Dai's Shortform
Wei Dai, 1h

That fully boils down to whether the experience includes a preference to be dead (or to have not been born).

I'm pretty doubtful about this. It seems totally possible that evolution gave us a desire to be alive while also giving us a net welfare that's negative. I mean, we're deluded by default about a lot of other things (e.g., we think there are agents/gods everywhere in nature and don't recognize that social status is a hugely important motivation behind everything we do), so why not this too?

Wei Dai's Shortform
Wei Dai, 2h

Let’s take an area where you have something to say, like philosophy. Would you be willing to outsource that?

Outsourcing philosophy is the main thing I've been trying to do, or trying to figure out how to safely do, for decades at this point. I've written about it in various places, including this post and my pinned tweet on X. Quoting from the latter:

Among my first reactions upon hearing "artificial superintelligence" were "I can finally get answers to my favorite philosophical problems" followed by "How do I make sure the ASI actually answers them correctly?"

Aside from wanting to outsource philosophy to ASI, I'd also love to have more humans who could answer these questions for me. I think about this a fair bit and wrote some things down but don't have any magic bullets.

(I currently think the best bet for eventually getting what I want is to encourage an AI pause along with genetic enhancements for human intelligence, have the enhanced humans solve metaphilosophy and other aspects of AI safety, then outsource the rest of philosophy to ASI, or have the enhanced humans decide what to do at that point.)

BTW, I thought this would be a good test of how competent current AIs are at understanding someone's perspective, so I asked a bunch of them how Wei Dai would answer your question. All of them got it wrong on the first try, except Claude Sonnet 4.5, which got it right on the first try but wrong on the second. It seems like having my public content in their training data isn't enough, and finding relevant info from the web and understanding nuance are still challenging for them. (GPT-5 essentially said I'd answer no because I wouldn't trust current AIs enough, which is really missing the point despite having this whole thread as context.)

Wei Dai's Shortform
Wei Dai, 12h

By negative value I mean negative utility, or an experience that's worse than a neutral or null experience.

Wei Dai's Shortform
Wei Dai, 13h

How do you come up with an encoding that covers all possible experiences? How do you determine which experiences have positive and negative values (and their amplitudes)? What do you do about the degrees of freedom in choosing the Turing machine and encoding schemes, which can be hand-waved away in some applications of AIT but not here, I think?

Wei Dai's Shortform
Wei Dai, 17h

Well, there's no point in asking the AI to make me good at things if I'm the kind of person who will just keep asking the AI to do more things for me!

But I'm only asking the AI to do things for me because they're too effortful or costly. If the AI made me good at these things with no extra effort or cost (versus asking the AI to do it) then why wouldn't I do them myself? For example I'm pretty sure I'd love the experience of playing like a concert pianist, and would ask for this ability, if doing so involved minimal effort and cost.

On the practical side, I agree that atrophy and being addicted/exploited are risks/costs worth keeping in mind, but I've generally made tradeoffs more in the direction of using shortcuts to minimize "doing chores" (e.g., buying a GPS for my car as soon as they came out, giving up learning an instrument very early) and haven't regretted it so far.

Wei Dai's Shortform
Wei Dai, 2d

If my value system is only about receiving stuff from the universe, then the logical endpoint is a kind of blob that just receives stuff and doesn't even need a brain.

Unless one of the things you want to receive from the universe is to be like Leonardo da Vinci, or be able to do everything effortlessly and with extreme competence. Why "do chores" now if you can get to that endpoint either way, or maybe even more likely if you don't "do chores" because it allows you to save on opportunity costs and better deploy your comparative advantage? (I can understand if you enjoy the time spent doing these activities, but by calling them "chores" you seem to be implying that you don't?)

Wei Dai's Shortform
Wei Dai, 2d

Hmm, I find it hard to understand or appreciate this attitude. I can't think of any chores that I intrinsically don't want to outsource, only concerns that I may not be able to trust the results. What are some other examples of chores you do and don't want to outsource? Do you have any pattern or explanation of where you draw the line? Do you think people who don't mind outsourcing all their chores are wrong in some way?

Wei Dai's Shortform
Wei Dai, 2d

A clear mistake of early AI safety people is not emphasizing enough (or ignoring) the possibility that solving AI alignment (as a set of technical/philosophical problems) may not be feasible in the relevant time-frame, without a long AI pause. Some have subsequently changed their minds about pausing AI, but by not reflecting on and publicly acknowledging their initial mistakes, I think they are or will be partly responsible for others repeating similar mistakes.

Case in point is Will MacAskill's recent Effective altruism in the age of AGI. Here's my reply, copied from EA Forum:

I think it's likely that without a long (e.g. multi-decade) AI pause, one or more of these "non-takeover AI risks" can't be solved or reduced to an acceptable level. To be more specific:

  1. Solving AI welfare may depend on having a good understanding of consciousness, which is a notoriously hard philosophical problem.
  2. Concentration of power may be structurally favored by the nature of AGI or post-AGI economics, and defy any good solutions.
  3. Defending against AI-powered persuasion/manipulation may require solving metaphilosophy, which, judging from other comparable fields like meta-ethics and philosophy of math, may take at least multiple decades to do.

I'm worried that by creating (or redirecting) a movement to solve these problems, without noting at an early stage that these problems may not be solvable in a relevant time-frame (without a long AI pause), it will feed into a human tendency to be overconfident about one's own ideas and solutions, and create a group of people whose identities, livelihoods, and social status are tied up with having (what they think are) good solutions or approaches to these problems, ultimately making it harder in the future to build consensus about the desirability of pausing AI development.

Wei Dai's Shortform
Wei Dai, 2d

it'll be even harder if I know the other person is responding to an AI-rewritten version of my comment, referring to an AI-summarized version of my profile, running AI hypotheticals on how I would react

I think all of these are better than the likely alternatives though, which is that

  • I fail to understand someone's comment or the reasoning/motivations behind their words, and most likely just move on (instead of asking them to clarify)
  • I have little idea what their background knowledge/beliefs are when replying to them
  • I fail to consider some people's perspectives on some issue

It also seems like I change my mind (or at least become somewhat more sympathetic) more easily when arguing with an AI-representation of someone's perspective, maybe due to less perceived incentive to prove that I was right all along.

Wei Dai's Shortform
Wei Dai, 7d

If people started trying earnestly to convert wealth/income into more kids, we'd come under Malthusian constraints again, and before that we'd see much backsliding in living standards and downward social mobility for most people, which would trigger a lot of cultural upheaval and potential backlash (e.g., calls for more welfare/redistribution and attempts to turn culture back against "eugenics"/"social Darwinism", which will probably succeed just like they succeeded before). It seems ethically pretty fraught to try to push the world in that direction, to say the least, and it has a lot of other downsides. So I think at this point a much better plan to increase human intelligence is to make available genetic enhancements that parents can voluntarily choose for their kids, government-subsidized if necessary to make them affordable for everyone, which avoids most of these problems.

  • Wei Dai's Shortform [Ω] (karma 10, 2y, 234 comments)
  • Managing risks while trying to do good (karma 65, 2y, 26 comments)
  • AI doing philosophy = AI generating hands? [Ω] (karma 46, 2y, 23 comments)
  • UDT shows that decision theory is more puzzling than ever [Ω] (karma 224, 2y, 56 comments)
  • Meta Questions about Metaphilosophy [Ω] (karma 163, 2y, 80 comments)
  • Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? [Q] (karma 34, 3y, 15 comments)
  • How to bet against civilizational adequacy? [Q] (karma 55, 3y, 20 comments)
  • AI ethics vs AI alignment (karma 5, 3y, 1 comment)
  • A broad basin of attraction around human values? [Ω] (karma 118, 4y, 18 comments)
  • Morality is Scary [Ω] (karma 234, 4y, 116 comments)
  • Carl Shulman, 2 years ago
  • Carl Shulman, 2 years ago (-35)
  • Human-AI Safety, 2 years ago
  • Roko's Basilisk, 7 years ago (+3/-3)
  • Carl Shulman, 8 years ago (+2/-2)
  • Updateless Decision Theory, 12 years ago (+62)
  • The Hanson-Yudkowsky AI-Foom Debate, 13 years ago (+23/-12)
  • Updateless Decision Theory, 13 years ago (+172)
  • Signaling, 13 years ago (+35)
  • Updateless Decision Theory, 14 years ago (+22)