Wei Dai

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general-purpose open-source cryptography programming library (Crypto++, 1995), motivated by AI risk and what's now called "defensive acceleration".
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals.
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.

My Home Page

Comments

Legible vs. Illegible AI Safety Problems
Wei Dai · 17h

"Musings on X" style posts tend not to be remembered as much, and I think this is a fairly important post for people to remember.

I guess I'm pretty guilty of this, as I tend to write "here's a new concept or line of thought, and its various implications" style posts, and sometimes I just don't want to spoil the ending/conclusion. Maybe I'm afraid people won't read the post if they can just glance at the title and decide whether they already agree or disagree with it, or think they know what I'm going to say? The Nature of Offense is a good example of the latter, where I could have easily titled it "Offense is about Status".

Not sure if I want to change my habit yet. Any further thoughts on this, or references about this effect, how strong it is, etc.?

Legible vs. Illegible AI Safety Problems
Wei Dai · 17h

That's a good point. I hope Joe ends up focusing more on this type of work during his time at Anthropic.

Heroic Responsibility
Wei Dai · 18h

What are the disagreement votes for[1], given that my comment is made of questions and a statement of confusion? What are the voters disagreeing about?

(I've seen this in the past as well, disagreement votes on my questioning comments, so I figured I'd finally ask what people have in mind when they're voting like this.)

  1. ^

    2 votes totaling -3 agreement, at the time of this writing

Wei Dai's Shortform
Wei Dai · 18h
  1. I've seen somewhere that (some) people at AI labs are thinking in terms of shares of the future lightcone, not just money.
  2. If most of your friends are capabilities researchers who aren't convinced that their work is negative EV yet, it might be pretty awkward when they ask why you've switched to safety.
  3. There's a big prestige drop (in many people's minds, such as one's parents') from being at a place like OpenAI (perceived by many as a group made up of the best of the best) to being an independent researcher. ("What kind of a job is that?!")
  4. Having to let go of sunk costs (knowledge/skills for capabilities research) and invest in a bunch of new human capital needed for safety research.
Wei Dai's Shortform
Wei Dai · 18h

Sorry, you might be taking my dialog too seriously, unless you've made such observations yourself, which of course is quite possible since you used to work at OpenAI. I'm personally far from the places where such dialogs might be occurring, so I don't have any observations of them myself. It was completely imagined in my head, as a dark comedy about how counter to human (or most humans') nature strategic thinking/action about AI safety is, and partly a bid for sympathy for the people caught in the whiplashes, to whom this kind of thinking or intuition doesn't come naturally.

Edit: To clarify a bit more, B's reactions like "WTF!" were written more for comedic effect, rather than trying to be realistic or based on my best understanding/predictions of how a typical AI researcher would actually react. It might still be capturing some truth, but again I just want to make sure people aren't taking my dialog more seriously than I intend.

Wei Dai's Shortform
Wei Dai · 1d

The Inhumanity of AI Safety

A: Hey, I just learned about this idea of artificial superintelligence. With it, we can achieve incredible material abundance with no further human effort!

B: Thanks for telling me! After a long slog and incredible effort, I'm now a published AI researcher!

A: No wait! Don't work on AI capabilities, that's actually negative EV!

B: What?! Ok, fine, at huge personal cost, I've switched to AI safety.

A: No! The problem you chose is too legible!

B: WTF! Alright you win, I'll give up my sunk costs yet again, and pick something illegible. Happy now?

A: No wait, stop! Someone just succeeded in making that problem legible!

B: !!!

Legible vs. Illegible AI Safety Problems
Wei Dai · 1d

Legible problems are pretty easy to give examples for. The most legible problem (in terms of actually gating deployment) is probably wokeness for xAI, and things like not expressing an explicit desire to cause human extinction, not helping with terrorism (like building bioweapons) on demand, etc., for most AI companies.

Giving an example for an illegible problem is much trickier since by their nature they tend to be obscure, hard to understand, or fall into a cognitive blind spot. If I give an example of a problem that seems real to me, but illegible to most, then most people will fail to understand it or dismiss it as not a real problem, instead of recognizing it as an example of a real but illegible problem. This could potentially be quite distracting, so for this post I decided to just talk about illegible problems in a general, abstract way, and discuss general implications that don't depend on the details of the problems.

But if you still want some explicit examples, see this thread.

Legible vs. Illegible AI Safety Problems
Wei Dai · 1d

https://www.lesswrong.com/posts/M9iHzo2oFRKvdtRrM/reminder-morality-is-unsolved?commentId=bSoqdYNRGhqDLxpvM

Legible vs. Illegible AI Safety Problems
Wei Dai · 1d

maybe we could get some actual concrete examples of illegible problems and reasons to think they are important?

See Problems in AI Alignment that philosophers could potentially contribute to, and this comment from a philosopher saying that he thinks they're important but that it "seems like there's not much of an appetite among AI researchers for this kind of work", suggesting illegibility.

Legible vs. Illegible AI Safety Problems
Wei Dai · 1d

Yeah it's hard to think of a clear improvement to the title. I think I'm mostly trying to point out that thinking about legible vs illegible safety problems leads to a number of interesting implications that people may not have realized. At this point the karma is probably high enough to help attract readers despite the boring title, so I'll probably just leave it as is.

Posts

  • Legible vs. Illegible AI Safety Problems (196 karma, 2d ago, 43 comments)
  • Trying to understand my own cognitive edge (63 karma, 3d ago, 13 comments)
  • Wei Dai's Shortform (10 karma, 2y ago, 291 comments)
  • Managing risks while trying to do good (65 karma, 2y ago, 28 comments)
  • AI doing philosophy = AI generating hands? (47 karma, 2y ago, 23 comments)
  • UDT shows that decision theory is more puzzling than ever (228 karma, 2y ago, 56 comments)
  • Meta Questions about Metaphilosophy (163 karma, 2y ago, 80 comments)
  • Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? (34 karma, 3y ago, 15 comments)
  • How to bet against civilizational adequacy? (55 karma, 3y ago, 20 comments)
  • AI ethics vs AI alignment (7 karma, 3y ago, 1 comment)

Wikitag Contributions

  • Carl Shulman (2 years ago)
  • Carl Shulman (2 years ago, -35)
  • Human-AI Safety (2 years ago)
  • Roko's Basilisk (7 years ago, +3/-3)
  • Carl Shulman (8 years ago, +2/-2)
  • Updateless Decision Theory (12 years ago, +62)
  • The Hanson-Yudkowsky AI-Foom Debate (13 years ago, +23/-12)
  • Updateless Decision Theory (13 years ago, +172)
  • Signaling (13 years ago, +35)
  • Updateless Decision Theory (14 years ago, +22)