Wei Dai

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general-purpose open-source cryptography programming library (Crypto++, 1995), motivated by AI risk and what's now called "defensive acceleration".
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals.
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.

My Home Page

Comments

Legible vs. Illegible AI Safety Problems
Wei Dai · 2h · 60

Legible problems are pretty easy to give examples for. The most legible problem (in terms of actually gating deployment) is probably wokeness for xAI, and, for most AI companies, things like not expressing an explicit desire to cause human extinction, not helping with terrorism (e.g., building bioweapons) on demand, and so on.

Giving an example of an illegible problem is much trickier, since by their nature such problems tend to be obscure, hard to understand, or fall into a cognitive blind spot. If I give an example of a problem that seems real to me but illegible to most, then most people will fail to understand it or dismiss it as not a real problem, instead of recognizing it as an example of a real but illegible problem. This could be quite distracting, so for this post I decided to just talk about illegible problems in a general, abstract way and discuss implications that don't depend on the details of the problems.

But if you still want some explicit examples, see this thread.

Legible vs. Illegible AI Safety Problems
Wei Dai · 5h · 30

https://www.lesswrong.com/posts/M9iHzo2oFRKvdtRrM/reminder-morality-is-unsolved?commentId=bSoqdYNRGhqDLxpvM

Legible vs. Illegible AI Safety Problems
Wei Dai · 5h · 30

"maybe we could get some actual concrete examples of illegible problems and reasons to think they are important?"

See Problems in AI Alignment that philosophers could potentially contribute to and this comment from a philosopher saying that he thinks they're important, but "seems like there's not much of an appetite among AI researchers for this kind of work" suggesting illegibility.

Legible vs. Illegible AI Safety Problems
Wei Dai · 8h · Ω330

Yeah, it's hard to think of a clear improvement to the title. I think I'm mostly trying to point out that thinking about legible vs. illegible safety problems leads to a number of interesting implications that people may not have realized. At this point the karma is probably high enough to help attract readers despite the boring title, so I'll probably just leave it as is.

Legible vs. Illegible AI Safety Problems
Wei Dai · 8h · 31

"I think it is generous to say that legible problems remaining open will necessarily gate model deployment, even in those organizations conscientious enough to spend weeks doing rigorous internal testing."

In this case you can apply a modified form of my argument, by replacing "legible safety problems" with "safety problems that are actually likely to gate deployment", and then the conclusion would be that working on such safety problems is of low or negative EV for the x-risk concerned.

Heroic Responsibility
Wei Dai · 8h · 90

"It means that if a problem isn't actually going to get solved by someone else, then it's my job to make sure it gets solved, no matter whose job it is on paper."

There are countless problems in the world that are not actually going to get solved by anyone. This seems to imply that it's my job to make sure they all get solved, which seems absurd and can't be what it means. But what is the actual meaning of heroic responsibility then?

For example, does it mean that I should pick the problem to work on that has the highest EV per unit of my time, or pick the problem that I have the biggest comparative advantage in, or something like that? But then how does "heroic responsibility" differ from standard EA advice, and what is "heroic" about it? (Or maybe it was more heroic and novel at a time when there was no standard EA advice?) Anyway, I'm pretty confused.

Legible vs. Illegible AI Safety Problems
Wei Dai · 9h · Ω330

What about more indirect or abstract capabilities work, like coming up with some theoretical advance that would be very useful for capabilities work, but not directly building a more capable AI (thus not "directly involves building a dangerous thing")?

And even directly building a more capable AI still requires other people to respond with bad thing Y = "deploy it before safety problems are sufficiently solved" or "fail to secure it properly", doesn't it? It seems like "good things are good" is exactly the kind of argument that capabilities researchers/proponents give, i.e., that we all (eventually) want a safe and highly capable AGI/ASI, so the "good things are good" heuristic says we should work on capabilities as part of achieving that, without worrying about secondary or strategic considerations, or while just trusting everyone else to do their part, such as ensuring safety.

Wei Dai's Shortform
Wei Dai · 12h · 20

One potential issue is that this makes posting shortforms even more attractive, so you might see everything initially being posted as a shortform (except maybe very long effortposts), since there's no downside to doing that. I wonder if that's something the admins want to see.

Legible vs. Illegible AI Safety Problems
Wei Dai · 13h · Ω220

Any suggestions?

Legible vs. Illegible AI Safety Problems
Wei Dai · 13h · Ω230

Thanks! Assuming it is actually important, correct, and previously unexplicated, it's crazy that I can still find a useful concept/argument this simple and obvious (in retrospect) to write about, at this late date.

Posts

116 · Legible vs. Illegible AI Safety Problems (Ω) · 16h · 28 comments
62 · Trying to understand my own cognitive edge · 2d · 13 comments
10 · Wei Dai's Shortform (Ω) · 2y · 283 comments
65 · Managing risks while trying to do good · 2y · 28 comments
47 · AI doing philosophy = AI generating hands? (Ω) · 2y · 23 comments
226 · UDT shows that decision theory is more puzzling than ever (Ω) · 2y · 56 comments
163 · Meta Questions about Metaphilosophy (Ω) · 2y · 80 comments
34 · Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? (Q) · 3y · 15 comments
55 · How to bet against civilizational adequacy? (Q) · 3y · 20 comments
7 · AI ethics vs AI alignment · 3y · 1 comment

Wikitag Contributions

Carl Shulman · 2 years ago
Carl Shulman · 2 years ago · (-35)
Human-AI Safety · 2 years ago
Roko's Basilisk · 7 years ago · (+3/-3)
Carl Shulman · 8 years ago · (+2/-2)
Updateless Decision Theory · 12 years ago · (+62)
The Hanson-Yudkowsky AI-Foom Debate · 13 years ago · (+23/-12)
Updateless Decision Theory · 13 years ago · (+172)
Signaling · 13 years ago · (+35)
Updateless Decision Theory · 14 years ago · (+22)