Wei Dai

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general purpose open source cryptography programming library (Crypto++, 1995), motivated by AI risk and what's now called "defensive acceleration".
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals.
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.

My Home Page

Posts

Wikitag Contributions

Comments

Wei Dai's Shortform (10 karma, Ω, 2y, 295 comments)
Problems I've Tried to Legibilize
Wei Dai9hΩ220

normally when I think about this problem I resolve it as "what narrow capabilities can we build that are helpful 'to the workflow' of people solving illegible problems, that aren't particularly bad from a capabilities standpoint".

Do you have any writings about this, e.g., examples of what this line of thought led to?

Problems I've Tried to Legibilize
Wei Dai10hΩ230

I agree this is a major risk. (Another one is that it's just infeasible to significantly increase AI philosophical competence in the relevant time frame. Another one is that it's much easier to make it appear like the AI is more philosophically competent, giving us false security.) So I continue to think that pausing/stopping AI should be plan A (which legibilizing the problem of AI philosophical competence can contribute to), with actually improving AI philosophical competence as (part of) plan B. Having said that, 2 reasons this risk might not bear out:

  1. Empirically, the best capabilities people (e.g., STEM/finance workers, managers, politicians) tend to be distinct from the best philosophers. And there are whole cultures (e.g., China) getting very good at STEM but still far behind at making philosophical progress.
    1. But the opportunity cost of learning additional skills for AIs appears much lower than for humans, so this pattern might not carry forward to future AIs.
  2. If I'm right about "philosophy reasoning" being some kind of (currently opaque) general but slow problem solving method, and we already have more legible, specialized, and faster methods for specific areas, such as math, science, and engineering, with "philosophical problems" being left-over problems that lack such faster methods, then making AIs better at philosophical reasoning ought to help with philosophical problems more than other types of problems.
    1. But philosophical reasoning can still help with "non-philosophical" problems, if those problems have some parts that are "more philosophical" that can be sped up by applying good philosophical reasoning. 

To conclude, I'm quite worried about the risks/downsides of trying to increase AI philosophical competence, but it seems to be a problem that has to be solved eventually. "The only way out is through" but we can certainly choose to do it at a more opportune time, when humans are much smarter on average and have made a lot more progress in metaphilosophy (understanding the nature of philosophy and philosophical reasoning).

Problems I've Tried to Legibilize
Wei Dai11hΩ442

even on alignment

I see a disagreement vote on this, but I think it does make sense. Alignment work at the AI labs will almost by definition be work on legible problems, but we should make exceptions for people who can give reasons for why their work is not legible (or otherwise still positive EV), or who are trying to make illegible problems more legible for others at the labs.

Think more seriously about building organizations that will make AI power more spread out.

I start to disagree from here, as this approach would make almost all of the items on my list worse, and I'm not sure which ones it would make better. You started this thread by saying "Even if we solved metaethics and metaphilosophy tomorrow, and gave them the solution on a plate, they wouldn't take it," which I'm definitely very worried about, but how does making AI power more spread out help with this? Is the average human (or humanity collectively) more likely to be concerned about metaethics and metaphilosophy than a typical AI lab leader, or easier to make concerned? I think the opposite is more likely to be true?

Legible vs. Illegible AI Safety Problems
Wei Dai1d51

EA Forum allows agree/disagree voting on posts (why doesn't LW have this, BTW?) and the post there currently has 6 agrees and 0 disagrees. There may actually be a surprisingly low amount of disagreement, as opposed to people not bothering to write up their pushback.

Problems I've Tried to Legibilize
Wei Dai1dΩ230

I'm uncertain between conflict theory and mistake theory, and think it partly depends on metaethics, and therefore it's impossible to be sure which is correct in the foreseeable future - e.g., if everyone ultimately should converge to the same values, then all of our current conflicts are really mistakes. Note that I do often acknowledge conflict theory, like in this list I have "Value differences/conflicts between humans". It's also quite possible that it's really a mix of both, that some of the conflicts are mistakes and others aren't.

In practice I tend to focus more on mistake-theoretic ideas/actions. Some thoughts on this:

  1. If conflict theory is true, then I'm kind of screwed anyway, having invested little human and social capital into conflict-theoretic advantages, as well as not having much talent or inclination in that kind of work in the first place.
  2. I do try not to interfere with people doing conflict-theoretic work (on my side), e.g., by not berating them for having "bad epistemics" or for not adopting mistake theory lenses, etc.
  3. It may be nearly impossible to convince some decision makers that they're making mistakes, but perhaps others are more open to persuasion, e.g. people in charge of or doing ground-level work on AI advisors or AI reasoning.
  4. Maybe I can make a stronger claim that a lot of people are making mistakes, given current ethical and metaethical uncertainty. In other words, people should be unsure about their values, including how selfish or altruistic they should be, and under this uncertainty they shouldn't be doing something like trying to max out their own power/resources at the expense of the commons or by incurring societal-level risks. If so, then perhaps an AI advisor who is highly philosophically competent can realize this too and convince its principal of the same, before it's too late.

(I think this is probably the first time I've explicitly written down the reasoning in 4.)

I think we need a different plan.

Do you have any ideas in mind that you want to talk about?

Legible vs. Illegible AI Safety Problems
Wei Dai1d80

I added a bit to the post to address this:

Edit: Many people have asked for examples of illegible problems. I wrote a new post listing all of the AI safety problems that I've tried to make more legible over the years, in part to answer this request. Some have indeed become more legible over time (perhaps partly due to my efforts), while others remain largely illegible to many important groups.

@Ebenezer Dukakis @No77e @sanyer 

Legible vs. Illegible AI Safety Problems
Wei Dai3d40

Thanks, I've seen/skimmed your sequence. I think I agree directionally but not fully with your conclusions, but am unsure. My current thinking is that humanity clearly shouldn't be attempting an AI transition now, and stopping AI development has the fewest problems with unawareness (it involves the least radical changes and is therefore easiest to predict/steer, and is least likely to have unforeseen strategic complications). Once that's achieved, we should carefully and patiently try to figure out all the crucial considerations until it looks like we've finally found all of the most important ones, and only then attempt an AI transition.

Legible vs. Illegible AI Safety Problems
Wei Dai3d73

Yes, some people are already implicitly doing this, but if we don't make it explicit:

  1. We can't explain to the people not doing it (i.e., those working on already legible problems) why they should switch directions.
  2. Even MIRI is doing it suboptimally because they're not reasoning about it explicitly. I think they're focusing too much on one particular x-safety problem (AI takeover caused by misalignment) that's highly legible to themselves and not to the public/policymakers, and that's problematic because what happens if someone comes up with an alignment breakthrough? Their arguments become invalidated and there's no reason to stop holding back AGI/ASI anymore (in the public/policymakers' eyes), but still plenty of illegible x-safety problems left.
Legible vs. Illegible AI Safety Problems
Wei Dai3d20

https://www.lesswrong.com/posts/PMc65HgRFvBimEpmJ/legible-vs-illegible-ai-safety-problems?commentId=sJ3AS3LLgNjsiNN3c

Wei Dai's Shortform
Wei Dai3d86

This has pretty low argumentative/persuasive force in my mind.

then I expect that they will tend towards doing "illegible" research even if they're not explicitly aware of the legible/illegible distinction.

Why? I'm not seeing the logic of how your premises lead to this conclusion.

And even if there is this tendency, what if someone isn't smart enough to come up with a new line of illegible research, but does see some legible problem with an existing approach that they can contribute to? What would cause them to avoid this?

And even for the hypothetical virtuous person who starts doing illegible research on their own, what happens when other people catch up to them and the problem becomes legible to leaders/policymakers? How would they know to stop working on that problem and switch to another problem that is still illegible?

  • Problems I've Tried to Legibilize (93 karma, Ω, 2d, 14 comments)
  • Legible vs. Illegible AI Safety Problems (281 karma, Ω, 2d, 85 comments)
  • Trying to understand my own cognitive edge (65 karma, 8d, 17 comments)
  • Wei Dai's Shortform (10 karma, Ω, 2y, 295 comments)
  • Managing risks while trying to do good (66 karma, 2y, 28 comments)
  • AI doing philosophy = AI generating hands? (48 karma, Ω, 2y, 24 comments)
  • UDT shows that decision theory is more puzzling than ever (228 karma, Ω, 2y, 56 comments)
  • Meta Questions about Metaphilosophy (163 karma, Ω, 2y, 80 comments)
  • Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? (34 karma, Q, 3y, 15 comments)
  • How to bet against civilizational adequacy? (55 karma, Q, 3y, 20 comments)
  • Carl Shulman, 2 years ago
  • Carl Shulman, 2 years ago (-35)
  • Human-AI Safety, 2 years ago
  • Roko's Basilisk, 7 years ago (+3/-3)
  • Carl Shulman, 8 years ago (+2/-2)
  • Updateless Decision Theory, 12 years ago (+62)
  • The Hanson-Yudkowsky AI-Foom Debate, 13 years ago (+23/-12)
  • Updateless Decision Theory, 13 years ago (+172)
  • Signaling, 13 years ago (+35)
  • Updateless Decision Theory, 14 years ago (+22)