Wei Dai

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general purpose open source cryptography programming library (Crypto++, 1995), motivated by AI risk and what's now called "defensive acceleration".
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals.
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.

My Home Page

Comments

Wei Dai's Shortform
Wei Dai · 9h

An update on this 2010 position of mine, which seems to have become conventional wisdom on LW:

In my posts, I've argued that indexical uncertainty like this shouldn't be represented using probabilities. Instead, I suggest that you consider yourself to be all of the many copies of you, i.e., both the ones in the ancestor simulations and the one in 2010, making decisions for all of them. Depending on your preferences, you might consider the consequences of the decisions of the copy in 2010 to be the most important and far-reaching, and therefore act mostly as if that was the only copy. [Emphasis added]

In the subsequent 15 years, I've upweighted influencing the multiverse through my copies in simulations, relative to base universes, to the point where they're about equally important in my mind, for four reasons:

  1. Realizing that there are probably universes with vastly greater computational resources than ours, implying there are more simulations containing me than I had thought.
  2. Realizing that there may well be many possible motivations for running simulations, besides interest in ancestors, which was the main motivation talked about back in 2010. These include acausal trade, curiosity about aliens, science on the distribution of superintelligences in the multiverse, gathering philosophical ideas, entertainment, and altruism.
  3. Being fortunate in various ways, on a personal level, that seem more likely if I'm in a simulation.
  4. Doing badly on a civilizational level, in a way that devalues trying to influence base reality. I.e., relative to 2010, it looks more likely that we'll undergo the AI transition blindly, without key decision makers paying attention to important philosophical and strategic ideas.

Concretely, this mostly cashes out to me thinking and writing with potential simulators in mind as an additional audience, hoping my ideas might benefit or interest some of them even if they end up largely ignored in this reality.

Problems I've Tried to Legibilize
Wei Dai · 21h

normally when I think about this problem I resolve it as "what narrow capabilities can we build that are helpful 'to the workflow' of people solving illegible problems, that aren't particularly bad from a capabilities standpoint".

Do you have any writings about this, e.g., examples of what this line of thought led to?

Problems I've Tried to Legibilize
Wei Dai · 21h

I agree this is a major risk. (Another one is that it's just infeasible to significantly increase AI philosophical competence in the relevant time frame. Another one is that it's much easier to make it appear like the AI is more philosophically competent than it actually is, giving us false security.) So I continue to think that pausing/stopping AI should be plan A (which legibilizing the problem of AI philosophical competence can contribute to), with actually improving AI philosophical competence as (part of) plan B. Having said that, here are two reasons this risk might not bear out:

  1. Empirically, the best capabilities people (e.g., STEM/finance workers, managers, politicians) tend to be distinct from the best philosophers. And there are whole cultures (e.g., China) getting very good at STEM but still far behind at making philosophical progress.
    1. But the opportunity cost of learning additional skills for AIs appears much lower than for humans, so this pattern might not carry forward to future AIs.
  2. If I'm right about "philosophy reasoning" being some kind of (currently opaque) general but slow problem solving method, and we already have more legible, specialized, and faster methods for specific areas, such as math, science, and engineering, with "philosophical problems" being left-over problems that lack such faster methods, then making AIs better at philosophical reasoning ought to help with philosophical problems more than other types of problems.
    1. But philosophical reasoning can still help with "non-philosophical" problems, if those problems have some parts that are "more philosophical" that can be sped up by applying good philosophical reasoning. 

To conclude, I'm quite worried about the risks/downsides of trying to increase AI philosophical competence, but it seems to be a problem that has to be solved eventually. "The only way out is through," but we can certainly choose to do it at a more opportune time, when humans are much smarter on average and have made a lot more progress in metaphilosophy (understanding the nature of philosophy and philosophical reasoning).

Problems I've Tried to Legibilize
Wei Dai · 1d

even on alignment

I see a disagreement vote on this, but I think it does make sense. Alignment work at the AI labs will almost by definition be work on legible problems, but we should make exceptions for people who can give reasons why their work is not legible (or otherwise still positive EV), or who are trying to make illegible problems more legible for others at the labs.

Think more seriously about building organizations that will make AI power more spread out.

I start to disagree from here, as this approach would make almost all of the items on my list worse, and I'm not sure which ones it would make better. You started this thread by saying "Even if we solved metaethics and metaphilosophy tomorrow, and gave them the solution on a plate, they wouldn't take it," which I'm definitely very worried about, but how does making AI power more spread out help with this? Is the average human (or humanity collectively) more likely to be concerned about metaethics and metaphilosophy than a typical AI lab leader, or easier to make concerned? I think the opposite is more likely to be true?

Legible vs. Illegible AI Safety Problems
Wei Dai · 2d

EA Forum allows agree/disagree voting on posts (why doesn't LW have this, BTW?) and the post there currently has 6 agrees and 0 disagrees. There may actually be a surprisingly low amount of disagreement, as opposed to people not bothering to write up their pushback.

Problems I've Tried to Legibilize
Wei Dai · 2d

I'm uncertain between conflict theory and mistake theory, and think it partly depends on metaethics, and therefore it's impossible to be sure which is correct in the foreseeable future - e.g., if everyone ultimately should converge to the same values, then all of our current conflicts are really mistakes. Note that I do often acknowledge conflict theory, like in this list I have "Value differences/conflicts between humans". It's also quite possible that it's really a mix of both, that some of the conflicts are mistakes and others aren't.

In practice I tend to focus more on mistake-theoretic ideas/actions. Some thoughts on this:

  1. If conflict theory is true, then I'm kind of screwed anyway, having invested little human and social capital into conflict-theoretic advantages, as well as not having much talent or inclination in that kind of work in the first place.
  2. I do try not to interfere with people doing conflict-theoretic work (on my side), e.g., by not berating them for having "bad epistemics" or not adopting mistake theory lenses, etc.
  3. It may be nearly impossible to convince some decision makers that they're making mistakes, but perhaps others are more open to persuasion, e.g. people in charge of or doing ground-level work on AI advisors or AI reasoning.
  4. Maybe I can make a stronger claim that a lot of people are making mistakes, given current ethical and metaethical uncertainty. In other words, people should be unsure about their values, including how selfish or altruistic they should be, and under this uncertainty they shouldn't be doing something like trying to max out their own power/resources at the expense of the commons or by incurring societal-level risks. If so, then perhaps an AI advisor who is highly philosophically competent can realize this too and convince its principal of the same, before it's too late.

(I think this is probably the first time I've explicitly written down the reasoning in 4.)

I think we need a different plan.

Do you have any ideas in mind that you want to talk about?

Legible vs. Illegible AI Safety Problems
Wei Dai · 2d

I added a bit to the post to address this:

Edit: Many people have asked for examples of illegible problems. I wrote a new post listing all of the AI safety problems that I've tried to make more legible over the years, in part to answer this request. Some have indeed become more legible over time (perhaps partly due to my efforts), while others remain largely illegible to many important groups.

@Ebenezer Dukakis @No77e @sanyer 

Legible vs. Illegible AI Safety Problems
Wei Dai · 4d

Thanks, I've seen/skimmed your sequence. I think I agree directionally, though not fully, with your conclusions, but am unsure. My current thinking is that humanity clearly shouldn't be attempting an AI transition now, and that stopping AI development has the fewest problems with unawareness (it involves the least radical changes, and is therefore easiest to predict/steer and least likely to have some unforeseen strategic complications). Once that's achieved, we should carefully and patiently try to figure out all the crucial considerations until it looks like we've finally found the most important ones, and only then attempt an AI transition.

Legible vs. Illegible AI Safety Problems
Wei Dai · 4d

Yes, some people are already implicitly doing this, but if we don't make it explicit:

  1. We can't explain to the people not doing it (i.e., those working on already legible problems) why they should switch directions.
  2. Even MIRI is doing it suboptimally because they're not reasoning about it explicitly. I think they're focusing too much on one particular x-safety problem (AI takeover caused by misalignment) that's highly legible to themselves but not to the public/policymakers, and that's problematic because what happens if someone comes up with an alignment breakthrough? Their arguments become invalidated and there's no longer any reason (in the public/policymakers' eyes) to keep holding back AGI/ASI, even though plenty of illegible x-safety problems are still left.

Legible vs. Illegible AI Safety Problems
Wei Dai · 4d

https://www.lesswrong.com/posts/PMc65HgRFvBimEpmJ/legible-vs-illegible-ai-safety-problems?commentId=sJ3AS3LLgNjsiNN3c

Posts
  • 105 · Problems I've Tried to Legibilize · Ω · 2d · 16 comments
  • 293 · Legible vs. Illegible AI Safety Problems · Ω · 2d · 87 comments
  • 65 · Trying to understand my own cognitive edge · 9d · 17 comments
  • 10 · Wei Dai's Shortform · Ω · 2y · 296 comments
  • 66 · Managing risks while trying to do good · 2y · 28 comments
  • 49 · AI doing philosophy = AI generating hands? · Ω · 2y · 24 comments
  • 228 · UDT shows that decision theory is more puzzling than ever · Ω · 2y · 56 comments
  • 163 · Meta Questions about Metaphilosophy · Ω · 2y · 80 comments
  • 34 · Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? · Q · 3y · 15 comments
  • 55 · How to bet against civilizational adequacy? · Q · 3y · 20 comments
Wikitag Contributions
  • Carl Shulman · 2 years ago
  • Carl Shulman · 2 years ago · (-35)
  • Human-AI Safety · 2 years ago
  • Roko's Basilisk · 7 years ago · (+3/-3)
  • Carl Shulman · 8 years ago · (+2/-2)
  • Updateless Decision Theory · 12 years ago · (+62)
  • The Hanson-Yudkowsky AI-Foom Debate · 13 years ago · (+23/-12)
  • Updateless Decision Theory · 13 years ago · (+172)
  • Signaling · 13 years ago · (+35)
  • Updateless Decision Theory · 14 years ago · (+22)