Wei Dai

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general purpose open source cryptography programming library (Crypto++, 1995), motivated by AI risk and what's now called "defensive acceleration".
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals.
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.

My Home Page

Comments (sorted by newest)

Trying to understand my own cognitive edge
Wei Dai, 2h

b-money: I guess most people working on crypto-based payments were trying to integrate with the traditional banking system, and didn't have the insight/intuition that money is just a way for everyone to "keep tabs" of how much society as a whole owes to each person (e.g. for previous services rendered), and therefore a new form of money (i.e. not fiat or commodity) could be created and implemented as a public/distributed database or ledger.
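To make the "keep tabs" framing concrete, here is a minimal toy sketch (in Python, purely hypothetical and illustrative; it is not b-money's or Bitcoin's actual protocol, and all names in it are made up) of money as a shared table of balances that every participant keeps a copy of:

```python
# Toy sketch (hypothetical, illustrative only): money as a shared ledger, i.e. a
# table recording how much "society" owes each account. Every participant keeps
# an identical copy and applies the same broadcast transactions in the same order.

class Ledger:
    def __init__(self):
        self.balances = {}  # account name -> units owed to that account

    def credit(self, account, amount):
        """Record newly created money owed to `account`, e.g. for services rendered."""
        self.balances[account] = self.balances.get(account, 0) + amount

    def transfer(self, sender, receiver, amount):
        """Move value between accounts; every copy of the ledger applies the same update."""
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount


if __name__ == "__main__":
    ledger = Ledger()
    ledger.credit("alice", 10)          # society credits Alice for work done
    ledger.transfer("alice", "bob", 4)  # Alice pays Bob; no bank or commodity involved
    print(ledger.balances)              # {'alice': 6, 'bob': 4}
```

The point of the sketch is only that the database itself is the money; the hard parts that b-money and later Bitcoin addressed (who gets to append transactions, and how the copies stay consistent without a trusted party) are deliberately left out.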

UDT: I initially became interested in decision theory for a very different reason than Eliezer. I was trying to solve anthropic reasoning, and tried a lot of different ideas but couldn't find one that was satisfactory. Eventually I decided to look into decision theory (as the "source" of probability theory) and had the insight/intuition that if the decision theory didn't do any updating then we could sidestep the entire problem of anthropic reasoning. Hal Finney was the only one to seriously try to understand this idea, but couldn't or didn't appreciate it (in fairness my proto-UDT was way more complicated than EDT, CDT, or the later UDT, because I noticed that it would cooperate with its twin in one-shot PD, and added complications to make it defect instead, not questioning the conventional wisdom that that's what's rational).

Eventually I got the idea/hint from Eliezer that it can be rational to cooperate in one-shot PD, and also realized my old idea seemed to fit well with what Nesov was discussing (counterfactual mugging), and this caused me to search for a formulation that was simple/elegant and could solve all of the problems known at the time, which became known as UDT.

I think Eliezer was also interested in anthropic reasoning, so I think what he was missing was my move of looking into decision theory for inspiration/understanding and then making the radical call that maybe anthropic reasoning is unsolvable as posed, and should be side-stepped via a change to decision theory.

need for an AI pause/slowdown: I found Eliezer convincing when he started talking about the difficulty of making AI Friendly and why others likely wouldn't try hard enough to succeed, and just found it implausible that he could, with a small team, win a race against the rest of the world, which was spending much less effort/resources on trying to make its AIs Friendly. Plus I had my own worries early on that we needed to either solve all the important philosophical problems before building AGI/ASI, or figure out how to make sure the AI itself is philosophically competent, and both seemed unlikely to happen without a pause/slowdown (partly because nobody else seemed to share this concern or talk about it).

Human Values ≠ Goodness
Wei Dai, 12h
  1. How does this carry into the future, when we'll be able to modify our brains/minds?
    1. Are our Values the real-world things that trigger our feelings, or the feelings themselves? (If the latter, we'll be able to artificially trigger them at negligible cost and with no negative side effects, unlike today.)
    2. "We Don’t Get To Choose Our Own Values" will be false, so that part will be irrelevant. How does this affect your arguments/conclusions?
  2. Even today, Goodness-as-memetic-egregore can (and does) heavily influence our Values, through the kind of mechanism described in Morality is Scary. (Think of the Communists who yearned for communism so much that they were willing to endure extreme hardship and even torture for it.) This seems like a crucial part of the picture that you didn't mention, and one that complicates any effort to draw conclusions from it.
  3. My own perspective is that what you call Human Values and Goodness are both potential sources (along with others) of "My Real Values", which I'll only be able to really figure out after doing or learning a lot more philosophy (e.g., to figure out which ones I really want to, or should, keep or discard, or how to answer questions like the above). In the meantime, my main goals are to preserve/optimize my option values and ability to eventually do/learn such philosophy, and to avoid doing anything that might turn out to be really bad according to "My Real Values" (like denying some strong short-term desire, or committing a potential moral atrocity), using something like Bostrom and Ord's Moral Parliament model for handling moral uncertainty.
Mo Putera's Shortform
Wei Dai, 3d

I wonder how Eliezer would describe his "moat", i.e., what cognitive trait or combination of traits does he have, that is rarest or hardest to cultivate in others? (Would also be interested in anyone else's take on this.)

Shortform
Wei Dai, 4d

I'm curious what you say about "which are the specific problems (if any) where you specifically think 'we really need to have solved philosophy / improved-a-lot-at-metaphilosophy' to have a decent shot at solving this?"

Assuming by "solving this" you mean solving AI x-safety or navigating the AI transition well, I just post a draft about this. Or if you already read that and are asking for an even more concrete example, a scenario I often think about is an otherwise aligned ASI, some time into the AI transition when things are moving very fast (from a human perspective) and many highly consequential decisions need to be made (e.g., what alliances to join, how to bargain with others, how to self-modify or take advantage of the latest AI advances, how to think about AI welfare and other near-term ethical issues, what to do about commitment races and threats, how to protect the user against manipulation or value drift, whether to satisfy some user request that might be harmful according to their real values) that often involve philosophical problems. And they can't just ask their user (or alignment target) or even predict "what would the user say if they thought about this for a long time" because the user themselves may not be philosophically very competent and/or making such predictions with high accuracy (over a long enough time frame) is still outside their range of capabilities.

So the specific problem is how to make sure this AI doesn't make wrong decisions that cause a lot of waste or harm, and that quickly or over time cause most of the potential value of the universe to be lost. That in turn seems to involve figuring out how the AI should think about philosophical problems, or how to make the AI philosophically competent even if its alignment target isn't.

Does this help / is this the kind of answer you're asking for?

Wei Dai's Shortform
Wei Dai, 4d

Some of Eliezer's founder effects on the AI alignment/x-safety field, that seem detrimental and persist to this day:

  1. Plan A is to race to build a Friendly AI before someone builds an unFriendly AI.
  2. Metaethics is a solved problem. Ethics/morality/values and decision theory are still open problems. We can punt on values for now but do need to solve decision theory. In other words, decision theory is the most important open philosophical problem in AI x-safety.
  3. Academic philosophers aren't very good at their jobs (as shown by their widespread disagreements, confusions, and bad ideas), but the problems aren't actually that hard, and we (alignment researchers) can be competent enough philosophers and solve all of the necessary philosophical problems in the course of trying to build Friendly (or aligned/safe) AI.

I've repeatedly argued against 1 from the beginning, and also somewhat against 2 and 3, but perhaps not hard enough because I personally benefitted from them, i.e., having pre-existing interest/ideas in decision theory that became validated as centrally important for AI x-safety, and generally finding a community that is interested in philosophy and took my own ideas seriously.

Eliezer himself is now trying hard to change 1, and I think we should also try harder to correct 2 and 3. On the latter, I think academic philosophy suffers from various issues, but also that the problems are genuinely hard, and alignment researchers seem to have inherited Eliezer's gung-ho attitude towards solving these problems, without adequate reflection. Humanity having few competent professional philosophers should be seen as (yet another) sign that our civilization isn't ready to undergo the AI transition, not a license to wing it based on one's own philosophical beliefs or knowledge!

In this recent EAF comment, I analogize AI companies trying to build aligned AGI with no professional philosophers on staff (the only exception I know of is Amanda Askell) to a company trying to build a fusion reactor with no physicists on staff, only engineers. I wonder if that analogy resonates with anyone.

Shortform
Wei Dai, 5d

To try to explain how I see the difference between philosophy and metaphilosophy:

My definition of philosophy is similar to @MichaelDickens', but I would use "have serviceable explicitly understood methods" instead of "formally studied" or "formalized" to define what isn't philosophy, as the latter might be, or could be interpreted as, too high a bar, e.g., in the sense of formal systems.

So in my view, philosophy is directly working on various confusing problems (such as "what is the right decision theory") using whatever poorly understood methods that we have or can implicitly apply, and then metaphilosophy is trying to help solve these problems on a meta level, by better understanding the nature of philosophy, for example:

  1. Try to find if there is some unifying quality that ties all of these "philosophical" problems together (besides "lack of serviceable explicitly understood methods").
  2. Try to formalize some part of philosophy, or find explicitly understood methods for solving certain philosophical problems.
  3. Try to formalize all of philosophy wholesale, or explicitly understand what it is that humans are doing (or should be doing, or what AIs should be doing) when it comes to solving problems in general. This may not be possible, i.e., maybe there is no such general method that lets us solve every problem given enough time and resources, but it sure seems like humans have some kind of general purpose (but poorly understood) method that lets us make progress slowly over time on a wide variety of problems, including ones that are initially very confusing, or hard to understand/explain what we're even asking, etc. We can at least aim to understand what it is that humans are or have been doing, even if it's not a fully general method.
     

Does this make sense?

Shortform
Wei Dai, 5d

One way to see that philosophy is exceptional is that we have serviceable explicit understandings of math and natural science, even formalizations in the forms of axiomatic set theory and Solomonoff Induction, but nothing comparable in the case of philosophy. (Those formalizations are far from ideal or complete, but still represent a much higher level of understanding than we have for philosophy.)

If you say that philosophy is a (non-natural) science, then I challenge you: come up with something like Solomonoff Induction, but for philosophy.
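(For reference, and only as an illustration of what a "serviceable formalization" looks like, here is one standard textbook statement of the Solomonoff prior; nothing in the notation is specific to this discussion. The prior probability of a finite binary string $x$ is the total weight of all programs $p$ that make a fixed universal prefix machine $U$ output $x$:

$$m(x) = \sum_{p \,:\, U(p) = x} 2^{-\ell(p)},$$

where $\ell(p)$ is the length of $p$ in bits, and prediction then proceeds by Bayesian conditioning on what has been observed so far. The challenge above is to produce something of comparable precision for philosophical reasoning.)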

life lessons from trading
Wei Dai, 5d
  1. Trading is a zero sum game inside a larger positive sum game. Though every trade has a winner and offsetting losers,

This isn't true. Sometimes you're trading against someone with non-valuation motives, i.e., someone buying or selling for a reason besides thinking that the current market price is too low or too high, for example, someone being liquidated due to a margin violation, or the founder of a company wanting to sell in order to diversify. In that case, it makes more sense to think of yourself as providing a service for the other side of the trade, instead of there being a winner and a loser.

markets as a whole direct resources across space and time and help civilizations grow.

Unpriced externalities imply that sometimes markets harm civilizations. I think investment in AGI/ASI is a prime example of this, with x-risk being the unpriced externality.

leogao's Shortform
Wei Dai, 7d

Figuring out the underlying substance behind "philosophy" is a central project of metaphilosophy, which is far from solved. But my usual starting point is "trying to solve confusing problems which we don't have established methodologies for solving" (methodologies meaning explicitly understood methods), which I think bakes in the fewest assumptions about what philosophy is or could be, while still capturing the usual meaning of "philosophy", and explains why certain fields started off as part of philosophy (e.g., science starting off as natural philosophy) and then became "not philosophy" once we figured out methodologies for them.

I think "figure out what are the right concepts to be use, and, use those concepts correctly, across all of relevant-Applied-conceptspace" is the expanded version of what I meant, which maybe feels more likely to be what you mean.

This bakes in "concepts" being the most important thing, but is that right? Must AIs necessarily think about philosophy using "concepts", or is that really the best way to formulate how idealized philosophical reasoning should work?

Is "concepts" even what distinguishes philosophy from non-philosophical problems, or is "concepts" just part of how humans reason about everything, which we latch onto when trying to define or taboo philosophy, because we have nothing else better to latch onto? My current perspective is that what uniquely distinguishes philosophy is their confusing nature and the fact that we have no well-understood methods for solving them (but would of course be happy to hear any other perspectives on this).

Regarding good philosophical taste (or judgment), that is another central mystery of metaphilosophy, which I've been thinking a lot about but don't have any good handles on. It seems like a thing that exists (and is crucial), but it's very hard to see how/why it could exist or what kind of thing it could be.

So anyway, I'm not sure how much help any of this is, when trying to talk to the type of person you mentioned. The above are mostly some cached thoughts I have on this, originally for other purposes.

BTW, good philosophical taste being rare definitely seems like a very important part of the strategic picture, which potentially makes the overall problem insurmountable. My main hopes are 1) someone makes an unexpected metaphilosophical breakthrough (kind of like Satoshi coming out of nowhere to totally solve distributed currency) and there's enough good philosophical taste among the AI safety community (including at the major labs) to recognize it and incorporate it into AI design or 2) there's an AI pause during which human intelligence enhancement comes online and selecting for IQ increases the prevalence of good philosophical taste as a side effect (as it seems too much to hope that good philosophical taste would be directly selected for) and/or there's substantial metaphilosophical progress during the pause.

leogao's Shortform
Wei Dai, 7d

Unless you can abstract out the "alignment reasoning and judgement" part of a human's entire brain process (and philosophical reasoning and judgement as part of that) into some kind of explicit understanding of how it works, how do you actually build that into AI without solving uploading (which we're obviously not on track to solve in 2-4 years either)?

put a bunch of smart thoughtful humans in a sim and run it for a long time

Alignment researchers have had this thought for a long time (see e.g. Paul Christiano's A formalization of indirect normativity), but I think the practical alignment research programs that this line of thought led to, such as IDA and Debate, are all still bottlenecked by a lack of metaphilosophical understanding: without the kind of understanding that lets you build an "alignment/philosophical reasoning checker" (analogous to a proof checker for mathematical reasoning), they're stuck trying to do ML of alignment/philosophical reasoning from human data, which I think is unlikely to work out well.

Posts

  • Trying to understand my own cognitive edge (5h)
  • Wei Dai's Shortform (2y)
  • Managing risks while trying to do good (2y)
  • AI doing philosophy = AI generating hands? (2y)
  • UDT shows that decision theory is more puzzling than ever (2y)
  • Meta Questions about Metaphilosophy (2y)
  • Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? (3y)
  • How to bet against civilizational adequacy? (3y)
  • AI ethics vs AI alignment (3y)
  • A broad basin of attraction around human values? (4y)

Wikitag Contributions

  • Carl Shulman (2 years ago)
  • Carl Shulman (2 years ago)
  • Human-AI Safety (2 years ago)
  • Roko's Basilisk (7 years ago)
  • Carl Shulman (8 years ago)
  • Updateless Decision Theory (12 years ago)
  • The Hanson-Yudkowsky AI-Foom Debate (13 years ago)
  • Updateless Decision Theory (13 years ago)
  • Signaling (13 years ago)
  • Updateless Decision Theory (14 years ago)