Wei Dai

I think I need more practice talking with people in real time (about intellectual topics). (I've gotten much more used to text chat/comments, which I like because it puts less time pressure on me to think and respond quickly, but I feel like I now incur a large cost due to excessively shying away from talking to people, hence the desire for practice.) If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

www.weidai.com

Comments

Wei Dai

And I agree with Bryan Caplan's recent take that friendships are often a bigger conflict of interest than money, so Open Phil higher-ups being friends with Anthropic higher-ups is troubling.

No kidding. From https://www.openphilanthropy.org/grants/openai-general-support/:

OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela.

Wish OpenPhil and EAs in general were more willing to reflect/talk publicly about their mistakes. Kind of understandable given human nature, but still... (I wonder if there are any mistakes I've made that I should reflect more on.)

To be clear, by "indexical values" in that context I assume you mean indexing on whether a given world is "real" vs "counterfactual," not just indexical in the sense of being egoistic? (Because I think there are compelling reasons to reject UDT without being egoistic.)

I think being indexical in this sense (while being altruistic) can also lead you to reject UDT, but it doesn't seem "compelling" that one should be altruistic this way. Want to expand on that?

Maybe breaking up certain biofilms held together by Ca?

Yeah, there's a toothpaste on the market called Livfree that claims to work like this.

IIRC, high EDTA concentration was found to cause significant amounts of erosion.

Ok, that sounds bad. Thanks.

ETA: Found an article that explains how Livfree works in more detail:

Tooth surfaces are negatively charged, and so are bacteria; therefore, they should repel each other. However, salivary calcium coats the negative charges on the tooth surface and bacteria, allowing them to get very close (within 10 nm). At this point, van der Waals forces (attractive electrostatic forces at small distances) take over, allowing the bacteria to deposit on the tooth surfaces, initiating biofilm formation. A unique formulation of EDTA strengthens the negative electronic forces of the tooth, allowing the teeth to repel harmful plaque. This special formulation quickly penetrates through the plaque down to the tooth surface. There, it changes the surface charge back to negative by neutralizing the positively charged calcium ions. This new, stronger negative charge on the tooth surface environment simply allows the plaque and the tooth surface to repel each other. This requires neither an abrasive nor killing the bacteria (Figure 3).

The authors are very positive on this toothpaste, although they don't directly explain why it doesn't cause tooth erosion.

I actually no longer fully endorse UDT. It still seems a better decision theory approach than any other specific approach that I know, but it has a bunch of open problems and I'm not very confident that someone won't eventually find a better approach that replaces it.

To your question, I think if my future self decides to follow (something like) UDT, it won't be because I made a "commitment" to do it, but because my future self wants to follow it, because he thinks it's the right thing to do, according to his best understanding of philosophy and normativity. I'm unsure about this, and the specific objection you have is probably covered under #1 in my list of open questions in the link above.

(And then there's a very different scenario in which UDT gets used in the future, which is that it gets built into AIs, and then they keep using UDT until they decide not to, which if UDT is reflectively consistent would be never. I dis-endorse this even more strongly.)

Any thoughts on edathamil/EDTA or nano-hydroxyapatite toothpastes?

This means that in the future, there will likely be a spectrum of AIs of varying levels of intelligence, some much smarter than humans, others only slightly smarter, and still others merely human-level.

Are you imagining that the alignment problem is still unsolved in the future, such that all of these AIs are independent agents unaligned with each other (like humans currently are)? I guess in my imagined world, ASIs will have solved the alignment (or maybe control) problem at least for less intelligent agents, so you'd get large groups of AIs aligned with each other that can for many purposes be viewed as one large AI.

Building on (5), I generally expect AIs to calculate that it is not in their interest to expropriate wealth from other members of society, given how this could set a precedent for future wealth expropriation that comes back and hurts them selfishly.

At some point we'll reach technological maturity, and the ASIs will be able to foresee all remaining future shocks/changes to their economic/political systems, and probably determine that expropriating humans (and anyone else they decide to; I agree it may not be limited to humans) won't cause any future problems.

Even if a tiny fraction of consumer demand in the future is for stuff produced by humans, that could ensure high human wages simply because the economy will be so large.

This is only true if no human decides to freely copy or otherwise reproduce themselves, which would drive human wages down to subsistence. And I guess yeah, maybe AIs will have fetishes like this, but (like my reaction to Paul Christiano's "1/trillion kindness" argument) I'm worried that AIs might have less benign fetishes. This worry more than cancels out, in my mind, the prospect that humans might live / earn a wage from benign fetishes.

This might be the most important point on my list, despite saying it last, but I think humans will likely be able to eventually upgrade their intelligence, better allowing them to “keep up” with the state of the world in the future.

I agree this will happen eventually (if humans survive), but I think it will take a long time: we'll have to solve a bunch of philosophical problems to determine how to do this safely (e.g., without losing or distorting our values), and we probably can't trust AI's help with these (although I'd love to change that, hence my focus on metaphilosophy). In the meantime, AIs will be zooming ahead, partly because they started off thinking faster and partly because some will be reckless (like some humans currently are!) or have simple values that don't require philosophical contemplation to understand. So the situation I described is still likely to occur.

Wei Dai

It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.

So assuming that AIs get rich peacefully within the system we have already established, we'll end up with a situation in which ASIs produce all value in the economy, and humans produce nothing but receive an income and consume a bunch, through ownership of capital and/or taxing the ASIs. This part should be non-controversial, right?

At this point, it becomes a coordination problem for the ASIs to switch to a system in which humans no longer exist or no longer receive any income, and the ASIs get to consume or reinvest everything they produce. You're essentially betting that ASIs can't find a way to solve this coordination problem. This seems like a bad bet to me. (Intuitively it just doesn't seem like a very hard problem, relative to what I imagine the capabilities of the ASIs to be.)

I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are unless they're value aligned. This is a claim that I don't think has been established with any reasonable degree of rigor.

I don't know how to establish anything post-ASI "with any reasonable degree of rigor" but the above is an argument I recently thought of, which seems convincing, although of course you may disagree. (If someone has expressed this or a similar argument previously, please let me know.)

  1. Why? Perhaps we'd do it out of moral uncertainty, thinking maybe we owe something to our former selves, but future people probably won't think this.
  2. Currently our utility is roughly logarithmic in money, partly because we spend money on instrumental goals and there are diminishing returns as limited opportunities get used up. This won't be true of future utilitarians spending resources on their terminal values. So a "one in hundred million fraction" of resources is a much bigger deal to them than to us.
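The arithmetic behind point 2 can be sketched as a toy model (the wealth figure W below is an arbitrary hypothetical, not anything from the discussion): under log utility, the cost of giving up a tiny fraction of one's resources is roughly that fraction, independent of total wealth, while under linear utility the cost scales with total resources.

```python
import math

f = 1e-8  # a "one in hundred million" fraction of resources

# Under log utility (a rough model of current human preferences over money),
# the utility cost of giving up fraction f is:
#   log(W) - log(W * (1 - f)) = -log(1 - f) ~= f,
# which is independent of total wealth W.
log_utility_cost = -math.log(1 - f)

# Under linear utility (e.g. a utilitarian converting resources directly
# into terminal value), the cost is f * W, proportional to total resources.
W = 1e40  # hypothetical astronomical resource endowment, arbitrary units
linear_utility_cost = f * W

print(log_utility_cost)     # ~1e-8: negligible at any wealth level
print(linear_utility_cost)  # ~1e32: enormous in absolute terms
```

The contrast is the point: to a log-utility agent the sacrifice is imperceptible no matter how rich it is, while to a linear-utility agent it grows with the size of the future economy.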
Wei Dai

I have a slightly different take, which is that we can't commit to doing this scheme even if we want to, because I don't see what we can do today that would warrant the term "commitment", i.e., would be binding on our post-singularity selves.

In either case (we can't or don't commit), the argument in the OP loses a lot of its force, because we don't know whether post-singularity humans will decide to do this kind of scheme or not.

Wei Dai

So the commitment I want to make is just my current self yelling at my future self, that "no, you should still bail us out even if 'you' don't have a skin in the game anymore". I expect myself to keep my word that I would probably honor a commitment like that, even if trading away 10 planets for 1 no longer seems like that good of an idea.

This doesn't make much sense to me. Why would your future self "honor a commitment like that", if the "commitment" is essentially just one agent yelling at another agent to do something the second agent doesn't want to do? I don't understand what moral (or physical or motivational) force your "commitment" is supposed to have on your future self, if your future self does not already think doing the simulation trade is a good idea.

I mean imagine if as a kid you made a "commitment" in the form of yelling at your future self that if you ever had lots of money you'd spend it all on comic books and action figures. Now as an adult you'd just ignore it, right?
