Matthew Barnett

Someone who is interested in learning and doing good.

My Twitter: https://twitter.com/MatthewJBar

My Substack: https://matthewbarnett.substack.com/

Comments

7 · Matthew Barnett's Shortform · 6y · 391 comments
Jan_Kulveit's Shortform
Matthew Barnett · 5d

> What about a scenario where no laws are broken, but over the course of months to years large numbers of humans are unable to provide for themselves as a consequence of purely legal and non violent actions by AIs? A toy example would be AIs purchasing land used for agriculture for other means (you might consider this an indirect form of violence).

I'd consider it bad if AIs take actions that result in a large fraction of humans becoming completely destitute and dying as a result. 

But such an outcome would be bad whether it's caused by a human or an AI. The more important question is whether such an outcome is likely to occur if we grant AIs legal rights, and I think the answer is no. I anticipate that AGI-driven automation will create so much economic abundance that it will likely be very easy to provide for the material needs of all biological humans.

Generally, I think biological humans will receive income through charitable donations, government welfare programs, in-kind support from family members, interest and dividends, sales of their assets, or human-specific service jobs where consumers intrinsically prefer hiring human labor (e.g., perhaps childcare). Given vast prosperity, these income sources seem sufficient to provide most humans with an adequate, if not incredibly high, standard of living.

Jan_Kulveit's Shortform
Matthew Barnett · 6d

My views on AI have indeed changed over time, on a variety of empirical and normative questions, but I think you're inferring larger changes than are warranted from that comment in isolation.

Here's a comment from 2023 where I said:

The term "AI takeover" is ambiguous. It conjures an image of a violent AI revolution, but the literal meaning of the term also applies to benign scenarios in which AIs get legal rights and get hired to run our society fair and square. A peaceful AI takeover would be good, IMO.

In fact, I still largely agree with the comment you quoted. The described scenario remains my best guess for how things could go wrong with AI. However, I chose my words poorly in that comment. Specifically, I was not clear enough about what I meant by "disempowerment." 

I should have distinguished between two different types of human disempowerment. The first type is violent disempowerment, where AIs take power by force. I consider this morally bad. The second type is peaceful or voluntary disempowerment, where humans willingly transfer power to AIs through legal and economic processes. I think this second type will likely be morally good, or at least morally neutral.

My moral objection to "AI takeover", both now and back then, applies primarily to scenarios where AIs suddenly seize power through unlawful or violent means, against the wishes of human society. I have, and had, far fewer objections to scenarios where AIs gradually gain power by obtaining legal rights and engaging in voluntary trade and cooperation with humans.

It is the second type of scenario, not the first, that I hope I am working to enable. My reasoning for accelerating AI development is straightforward: it will produce medical breakthroughs that could save billions of lives, and it will drive dramatic economic and technological progress that improves quality of life for people everywhere. These benefits justify pushing forward with AI development.

I do not think violent disempowerment scenarios are impossible, just unlikely. And I think that pausing AI development would not meaningfully reduce the probability of such scenarios occurring. Even if pausing AI did reduce this risk, I think the probability of violent disempowerment is low enough that accepting this risk is justified by the billions of lives that faster AI development could save.

More Reactions to If Anyone Builds It, Everyone Dies
Matthew Barnett · 20d

> She asks why the book doesn’t spend more time explaining why an intelligence explosion is likely to occur. The answer is the book is explicitly arguing a conditional, what happens if it does occur, and acknowledges that it may or may not occur, or occur on any given time frame.

Is it your claim here that the book is arguing the conditional "If there's an intelligence explosion, then everyone dies"? If so, then it seems completely valid to counterargue: "Well, an intelligence explosion is unlikely to occur, so who cares?"

Contra Collier on IABIED
Matthew Barnett · 24d

> I expect there are no claims to the effect that there will be only one chance to correctly align the first AGI.

For the purpose of my argument, there is no essential distinction between 'the first AGI' and 'the first ASI'. My main point is to dispute the idea that there will be a special 'it' at all, which we need to align on our first and only try. I am rejecting the scenario where a single AI system suddenly takes over the world. In my view, there will not be one decisive system, but rather a continuous process in which AI systems gradually assume more control over the world.

To understand the distinction I am making, consider the analogy of genetically engineering humans. Suppose the technology continues to improve: there will eventually be a point where genetically engineered humans are superhuman in all relevant respects compared to ordinary biological humans. They will be smarter, stronger, healthier, and more capable in every measurable way. Nonetheless, there is no special point at which we develop 'the superhuman'. There is no singular 'it' to build, which then proceeds to take over the world in one swift action. Instead, genetically engineered humans would simply get progressively smarter, more capable, and more powerful over time as the technology improves. At each stage of technological innovation, these enhanced humans would take over more responsibilities, command greater power in corporations and governments, and accumulate a greater share of global wealth. The transition would be continuous rather than discontinuous.

Yes, at some point such enhanced humans will possess the raw capability to take control of the world by force. They could theoretically coordinate to launch a sudden coup against existing institutions and seize power all at once. But the default scenario seems more likely: a continuous transition from control of the world by ordinary humans to control by genetically engineered superhumans. They would gradually occupy positions of power through normal economic and political processes rather than through sudden conquest.

Contra Collier on IABIED
Matthew Barnett · 24d

> You're saying that slow growth on multiple systems means we can get one of them right, by course correcting.

That's not what I'm saying. My argument was not about multiple simultaneously existing systems growing slowly together. It was instead about why I dispute the idea of a unique or special point in time when we build "it" (i.e., the AI system that takes over the world), and about the value of course correction and continuous iteration.

Contra Collier on IABIED
Matthew Barnett · 25d

> As the review makes very clear, the argument isn't about AGI, it's about ASI. And yes, they argue that you would in fact only get one chance to align the system that takes over.

I'm aware; I was expressing my disagreement with their argument. My comment was not premised on whether we were talking about "the first AGI" or "the first ASI". I was making a more fundamental point.

In particular: I am precisely disputing the idea that there will be "only one chance to align the system that takes over". In my view, the future course of AI development will not be well described as having a single "system that takes over". Instead, I anticipate waves of AI deployment that gradually and continuously assume more control.

I fundamentally dispute the entire framing of thinking about "the system" that we need to align on our "first try". I think AI development is an ongoing process in which we can course-correct. I am disputing that there is an important, unique point when we will build "it" (i.e., the ASI).

Contra Collier on IABIED
Matthew Barnett · 25d

> I would strongly disagree with the notion that FOOM is “a key plank” in the story for why AI is dangerous. Indeed, one of the most useful things that I, personally, got from the book, was seeing how it is *not* load bearing for the core arguments.

I think the primary reason the foom hypothesis seems load-bearing for AI doom is that, without a rapid and local AI takeoff, we simply won't get "only one chance to correctly align the first AGI [ETA: or the first ASI]".

If foom occurs, there will be a point where a company develops an AGI that quickly transitions from being just an experimental project to something capable of taking over the entire world. This presents a clear case for caution: if the AI project you're working on will undergo explosive recursive self-improvement, then any alignment mistakes you build into it will become locked in forever. You cannot fix them after deployment because the AI will already have become too powerful to stop or modify.

However, without foom, we are more likely to see a gradual and diffuse transition from human control over the world to AI control over the world, without any single AI system playing a critical role in the transition by itself. The fact that the transition is not sudden is crucial, because it means that no single AI release needs to be perfectly aligned before deployment. We can release imperfect systems, observe their failures, and fix problems in subsequent versions. Our experience with LLMs demonstrates this pattern: we have been able to fix errors after deployment and ensure that future model releases don't repeat the same problems (as illustrated by Sydney Bing, among other examples).

A gradual takeoff allows for iterative improvement through trial and error, and that is enormously important. Without foom, there is no single critical moment where we must achieve near-perfect alignment without any opportunity to learn from real-world deployment, and no abrupt transition from working on "aligning systems incapable of taking over the world" to "aligning systems capable of taking over the world". Instead, systems will gradually and continuously get more powerful, with no bright lines.

Without foom, we can learn from experience and course-correct in response to real-world observations. My view is that this fundamental process of iteration, experimentation, and course correction in response to observed failures makes the problem of AI risk dramatically more tractable than it would be if foom were likely.

The Problem
Matthew Barnett · 2mo

> Roko says it's impossible, I say it's possible and likely.

I'm not sure Roko is arguing that it's impossible for capitalist structures and reforms to make a lot of people worse off. That seems like a strawman to me. The usual argument here is that such reforms are typically net-positive: they create a lot more winners than losers. Your story here emphasizes the losers, but if the reforms were indeed net-positive, we could just as easily emphasize the winners who outnumber the losers.

In general, literally any policy that harms people in some way will look bad if you focus solely on the negatives, and ignore the positives.

The Problem
Matthew Barnett · 2mo

I recognize that. But it seems kind of lame to respond to a critique of an analogy by simply falling back on another, separate analogy. (Though I'm not totally sure if that's your intention here.)

The Problem
Matthew Barnett · 2mo

> I'm arguing that we won't be fine. History doesn't help with that, it's littered with examples of societies that thought they would be fine. An example I always mention is enclosures in England, where the elite deliberately impoverished most of the country to enrich themselves.

Is the idea here that England didn't do "fine" after enclosures? But in the century following the most aggressive legislative pushes towards enclosure (roughly 1760-1830), England led the industrial revolution, which brought large, durable increases in standards of living for the first time in world history, and for all social classes rather than just the elite. Enclosure likely played a major role in the increase in English agricultural productivity, which created unprecedented food abundance.

It's true that not everyone benefitted from these reforms, that inequality increased, and that a lot of people became worse off from enclosure (especially in the short term, during the so-called Engels' pause). But on the whole, I don't see how your example demonstrates your point. If anything, your example proves the opposite.

Posts

10 · Most AI value will come from broad automation, not from R&D · 6mo · 6 comments
95 · We probably won't just play status games with each other after AGI · 9mo · 21 comments
78 · Some arguments against a land value tax · 10mo · 45 comments
34 · Distinguishing ways AI can be "concentrated" · 1y · 2 comments
63 · Against most, but not all, AI risk analogies · 2y · 41 comments
157 · My thoughts on the social response to AI risk · 2y · 37 comments
18 · Announcing Epoch's newly expanded Parameters, Compute and Data Trends in Machine Learning database · 2y · 0 comments
193 · Evaluating the historical value misspecification argument · 2y · 163 comments
47 · Updating Drexler's CAIS model · 2y · 32 comments
53 · Are Emergent Abilities of Large Language Models a Mirage? [linkpost] · 2y · 21 comments
Wikitag Contributions

History of AI Risk Thought · 3 years ago · (+5/-5)
Economics · 4 years ago · (+1232)