Matthew Barnett

Someone who is interested in learning and doing good.

I admit I'm a bit surprised by your example. Your example seems to be the type of heuristic argument that, if given about AI, I'd expect would fail to compel many people (including you) on anything approaching a deep level. It's possible I was just modeling your beliefs incorrectly. 

Generally speaking, I suspect there's a tighter connection between our selection criteria in ML and the stuff models will end up "caring" about relative to the analogous case for natural selection. I think this for similar reasons that Quintin Pope alluded to in his essay about the evolutionary analogy. 

If you think you'd be persuaded that animals will end up caring about their offspring because of a heuristic argument that this type of behavior is selected for in-distribution, I'm not sure why you'd need a lot of evidence to be convinced the same will be true for AIs with regard to what we train them to care about. But again, perhaps you don't actually need that much evidence, and I was simply mistaken about what you believe here.

Re: Values are easy to learn, this mostly seems to me like it makes the incredibly-common conflation between "AI will be able to figure out what humans want" (yes; obviously; this was never under dispute) and "AI will care" (nope; not by default; that's the hard bit).

I think it's worth reflecting on what type of evidence would be sufficient to convince you that we're actually making progress on the "caring" bit of alignment and not merely the "understanding" bit. Because I currently don't see what type of evidence you'd accept beyond near-perfect mechanistic interpretability.

I think current LLMs demonstrate a lot more than mere understanding of human values; they seem to actually 'want' to do things for you, in a rudimentary behavioral sense. When I ask GPT-4 to do some task for me, it's not just demonstrating an understanding of the task: it's actually performing actions in the real world that result in the task being completed. I think it's totally reasonable, prima facie, to admit this as evidence that we are having some success at getting AIs to "care" about doing tasks for users.

It's not extremely strong evidence, because future AIs could be way harder to align, maybe there's ultimately no coherent sense in which GPT-4 "cares" about things, and perhaps GPT-4 is somehow just "playing the training game" despite seemingly having limited situational awareness. 

But I think it's valid evidence nonetheless, and I think it's wrong to round this datum off to a mere demonstration of "understanding". 

We typically would not hold other humans to such a high standard. For example, if a stranger helped you in your time of need, you might reasonably infer that the stranger cares about you to some extent, not merely that they "understand" how to care about you, or that they are simply helping people out of a desire to appear benevolent as part of a long-term strategy to obtain power. You may not be fully convinced they really care about you on the basis of a single incident, but surely it should move your credence somewhat. And further observations could move your credence further still.

Alternative explanations of aligned behavior we see are always logically possible, and it's good to try to get a more mechanistic understanding of what's going on before we confidently declare that alignment has been solved. But behavioral evidence is still meaningful evidence for AI alignment, just as it is for humans.

Yes, I believe this point has practical relevance. If what I'm saying is true, then I do not believe that solving AI alignment has astronomical value (in the sense of saving 10^50 lives). If solving AI alignment does not have astronomical counterfactual value, then its value becomes more comparable to the value of other positive outcomes, like curing aging for people who currently exist. This poses a challenge for those who claim that delaying AI is obviously for the greater good as long as it increases the chance of successful alignment, since that could also cause billions of currently existing people to die.

I don't understand how this new analogy is supposed to apply to the argument, but if I wanted to modify the analogy to get my point across, I'd make Alice 90 years old. Then, I'd point out that, at such an age, getting hit by a car and dying painlessly genuinely isn't extremely bad, since the alternative is to face death within the next several years with high probability anyway.

Putting "cultural change" and "an alien species comes along and murders us into extinction" into the same bucket seems like a mistake to me

Value drift encompasses a lot more than cultural change. If you think humans messing up on alignment could mean something as dramatic as "an alien species comes along and murders us", surely you should think that the future could continue to include even more, similarly dramatic shifts. Why would we assume that once we solve value alignment for the first generation of AIs, values would then be locked in perfectly forever for all subsequent generations?

The concern about AI misalignment is itself a concern about value drift. People are worried that near-term AIs will not share our values. The point I'm making is that even if we solve this problem for the first generation of smarter-than-human AIs, that doesn't guarantee that AIs will permanently share our values in every subsequent generation. In your analogy, a large change in the status quo (death) is compared to an arguably smaller and more acceptable change over the long term (biological development). By contrast, I'm comparing a very bad thing to another similarly very bad thing. This analogy seems mostly valid only to the extent you reject the premise that extreme value drift is plausible in the long-term, and I'm not sure why you would reject that premise.

Value drift happens in almost any environment in which there is variation and selection among entities in the world over time. I'm mostly just saying that things will likely continue to change continuously over the long-term, and a big part of that is that the behavioral tendencies, desires and physical makeup of the relevant entities in the world will continue to evolve too, absent some strong countervailing force to prevent that. This feature of our world does not require that humans continue to have competitive mental and physical capital. On the broadest level, the change I'm referring to took place before humans ever existed.

We would need to solve the general problem of avoiding value drift. Value drift is the phenomenon of changing cultural, social, personal, biological and political values over time. We have observed it in human history during every generation. Older generations vary on average in what they want and care about compared to younger generations. More broadly, species have evolved over time, with constant change on Earth as a result of competition, variation, and natural selection. While over short periods of time, value drift tends to look small, over long periods of time, it can seem enormous.

I don't know what a good solution to this problem would look like, and some proposed solutions -- such as a permanent, very strict global regime of coordination to prevent cultural, biological, and artificial evolution -- might be worse than the disease they aim to cure. However, without a solution, our distant descendants will likely be very different from us in ways that we consider morally relevant.

If you actually believe that at some point, even with aligned AI, the forces of value drift are so powerful that we are still unlikely to survive

Human survival is different from the universe being colonized under the direction of human values. Humans could survive, for example, as a tiny pocket of a cosmic civilization.

The astronomical loss is still there.

Indeed, but the question is, "What was the counterfactual?" If solving AI alignment merely delays an astronomical loss, then it is not astronomically important to solve the problem. (It could still be very important in this case, but just not so important that we should think of it as saving 10^50 lives.)

Every single generation to come is at stake, so I don't think my own life bears much on whether to defend all of theirs.

Note that the idea that >10^50 lives are at stake is typically premised on the notion that there will be a value lock-in event, after which we will successfully colonize the reachable universe. If there is no value lock-in event, then even if we solve AI alignment, values will drift in the long-term, and the stars will eventually be colonized by something that does not share our values. From this perspective, success in AI alignment would merely delay the arrival of a regime of alien values, rather than prevent it entirely. If true, this would imply that positive interventions now are not as astronomically valuable as you might have otherwise thought.

My guess is that the idea of a value lock-in sounded more plausible back in the days when people were more confident that there would be (1) a single unified AI that takes control of the world with effectively no real competition forever, and (2) an explicit, global utility function over the whole universe that this AI maintains unchanged permanently. However, both of these assumptions seem dubious to me currently.
