Critiquing "What failure looks like"

Objection 1: Historical precedent

I'm pretty sure WFLL1 only applies in the case where AI is "responsible for" some very large fraction of the economy (I imagine >90%), for which we don't really have much of a historical precedent.

And we could ask "But what about love, honor or justice? Will we forget about those unquantifiable things in the era of the algorithm?"

When I imagine WFLL1 that doesn't turn into WFLL2, I usually imagine a world in which all existing humans lead great lives, but don't have much control over the future. On a moment-to-moment basis, that world is better than the current world, but we don't get to influence the future and make use of the cosmic endowment, and so from a total view we have lost >99% of the potential value of the future. Such a world can still include love, honor and justice among the humans who are still around.

On the other hand, the last time I mentioned this among ~6 people, all at least interested in AI safety, not a single other person shared this impression. They still found WFLL1 convincing, but as an example of a world that was moment-to-moment worse than the current world while still not being WFLL2.

Objection 2: Absence of evidence

AI has a very minor economic impact right now, but even so, I'd argue that the concerns over fairness and bias in AI are evidence of WFLL1, since we can't measure the "fairness" of a classifier.

Objection 3: Why privilege this axis

Mostly that for all the other axes you name, I expect deep learning to eventually become capable along those axes. To be fair, I also think that deep learning models will be able to do what we mean rather than what we measure, but that seems like the one most likely to fail. (I do find the dataset axis somewhat convincing, but even there I expect self-supervised learning to make that axis less important.)

Rohin Shah (6y)

When I imagine WFLL1 that doesn't turn into WFLL2, I usually imagine a world in which all existing humans lead great lives, but don't have much control over the future. On a moment-to-moment basis, that world is better than the current world, but we don't get to influence the future and make use of the cosmic endowment, and so from a total view we have lost >99% of the potential value of the future.

I was uncertain about this, but it seems this is at least what Paul intended. From here, about WFLL1:

The availability of AI still probably increases humans’ absolute wealth. This is a problem for humans because we care about our fraction of influence over the future, not just our absolute level of wealth over the short term.

Sammy Martin (6y)

First off, if we have specific evidence (an answer to Objection 2) then the historical analogy in Objection 1 looks a lot weaker, as any real evidence of WFLL1 arising now would suggest that the historical cases of other algorithms that gave pathological results just aren't representative. I think they aren't representative.

(I think the discontinuity-based arguments largely do make the "this time is different" case, roughly because general intelligence seems clearly game-changing. WFLL2 seems somewhere in between these, and I'm unsure where my beliefs fall on that.)

The key difference 'this time' (before we get anywhere near WFLL2 or AGI), as I see it, is that those early algorithms gave recommendations to people that they could implement or avoid, so the 'exploratory phase' where we poked around to find out what they were capable of was pretty much risk-free, while WFLL1 implies that the systems have some degree of autonomy and actually have a chance to do unexpected things without humans realizing straight away. Dantzig's linear optimization leading to a catastrophe would have required more carelessness and stupidity than (current or very near-future) deep RL, because deep RL's mistakes are subtler and because it has to be loosed on some environment to achieve results and give us useful information on its behaviour. As for evidence, Stuart Russell thinks that we are already seeing WFLL1 in social media ad algorithms:

“Consider the so-called ‘filter bubble’ of social media. The reinforcement learning algorithm is trying to maximize click throughs. From the view of the human, the purpose of the machine is to maximize clickthroughs. But from the view of the machine, it is changing the state of the world to maximize clicks. It is changing you to make you more predictable. A raving fascist or communist is more predictable and will lap up raving content. The machines can change our mind about our objective function so we are easier to satisfy. Advertisers have done this for decades.” [I argued with him about this feedback loop, and Yann LeCun says this changed at Facebook a while ago]

“The reinforcement learning algorithm in social media has destroyed the EU, NATO and democracy. And that’s just 50 lines of code.”

I wonder if this hypothesis was in Paul's mind when he wrote the essay. If Russell is right about any of this, that suggests that one of the first times we gave deep RL any ability to influence the world, it succumbed to a failure scenario almost immediately. That's not a good track record.

Lukas_Gloor (5y)

A raving fascist or communist is more predictable and will lap up raving content. The machines can change our mind about our objective function so we are easier to satisfy.

That's a good way to put it!

This might be stretching the analogy, but I feel like there's a similar thing going on with technological evolution of "gadgets" (digital watch, iPod, cell phone). It feels like people's expectations of what a gadget should be able to do for them to make them content continue to grow at a rate so fast that something as simple and obviously beneficial as "battery life" never really receives an improvement. I get that not everyone is bothered by having to charge things all the time (and losing the charger all the time), but how come it's borderline impossible to buy things that don't need to be charged so often? It feels like there's some optimization pressure at work here, and it's not making life more convenient. :)

Lukas_Gloor (5y)

For people who share the intuition voiced in the OP, I'm curious if your intuitions change after thinking about the topic of recommender systems and filter bubbles in social media. Especially as portrayed in the documentary "The Social Dilemma" (summarized in this Sam Harris podcast). Does that constitute a historical precedent?

Donald Hobson (6y)

Suppose it was easy to create automated companies, and skim a bit off the top. AI algorithms are just better at business than any startup founder. Soon some people create these algorithms, give them a few quid in seed capital and leave them to trade and accumulate money. The algorithms rapidly increase their wealth, and soon own much of the world economy. Humans are removed when the AIs have the power to do so at a profit. This ends in several superintelligences tiling the universe with economium together.

For this to happen, we need

1) A doubling time for fooming AI of months to years, so that many AIs can be in the running.

2) It's fairly easy to set an AI to maximize money.

3) The people who care about complex human values can't effectively make an AI that does so.

4) Any attempt to stamp out all fledgling AIs before they get powerful fails. This is helped by anonymous cloud computing.

I don't really buy 1), though it is fairly plausible. I'm not convinced of 2) either, although it might not be hard to build a mesa optimiser that cares about something sufficiently correlated with money that humans are beyond caring before any serious deviation from money optimization happens.

If 2) were false, and people who tried to make AIs all got paperclip maximisers, the long-run result would just be a world filled with paperclips, not banknotes. (Although this would make coordinating to destroy the AIs a little easier?) The paperclip maximisers would still try to gain economic influence until they could snap nanotech fingers.
