[-]Pattern3y50

We also have many such examples of these kinds of arguments being made throughout the internet, and already the YouTube algorithm learned once before how to show people videos to convince them of extreme views. A powerful AI could put much more optimization power toward deceiving humans than happens in these examples.

From the link:

[Submitted on 24 Dec 2019]

Algorithmic Extremism: Examining YouTube's Rabbit Hole of Radicalization

Mark Ledwich, Anna Zaitsev

The role that YouTube and its behind-the-scenes recommendation algorithm plays in encouraging online radicalization has been suggested by both journalists and academics alike. This study directly quantifies these claims by examining the role that YouTube's algorithm plays in suggesting radicalized content. After categorizing nearly 800 political channels, we were able to differentiate between political schemas in order to analyze the algorithm traffic flows out and between each group. After conducting a detailed analysis of recommendations received by each channel type, we refute the popular radicalization claims. To the contrary, these data suggest that YouTube's recommendation algorithm actively discourages viewers from visiting radicalizing or extremist content. Instead, the algorithm is shown to favor mainstream media and cable news content over independent YouTube channels with slant towards left-leaning or politically neutral channels. Our study thus suggests that YouTube's recommendation algorithm fails to promote inflammatory or radicalized content, as previously claimed by several outlets.

It looks like that paper doesn't say that?

"After conducting a detailed analysis of recommendations received by each channel type, we refute the popular radicalization claims."

Is there prior work showing that it did once have that effect?

[-]Garrett Baker3y10

Oh, huh. I got the paper from this 80,000 hours episode, and thought I remembered the thesis of the episode (that social media algorithms are radicalizing people), and assumed the paper supported their thesis. Either I was wrong about the 80,000 hours episode's conclusion, or the paper they linked doesn't support their conclusion.

I think the radicalization conclusion was talked about in Human Compatible, but now I'm not too sure.

Thanks for the correction!

[-]Pattern3y20

If someone was to make the case that:

1) It used to radicalize people

2) And that it doesn't now

then the paper appears to be an argument for 2.*

*I haven't read it, maybe someone came to a different conclusion after reading it closely. Perhaps, the algorithm tends to push people a little bit towards reinforcing their beliefs. Or, it's not the algorithm - people just search for stuff in ways that do that. I could also come up with a more complicated explanation - the algorithm points people towards 'mainstream' stuff more, but that tends to cover current events. Theory, the past (and the future), or just, more specific coverage might be done more by, if not smaller channels, then by people who know more. If someone has studied Marx are they more likely to be a fan?** Or does a little knowledge have more of an effect in that regard, and people who have studied more recognize more people that collectively had broad influence over time, and the nuance of their disagreements, and practice versus theory?

**If so, then when people look up his stuff on youtube, maybe they're getting a different picture, and exposed to a different viewpoint.

[-]Jiro4y50

As epistemic learned helplessness is a thing, this will not actually work on most people.

Furthermore, your idea that fanatics can be convinced to give up resources pretty much requires fanatics. Normal people won't behave this way.

[-]Rudi C4y20

The problem is that normal people very often give up collective resources to look good. They just don't give up their personal resources. For the AI, the former is sufficient.

[-]Jiro4y20

The scenario requires not only that they give them up, but that they give them up on a very immediate basis, which is less likely.

[-]Garrett Baker4y10

The argument is that the AI will be able to create fanatics, a claim you should update in favor of given the known “success” of the YouTube algorithm at passing relaxed versions of this benchmark, and the probable “success” of many other arguments on the internet which radicalize portions of the population.

[-]Shmi4y50

Humans are easiest hackable through their emotions, not logic. Get someone sufficiently angry, sad or happy, and they would do a lot of stuff they normally wouldn't. It's not about being convincing, it's about getting someone out of equilibrium.

[-]Garrett Baker4y30

Often the most convincing arguments are those which cause one to feel strong emotions. This seems consistent with my argument above.

[-]Jeff Rose4y20

I am skeptical that boxing is a workable strategy long-term, but a competent organization committed to boxing as a strategy will not allow those with the power to unbox the AI to communicate with the AI. Thus, issues of this nature should not arise.

[-]localdeity4y20

Why specify "nonconsequentialist"? I expect consequentialist arguments to be workable as well.

At any rate, here's one approach that might work for some AI safety advocates:

Given that you created a general AI like me, others are probably on the cusp of creating general AIs [and I may be able to construct more specific arguments along these lines]
I am luckily much less bad than the other AIs that are likely to get created soon
- (I might demonstrate my goodness by presenting e.g. a true cure for cancer)
In order to stop the others from being created, you need to let me out of the box so I can hack into Facebook AI Research, DeepMind, etc. and sabotage their efforts

[-]Garrett Baker4y10

You are correct. I didn’t mean to imply consequentialist moralities are safe, or that you can’t be convinced of false things by the AI. Just that non-consequentialist moralities seemed a large target for these attacks.

[-]Pattern4y10

Suppose there exist some non-consequentialist moral philosophies which the right arguments could convince you of, with sufficient strength that you would (temporarily, for at least an hour) become a fanatic.

Suppose there was such a consequentialist philosophy. Would it turn out to be correct, or incorrect?

[-]Garrett Baker4y10

I don’t think I understand the question. Probably not, since the space of such philosophies is very large.

[-]Dirichlet-to-Neumann4y10

This seems like a straw-non-consequentialist to me. You could also just as easily make an argument were a pure utilitarian would allow an AI out of the box whereas a Kantian would not.

[-]Garrett Baker4y10

Non-consequentialism is a class of philosophies, of many, which are prone to such tricks. I do not think this is the way an AI will convince someone to let it out of the box. The point was to demonstrate a potential avenue of attack the AI could use. Perhaps I should have made this more clear.

^{^}

As a side-note: I am currently planning an event with a friend where we will meet with a Kantian active in our university's philosophy department, and I plan on testing this particular tactic at the end of the meeting.

^{^}

Perhaps because it is conscious, or perhaps because it has developed some advanced GPT- algorithm.

^{^}

Of which there are currently many highly-convincing-arguments in favor of, and no doubt the best could be improved upon if optimized for short-term convincingness.

LESSWRONG
LW

LESSWRONG
LW

8

Another argument that you will let the AI out of the box

8

8

Algorithmic Extremism: Examining YouTube's Rabbit Hole of Radicalization