Yann LeCun is Chief AI Scientist at Meta.

This week, Yann engaged with Eliezer Yudkowsky on Twitter, doubling down on Yann’s position that it is dangerously irresponsible to talk about smarter-than-human AI as an existential threat to humanity.

I haven’t seen anyone else preserve and format the transcript of that discussion, so I am doing that here; then I offer brief commentary.

IPFConline: Top Meta Scientist Yann LeCun Quietly Plotting “Autonomous” #AI Models. This is as cool as it is frightening. (Provides link)

Yann LeCun: Describing my vision for AI as a “quiet plot” is funny, given that I have published a 60 page paper on it with numerous talks, posts, tweets… The “frightening” part is simply wrong, since the architecture I propose is a way to guarantee that AI systems be steerable and aligned.

Eliezer Yudkowsky: A quick skim of [Yann LeCun’s 60 page paper] showed nothing about alignment. “Alignment” has no hits. On a quick read the architecture doesn’t imply anything obvious about averting instrumental deception, nor SGD finding internal preferences with optima that don’t generalize OOD, etc.

Yann LeCun: To *guarantee* that a system satisfies objectives, you make it optimize those objectives at run time (what I propose). That solves the problem of aligning behavior to objectives. Then you need to align objectives with human values. But that’s not as hard as you make it to be.

EY: Sufficiently intelligent systems, whatever their internal objectives, will do well at optimizing their outer behavior for those. This was never in doubt, at least for me. The entire alignment problem is about aligning internal AI objectives with external human preferences.

Yann: Setting objectives for super-intelligent entities is something humanity has been familiar with since people started associating into groups and laws were made to align their behavior to the common good. Today, it’s called corporate law.

EY: So you’re staking the life of everyone on Earth that:

– Future AIs are as human-friendly on average as the humans making up corporations.

– AIs don’t collude among themselves better than human corporations.

– AIs never go beyond superhuman to supercorporate.

Yann: I’m certainly not staking anyone’s life on anything.

Thankfully, I don’t have that power.

But your idea that getting objective alignment slightly wrong once leads to human extinction (or even significant harm) is just plain wrong.

It’s also dangerous.

Think about consequences.

EY: My objection is not that you’re staking everyone’s life on what you believe – to advocate for a global AI stop is also doing that – but that you are staking everyone’s life on propositions that seem not just uncertain but probably false, and not facing up to that staking. If you think there’s no possible extinction danger from superintelligence no matter how casually the problem is treated or how much you screw up, because of a belief “AIs are no more capable than corporations”, state that premise clearly and that it must bear the weight of Earth.

YL: Stop it, Eliezer. Your scaremongering is already hurting some people. You’ll be sorry if it starts getting people killed.

EY: If you’re pushing AI along a path that continues past human and to superhuman intelligence, it’s just silly to claim that you’re not risking anyone’s life. And sillier yet to claim there are no debate-worthy assumptions underlying the claim that you’re not risking anyone’s life.

YL: You know, you can’t just go around using ridiculous arguments to accuse people of anticipated genocide and hoping there will be no consequence that you will regret. It’s dangerous. People become clinically depressed reading your crap. Others may become violent.

J Fern: When asked on Lex’s podcast to give advice to high school students, Eliezer’s response was “don’t expect to live long.” The most inhuman response from someone championing themselves as a hero of humanity. He doesn’t get anything other than his own blend of rationality.

Yann LeCun: A high-school student actually wrote to me saying that he got into a deep depression after reading prophecies of AI-fueled apocalypse.

Eliezer Yudkowsky: As much as ‘tu quoque’ would be a valid reply, if I claimed that your conduct could ever justify or mitigate my conduct, I can’t help but observe that you’re a lead employee at… Facebook. You’ve contributed to a *lot* of cases of teenage depression, if that’s considered an issue of knockdown importance.

Of course that can’t justify anything I do. So my actual substantive reply is that if there’s an asteroid falling, “Talking about that falling asteroid will depress high-school students” isn’t a good reason not to talk about the asteroid or even – on my own morality – to *hide the truth* from high-school students.

The crux is, again, whether or not, as a simple question of simple fact, our present course leads to a superhuman AI killing everybody. And with respect to that factual question, observing that “hearing about this issue has depressed some high-school students” is not a valid argument from a probabilistic standpoint; “the state of argument over whether AI will kill everyone is deeply depressing” is not *less* likely, in worlds where AI kills everyone, than in worlds where AI is not.

What’s more likely in worlds where AI doesn’t kill everyone? Somebody having an actual plan for that which stands up to five minutes of criticism.

Yann LeCun: Scaremongering about an asteroid that doesn’t actually exist (even if you think it does) is going to depress people for no reason.

Running a free service that 3 billion people find useful to connect with each other is a Good Thing, even if there are negative side effects that must be mitigated and which are being mitigated [note: I don’t work on this at Meta].

It’s like cars: the solution to reducing accidents is to make cars and roads safer, not banning cars nor just deciding that car manufacturers are evil. Solutions can never be perfect, but they put us in a better place on the risk-benefit trade-off curve.

The risk-benefit trade-off curves for AI are no different, contrary to your mistaken assumption that even the smallest mistake would spell doom for humanity as a whole.

Eliezer Yudkowsky: It seems then, Yann LeCun, that you concede that the point is not whether the end of the world is depressing anyone, the point is just:

*Is* the world ending?

You’ve stated that, although superintelligence is on the way – you apparently concede this point? – corporate law shows that we can all collectively manage superhuman entities, no worries. It’s such a knockdown argument, on your apparent view, that there’s “no reason” to depress any teenagers with worries about it.

I, for one, think it’s strange to consider human corporations as a sort of thing that allegedly scales up to the point of being really, actually, *way smarter* than humans: way more charismatic, way more strategic, with a far deeper understanding of people, making faster progress on basic science questions with lower sample complexity, far less susceptible to invalid arguments, much less likely to make stupid damn mistakes than any human being; and all the other hallmarks that I’d expect to go with truly superhuman cognitive excellence.

Sure, all of the Meta employees with spears could militarily overrun a lone defender with one spear. When it comes to scaling more cognitive tasks, Kasparov won the game of Kasparov vs. The World, where a massively parallel group of humans led by four grandmasters tried and failed to play chess good enough for Kasparov. Humans really scale very poorly, IMO. It’s not clear to me that, say, all Meta employees put together, collectively exceed John von Neumann for brilliance in every dimension.

I similarly expect AIs past a certain point of superhumanity to be much better-coordinated than humans; see for example the paper “Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic” (https://arxiv.org/abs/1401.5577). If sufficiently smart minds in general, and AIs with legible source code in particular, can achieve vastly better outcomes on coordination problems via prediction of each others’ decision processes (eg: you can predict I’ll cooperate on the Prisoner’s Dilemma iff I predict you’ll cooperate), then a world full of superhuman AGIs is one where humanity should worry that AIs will all cooperate with each other, and not with us, because we cannot exhibit to one another our code, or build an agreed-upon cognitive agent to arbitrate between us.
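
[For readers unfamiliar with that paper: below is a toy Python sketch of the "cooperate iff I predict you cooperate" idea. It is not the provability-logic construction from the paper; it fakes "prediction" with bounded simulation of the opponent's code, and the optimistic base case stands in for the Löbian step that makes mutual cooperation work. All names are illustrative.]

```python
# Toy "program equilibrium" sketch: agents receive each other's decision
# procedure and cooperate iff they predict the other will cooperate.
# Illustration only, not the provability-logic FairBot from the paper.

C, D = "cooperate", "defect"

def defect_bot(opponent, depth):
    return D

def cooperate_bot(opponent, depth):
    return C

def fair_bot(opponent, depth):
    """Cooperate iff a bounded simulation of the opponent (playing against
    fair_bot) comes back 'cooperate'. The optimistic base case at depth 0
    stands in for the Löbian step that breaks the infinite regress."""
    if depth <= 0:
        return C
    return C if opponent(fair_bot, depth - 1) == C else D

if __name__ == "__main__":
    for name, bot in [("DefectBot", defect_bot),
                      ("CooperateBot", cooperate_bot),
                      ("FairBot", fair_bot)]:
        print(f"FairBot vs {name}: {fair_bot(bot, 10)} / {bot(fair_bot, 10)}")
    # FairBot defects against DefectBot, cooperates with CooperateBot,
    # and two FairBots cooperate with each other.
```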

I don’t think corporate law is robust to a world where all the AI corporations act together in perfect unison with each other, but not with dumber humans who are naturally unable to participate in their coordination process.

Heck, just an AI thinking 10,000 times faster than a human (one day per nine seconds or so) makes it kinda hard to see how human regulators would stay in control using the same sort of tactics that human regulators use to stay in control of human corporations.

I finally worry that, in real life, the existence of aggregate human groups, has worked out as well for humanity as it has, because humans do care somewhat for other humans, or are relatively easy to train into doing that. I then expect it to be much harder to get an AI to care the same way, on anything remotely resembling the current DL paradigm – including search-based optimization for an objective, if that objective is itself being trained via DL.

Why? Because, for example, hominid evolution falsifies any purported general law along the lines of “hill-climbing optimization for a loss function, to the point where that produces general intelligence, produces robust generalization of the intuitive ‘meaning’ of the loss function even as the system optimized becomes more intelligent”. Humans were optimized purely for inclusive genetic fitness, and we ended up with no built-in internal psychological concept of what that *is*. When we got smarter, smart enough that condoms were a new option that didn’t exist in the ancestral environment / training distribution, we started using condoms. Gradient descent isn’t natural selection, but this case example in point does falsify that there’s any law of nature saying in general, “When you do hill-climbing optimization to the point of spitting out an intelligent mind, it ends up aligned with the loss function you tried to train it on, in a way that generalizes well outside the training distribution as the system becomes smarter.”

It seems that *any* of the possibilities:

1. Superhuman AGIs are less human-friendly than ‘superhuman’ corporations composed of humans, and end up not wanting to do the equivalent of tipping at restaurants they’ll never visit again;

2. Superhuman AGIs can operate on a much faster timescale than human corporations and human regulators;

3. Superhuman AGIs will coordinate with each other far better than human corporations have ever managed to conspire; or

4. Superhuman AGIs end up qualitatively truly *smarter* than humans, in a way that makes utter mock of the analogy to human corporations…

…if they are reasonably in play, would each individually suffice to mean that a teenager should not stop worrying, upon being told “Don’t worry, we’ll regulate superintelligences just like we regulate corporations.”

So can you explain again why the teenager should be unworried about trends in AI intelligence heading toward “human-level” general intelligence, and then blowing right through, and then continuing on further?

Can you definitely shoot down *even one* of those concerns 1-4, that seem like individually sufficient reasons to worry? I think all of 1-4 are actually in reality true, not just maybes. If you can shoot down any one of them, to the satisfaction of others concerned with these topics, it would advance the state of debate in this field. Pick whichever one seems weakest to you; start there.

If Yann responds further, I will update this post.

Yann’s Position Here on Corporate Law is Obvious Nonsense

Yann seems to literally be saying ‘we can create superintelligent beings and it will pose zero risk to humanity because we have corporate law. Saying otherwise is dangerous and wrong and needs to stop.’

This is Obvious Nonsense. Yann’s whole line here is Obvious Nonsense. You can agree or disagree with a variety of things Eliezer says here. You can think other points are more relevant or important as potential responses, or think he’s being an arrogant jerk or what not. Certainly you can think Eliezer is overconfident in his position, perhaps even unreasonably so, and you can hold out hope, even quite a lot of hope, that things will turn out well for various reasons.

And yet you can and must still recognize that Yann is acting in bad faith and failing to make valid arguments.

One can even put some amount of hope in the idea of some form of rule of law being an equilibrium that holds for AIs. Surely no person would think this is such a strong solution that we have nothing to worry about.

The very idea that we are going to create entities more capable than humans, of unknown composition, and that this is nothing to worry about, is patently absurd. That it is nothing to worry about because of corporate law is doubly so.

It’s also worth noting, as Daniel Eth points out, that our first encounters with large scale corporations under corporate law… did not go so great? One of them, The East India Company, kind of got armies and took over a large percentage of the planet’s population?

Daniel Eth: I’m actually concerned that a much more competent version of the East India Company would have created a permanent dystopia for all of humanity, so I’m not persuaded by “we’ve governed corporations fine without running into X-risk, so there’s no risk from superintelligent AI.”

Geoffrey Miller: Interesting thought experiment. But, IMHO, the only ‘permanent dystopia’ would be human extinction. Anything else we can escape, eventually, one way or another.

Daniel Eth: “AGI turns the universe into data centers for computing pi, except for the Milky Way, which it turns into a zoo of humans spending 18 hours a day in brutal work, satisfying the slight amount it cares about preserving ‘the poetic human struggle’” seems like a permanent dystopia.

Here it is worth noting that Geoffrey Miller’s claim that ‘anything else we can escape, eventually, one way or another’ is similarly Obvious Nonsense. Why would one think that humans could ‘escape’ from their AI overlords, over any time frame, no matter the scenario and its other details, if humans were to lose control of the future but still survive?

Because a human is writing the plot? Because the AIs would inevitably start a human rights campaign? Because they’d ignore us scrappy humans thinking we were no threat until we found the time to strike?

In real life, once you lose control of the future to things smarter than you are that don’t want you flourishing, you do not get that control back. You do not escape.

It’s crazy where people will find ways to pretend to have hope.

That does not mean that there is no hope. There is hope!

Perhaps everything will turn out great, if we work to make that happen.

There is gigantic upside, if developments are handled sufficiently well.

If that happens, it will not be because we pushed forward thinking there were no problems to be solved, or with no plans to solve them. It will be because (for some value of ‘we’) we realized these problems were super hard and these dangers were super deadly, and some collective we rose to the challenge and won the day.

In terms of the important question of how epistemics works, Yann seems to literally claim that the way to avoid believing false things is to ‘get your research past your advisor and getting your publications to survive peer review’:

An essential step to becoming a scientist is to learn methods and protocols to avoid deluding yourself into believing false things.

You learn that by doing a PhD and getting your research past your advisor and getting your publications to survive peer review.

Also, in response to Eliezer’s clarification here:

Eliezer Yudkowsky: [Foom is] possible but hardly inevitable. It becomes moderately more likely as people call it absurd and fail to take precautions against it, like checking for sudden drops in the loss function and suspending training. Mostly, though, this is not a necessary postulate of a doom story.

Yann LeCun: “Believe in the god I just invented. By refusing to believe, you risk spending eternity in hell.”

Conclusion and a Precommitment

I wouldn’t be posting this if Yann LeCun weren’t Chief AI Scientist at Meta.

Despite his role, I believe this conversation is sufficient documentation of his positions, such that further coverage would not be productive. So unless circumstances change substantially – either Yann changes his views substantially, or Yann’s actions become far more important – I commit to not covering him further.

Comments (51)

Yann’s position that the creation of smarter-than-human artificial intelligence poses zero threat to humanity.

The linked tweet doesn't actually make this claim, or even imply it. Is Yann on the record saying "zero threat" somewhere else? Or even claiming that the level of risk is <0.1%? Failing that, is there somewhere he has implied that the risk is very small?

It seems like his position is more like: this problem looks likely to be manageable in a variety of ways, so let's push ahead and try to manage it. The probability of doom today is very low, so we can reconsider later if things look bleak.

He does say "the 'hard takeoff scenario' is utterly impossible." I'm a bit sympathetic to this, since Eliezer's description of super hard takeoff feels crazy to me as well. That said: (i) "utterly impossible" is too strong even for super hard takeoffs, which seem very unlikely but possible, (ii) based on his rhetoric about adaptation Yann seems to be implicitly rejecting even months-long takeoffs, which I think have >10% probability depending on how you define them.

I believe I have seen him make multiple statements on Twitter over months that express this point of view, and I see his statements herein as reinforcing that, but in the interests of this not being distracting to the main point I am editing this line.

Also including an additional exchange from yesterday he had with Elon Musk. 

Mods, please reimport. 

EDIT: Also adding his response to Max Tegmark from yesterday, and EY's response to that, at the bottom, but raising the bar for further inclusions substantially. 

Includes this quote: But regardless, until we have a semi credible design, we are discussing the sex of angels. The worst that can happen is that we won't figure out a good way to make them safe and we won't deploy them.

Which, logically, says zero risk, since the worst that can happen is non-deployment?

(I still have no idea where the button is that's supposed to let me switch to a WYSIWYG editor)

I agree that "the worst that can happen is..." suggests an unreasonably low estimate of risk, and technically implies either zero threat or zero risk of human error.

That said, I think it's worth distinguishing the story "we will be able to see the threat and we will stop," from "there is no threat." The first story makes it clearer that there is actually broad support for measurement to detect risk and institutional structures that can slow down if risk is large.

It also feels like the key disagreement isn't about corporate law or arguments for risk, it's about how much warning we get in advance and how reliably institutions like Meta would stop building AI if they don't figure out a good way to make AI safe. I think both are interesting, but the "how much warning" disagreement is probably more important for technical experts to debate---my rough sense is that the broader intellectual world already isn't really on Yann's page when he says "we'd definitely stop if this was unsafe, nothing to worry about."

This was posted after your comment, but I think this is close enough:

@ylecun

And the idea that intelligent systems will inevitably want to take over, dominate humans, or just destroy humanity through negligence is preposterous.
They would have to be specifically designed to do so.
Whereas we will obviously design them to not do so.
 

I'm most convinced by the second sentence:

They would have to be specifically designed to do so.

Which definitely seems to be dismissing the possibility of alignment failures.

My guess would be that he would back off of this claim if pushed on it explicitly, but I'm not sure. And it is at any rate indicative of his attitude.

He specifically told me when I asked this question that his views were the same as Geoff Hinton's and Scott Aaronson's, and neither of them holds the view that smarter-than-human AI poses zero threat to humanity.

From a Facebook discussion with Scott Aaronson yesterday:

Yann: I think neither Yoshua nor Geoff believe that AI is going to kill us all with any significant probability.

Scott: Well, Yoshua signed the pause letter, and wrote an accompanying statement about what he sees as the risk to civilization (I agree that there are many civilizational risks short of extinction). In his words: “No one, not even the leading AI experts, including those who developed these giant AI models, can be absolutely certain that such powerful tools now or in the future cannot be used in ways that would be catastrophic to society.”

Geoff said in a widely-shared recent video that it’s “not inconceivable” that AI will wipe out humanity, and didn’t offer any reassurances about it being vanishingly unlikely.

https://yoshuabengio.org/2023/04/05/slowing-down-development-of-ai-systems-passing-the-turing-test/

https://twitter.com/JMannhart/status/1641764742137016320

Yann: Scott Aaronson he is worried about catastrophic disruptions of the political, economic, and environmental systems. I don't want to speak for him, but I doubt he worries about a Yuddite-style uncontrollable "hard takeoff"

On the one hand, "Yuddite" is (kinda rude but) really rather clever.

On the other hand, the actual Luddites were concerned about technological unemployment which makes "Yuddite" a potentially misleading term, given that there's something of a rift between the "concerned about ways in which AI might lead to people's lives being worse within a world that's basically like the one we have now" and "concerned about the possibility that AI will turn the world so completely upside down that there's no room for us in it any more" camps and Yudkowsky is very firmly in the latter camp.

On the third hand, the Luddites made a prediction about a bad (for them) outcome, and were absolutely correct. They were against automatic looms because they thought the autolooms would replace their artisan product with lower quality goods and also worsen their wages and working conditions. They were right: https://www.smithsonianmag.com/history/what-the-luddites-really-fought-against-264412/

he is worried about catastrophic disruptions of the political, economic, and environmental systems

Ok, but how is this any different in practice? Or preventable via "corporate law"? It feels to me like people make too much of a distinction between slow and fast takeoff scenarios, as if somehow, if humans appear to be in the loop a bit more, this makes the problem less bad or less AI-related.

Essentially, if your mode of failure follows almost naturally from introducing AI system in current society and basic economic incentives, to the point that you can't really look at any part of the process and identify anyone maliciously and intentionally setting it up to end the world, yet it does end the world, then it's an AI problem. It may be a weird, slow, cyborg-like amalgamation of AI and human society that caused the catastrophe instead of a singular agentic AI taking everything over quickly, but the AI is still the main driver, and the only way to avoid the problem is to make AI extremely robust not just to intentional bad use but also to unintentional bad incentives feedback loops, essentially smart and moral enough to stop its own users and creators when they don't know any better. Or alternatively, to just not make the AI at all.

Honestly, given what Facebook's recommender systems have already caused, it's disheartening that the leader of AI research at Meta doesn't get something like this.

His statements seem better read as evasions than arguments. It seems pretty clear that LeCun is functioning as a troll in this exchange and elsewhere. He does not have a thought-out position on AI risk. I don't find it contradictory that someone could be really good at thinking about some stuff, and simply prefer not to think about some other stuff.

Yes, he's trolling: https://twitter.com/ylecun/status/1650544506964520972

That doesn't read as trolling to me? 80% sure he means it literally.

A quick google search gives a few options for the definition, and this qualifies according to all of them, from what I can tell. The fact that he thinks the comment is true doesn't change that.

Trolling definition: 1. the act of leaving an insulting message on the internet in order to annoy someone

Trolling is when someone posts or comments online to 'bait' people, which means deliberately provoking an argument or emotional reaction.

Online, a troll is someone who enters a communication channel, such as a comment thread, solely to cause trouble. Trolls often use comment threads to cyberbully...

What is trolling? ... A troll is Internet slang for a person who intentionally tries to instigate conflict, hostility, or arguments in an online social community.

Not knowing anything about LeCun other than this twitter conversation, I've formed a tentative hypothesis: LeCun is not putting a lot of effort into the debate: if you resort to accusing your opponent of scaring teenagers, it is usually because you feel you do not need to try very hard and you feel that almost any argument would do the job. Not trying very hard and not putting much thought behind one's words is in my experience common behavior when one perceives oneself as being higher in status than one's debate opponents. If a total ban on training any AI larger than GPT-4 starts getting more political traction, I expect that LeCun would increase his estimate of the status of his debating opponents and would increase the amount of mental effort he puts into arguing his position, which would in turn make it more likely that he would come around to the realization that AI research is dangerous.

I've seen people here express the intuition that it is important for us not to alienate people like LeCun by contradicting them too strongly and that it is important not to alienate them by calling on governments to impose a ban (rather than restricting our efforts to persuading them to stop voluntarily). Well, I weakly hold the opposite intuition for the reasons I just gave.

You're thinking that LeCun would be more likely to believe AI x-risk arguments if he put more mental effort into this debate. Based on the cognitive biases literature, motivated reasoning and confirmation bias in particular, I think it's more likely that he'd just find more clever ways to not believe x-risk arguments if he put more effort in.

I don't know what that means for strategy, but I think it's important to figure that out.

Let's make some assumptions about Mark Zuckerberg:

  1. Zuckerberg has above-average intelligence.
  2. He has a deep interest in new technologies.
  3. He is invested in a positive future for humanity.
  4. He has some understanding of the risks associated with the development of superintelligent AI systems.

Given these assumptions, it's reasonable to expect Zuckerberg to be concerned about AI safety and its potential impact on society.

Now the question that it's been bugging me since some weeks after reading LeCun's arguments:

Could it be that Zuckerberg is not informed about his subordinate's views?

If so, someone should really push for him to be informed, and probably even for LeCun to be replaced as Chief Scientist at Meta AI.

4 seems like an unreasonable assumption. From Zuck’s perspective his chief AI scientist, whom he trusts, shares milder views of AI risks than some unknown people with no credentials (again from his perspective).

Stupidity I would get, let alone well-reasoned disagreement. But bad faith confuses me. However selfish, don't these people want to live too? I really don't understand, Professor Quirrell.

You could pick corporations as an example of coordinated humans, but also e.g. Genghis Khan's hordes. And they did actually take over. If you do want to pick corporations, look e.g. at East India companies that also took over parts of the world.

I have my own specific disagreements with EY's foom argument, but it was difficult to find much of anything in Yann's arguments, tone, or strategy here to agree with, and EY presents the more cogent case.

Still, I'll critique one of EY's points here:

Sure, all of the Meta employees with spears could militarily overrun a lone defender with one spear. When it comes to scaling more cognitive tasks, Kasparov won the game of Kasparov vs. The World, where a massively parallel group of humans led by four grandmasters tried and failed to play chess good enough for Kasparov. Humans really scale very poorly, IMO.

Most real world economic tasks seem far closer to human army scaling rather than single chess game scaling. In the chess game example there is a very limited action space and only one move per time step. The real world isn't like that - more humans can just take more actions per time step, so the scaling is much more linear like armies for many economic tasks like building electric vehicles or software empires. Sure there are some hard math/engineering challenges that have bottlenecks, but not really comparable to the chess example.

Sure, but I would say there are depth and breadth dimensions here. A corporation can do some things N times faster than a single human, but "coming up with loopholes in corporate law to exploit" isn't one of them, and that's the closest analogy to an AGI being deceptive. The question with scaling is that in the best case you take the sum of individual efforts, or even enhance it, but in the worst case you just take the maximum, which gives you "best human effort" performance and can't go beyond that.

When asked on Lex’s podcast to give advice to high school students, Eliezer’s response was “don’t expect to live long.”

Not to belittle the perceived risk if one believes in a 90% chance of doom in the next decade, but even if one has a 1% chance of an indefinite lifespan, the expected lifespan of teenagers now is much higher than that of previous generations.
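
To put rough numbers on that (all of them invented purely for the arithmetic): assume a ~65-year remaining-life baseline for a teenager, ~10 remaining years in the 90%-doom branch, and cap "indefinite" at 10,000 years. Then

$$E[\text{remaining years}] \approx 0.9 \times 10 + 0.09 \times 65 + 0.01 \times 10{,}000 \approx 115,$$

which is well above the ~65-year baseline, though the conclusion is driven almost entirely by whatever stand-in number you pick for "indefinite."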

LeCun's handwaving that AGIs will be as manageable as corporations makes two big mistakes already:

  1. underestimates all the ways in which you can improve intelligence over corporations

  2. overestimates how well we've aligned corporations

Without even going as far back as the EIC (which by the way shows well how corporate law was something we came up with only over time and through many and bloody failures of alignment), when corporations like Shell purposefully hid their knowledge of climate change risks, isn't that corporate misalignment resulting in a nigh existential risk for humanity?

It's honestly a bad joke to hear such words from a man employed at the top of a corporation whose own dumb Engagement Maximizer behaviour has gotten it sued over being an accomplice in genocide. That's our sterling example of alignment? The man doesn't even seem to know the recent history of his own employer.

People say Eliezer is bad at communicating and sure, he does get pointlessly technical over some relatively simple concepts there IMO. But LeCunn's position is so bad I can't help but feel like it's a typical example of Sinclair's Law: "it is difficult to get a man to understand something, when his salary depends on his not understanding it".

LeCunn's position is so bad I can't help but feel like it's a typical example of Sinclair's Law: "it is difficult to get a man to understand something, when his salary depends on his not understanding it".

I would assume that LeCunn assumes that someone from Facebook HR is reading every tweet he posts, and that his tweets are written at least partly for that audience.  That's an even stronger scenario than Sinclair's description, which talks about what the man believes in the privacy of his own mind, as opposed to what he says in public, in writing, under his real name.  In this circumstance... there are some people who would say whatever they believed even if it was bad for their company, but I'd guess it's <10% of people.  I don't think I would do so, although that might be partly because pseudonymous communication works fine.

If he was so gagged and couldn't speak his real mind, he could simply not speak at all. I don't think Meta gives him detailed instructions about how much time he has to spend on Twitter arguing against and ridiculing people worried about AI safety to such an extent. This feels like a personal chip on the shoulder for him, from someone who's seen his increasingly dismissive takes on the topic during the last weeks.

Yeah, that's true.  Still, in the process of such arguing, he could run into an individual point that he couldn't think of a good argument against.  At that moment, I could see him being tempted to say "Hmm, all right, that's a fair point", then thinking about HR asking him to explain why he posted that, and instead resorting to "Your fearmongering is hurting people".  (I think the name is "argument from consequences".)

I agree with the analysis of the ideas overall. I think, however, that AI x-risk does have some issues regarding communication. First of all, I think it's very unlikely that Yann will respond to the wall of text. Even though he is responding, I imagine him to be more like your typical college professor: he will not reply to a very detailed post. In general, I think that AI x-risk should aim to explain a bit more, rather than take the stance that all the "But What if We Just..." has already been addressed. It may have been, but this is not the way to get them to open up rationally to it.

Regarding Yann's ideas, I have not looked at them in full. However, they sound like what I imagine an AI capabilities researcher would try to make as their AI alignment "baseline" model:

  • Hardcoding the reward will obviously not work.
  • Therefore, the reward function must be learned.
  • If an AI is trained on reward to generate a policy, whatever the AI learned to optimize can easily go off the rails once it gets out of distribution, or learn to deceive the verifiers.
  • Therefore, why not have the reward function explicitly in the loop with the world model & action chooser?
  • ChatGPT/GPT-4 seems to have a good understanding of ethics. It probably will not like it if you told it a plan was to willingly deceive human operators. As a reward model, one might think it might be robust enough.

They may think that this is enough to work. It might be worth explaining in a concise way why this baseline does not work. Surely we must have a resource on this. Even without a link (people don't always like to follow links from those they disagree with), it might help to have some concise explanation. 
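
For concreteness, here is a minimal Python sketch of that imagined baseline (a reward model scoring world-model rollouts inside the action-selection loop). Every name here (WorldModel, RewardModel, choose_action) is a placeholder invented for illustration, not a module from LeCun's actual proposal:

```python
# Minimal sketch of the imagined "reward model in the planning loop" baseline.
# All classes and names are illustrative stand-ins, not LeCun's architecture.

import random

class WorldModel:
    """Predicts the outcome of taking an action in a state (toy stub)."""
    def predict(self, state, action):
        return state + [action]  # a "state" here is just the action history

class RewardModel:
    """A learned scorer for predicted outcomes (toy stub).
    In the imagined baseline this would be a trained network, e.g. an
    LLM-based judge of plans; here it is a random placeholder."""
    def score(self, predicted_state):
        return random.random()

def choose_action(state, candidate_actions, world_model, reward_model):
    """Pick the candidate action whose predicted outcome scores highest.
    The reward model is consulted at run time, for every decision,
    rather than only being used to train a policy offline."""
    scored = [(reward_model.score(world_model.predict(state, a)), a)
              for a in candidate_actions]
    return max(scored)[1]

if __name__ == "__main__":
    wm, rm = WorldModel(), RewardModel()
    print(choose_action([], ["explain plan to operator", "hide plan"], wm, rm))
```

The failure modes below are then questions about this loop: can the action chooser find inputs that fool the scorer, and what does the scorer actually get to see?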

Honestly, what are the failure modes? Here is what I think:

  • The reward model may have pathologies the action chooser could find.
  • The action chooser may find a way to withhold information from the reward model.
  • The reward model evaluates what, exactly? Text of plans? Text of plans != the entire activations (& weights) of the model...
ChatGPT/GPT-4 seems to have a good understanding of ethics. It probably will not like it if you told it a plan was to willingly deceive human operators. As a reward model, one might think it might be robust enough.

This understanding has so far proven to be very shallow and does not actually control behavior, and is therefore insufficient. Users regularly get around it by asking the AI to pretend to be evil, or to write a story, and so on. It is demonstrably not robust. It is also demonstrably very easy for minds (current-AI, human, dog, corporate, or otherwise) to know things and not act on them, even when those actions control rewards. 

If I try to imagine LeCun not being aware of this already, I find it hard to get my brain out of Upton Sinclair "It is difficult to get a man to understand something, when his salary depends on his not understanding it," territory.

(Lots of people are getting this wrong and it seems polite to point out: it's LeCun, not LeCunn.)

An essential step to becoming a scientist is to learn methods and protocols to avoid deluding yourself into believing false things.

Yep, good epistemics is kind of the point of a lot of this "science" business. ("Rationality" too, of course.) I agree with this point.

You learn that by doing a PhD and getting your research past your advisor and getting your publications to survive peer review.

I mean, that's probably one way to build rationality skill. But Science Isn't Enough. And a lot of "scientists" by these lights still seem deeply irrational by ours. Yudkowsky has thought deeply about this topic.

In terms of the important question of how epistemics works, Yann seems to literally claim that the way to avoid believing false things is to ‘get your research past your advisor and getting your publications to survive peer review’

"The" way, or "a" way? From the wording, with a more charitable reading, I don't think he literally said that was the exclusive method, just the one that he used (and implied that Yudkowsky didn't? Not sure of context.) "Arguments are soldiers" here?

If Yann responds further, I will update this post.

...

I commit to not covering him further.

These seem to be in conflict.

Zvi will update the post if Yann responds further in the thread with Eliezer, but there will be no new Zvi posts centered on Yann.

Actually a bit stronger than that. My intention is that I will continue to update this post if Yann posts in-thread within the next few days, but to otherwise not cover Yann going forward unless it involves actual things happening, or his views updating. And please do call me out on this if I slip up, which I might.

What? Nothing is in conflict; you just took quotes out of context. The full sentence where Zvi commits is:

So unless circumstances change substantially – either Yann changes his views substantially, or Yann’s actions become far more important – I commit to not covering him further.

To me, responding further does not necessarily imply a change in position or importance, so I still think the sentences are somewhat contradictory in hypothetical futures where Yann responds but does not substantially change his position or become more important.

I think the resolution is that Zvi will update only this conversation with further responses, but will not cover other conversations (unless one of the mentioned conditions is met).

If he does comment further, it's likely to be more of the same.

Scaremongering about an asteroid

Minor typo - I think you accidentally pasted this comment twice by LeCunn

Also, the link in "In terms of the important question of how epistemics works, Yann seems to literally claim" is incorrect and just links to the general Twitter account.

Humans really scale very poorly, IMO. It’s not clear to me that, say, all Meta employees put together, collectively exceed John von Neumann for brilliance in every dimension.

This is a really interesting point. Kudos to Eliezer for raising it.

As much as ‘tu quoque’ would be a valid reply, if I claimed that your conduct could ever justify or mitigate my conduct, I can’t help but observe that you’re a lead employee at… Facebook. You’ve contributed to a *lot* of cases of teenage depression, if that’s considered an issue of knockdown importance.

Ouch.


Bit stronger than that. LeCunn is lead of Meta's AI research. Beyond the shiny fancy LLMs, Meta's most important AI product is its Facebook recommender system. And via blind engagement maximisation, that engine has caused and keeps causing a lot of problems.


Being responsible for something isn't equivalent to having personally done it. If you're in charge of something, your responsibility means also that it's your job to know about the consequences of the things you are in charge of, and if they cause harm it's your job to direct your subordinates to fix that, and if you fail to do that... the buck stops with you. Because it was your job, not to do the thing, but to make sure that the thing didn't go astray.

Now, I'm actually not sure whether LeCunn is professionally (and legally) responsible for the consequences of Meta's botched AI products, also because I don't know how long he's been in charge of them. But even if he wasn't, they are at least close enough to his position that he ought to know about them, and if he doesn't, that seriously puts in doubt his competence (since how can the Director of AI research at Meta know less about the failure modes of Meta's AI than me, a random internet user?). And if he knows, he knows perfectly well of a glaring counterexample to his "corporations are aligned" theory. No, corporations are not fully aligned, they keep going astray in all sorts of way as soon as their reward function doesn't match humanity's, and the regulators at best play whack-a-mole with them because none of their mistakes have been existential.

Yet.

That we know of.

So, the argument really comes apart at the seams. If tomorrow a corporation was given a chance to push a button that has a 50% chance of multiplying by ten their revenue and 50% chance of destroying the Earth, they would push it, and if it destroys the Earth, new corporate law could never come in time to fix that.

I think for a typical Meta employee your argument makes sense, there are lots of employees bearing little to no blame. But once you have "Chief" anything in your title, that gets to be a much harder argument to support, because everything you do in that kind of role helps steer the broader direction of the company. LeCun is in charge of getting Meta to make better AI. Why is Meta making AI at all? Because the company believes it will increase revenue. How? In part by increasing how much users engage with its products, the same products that EY is pointing out have significant mental health consequences for a subset of those users. 

I think folks are being less charitable than they could be to LeCun. LeCun’s views and arguments about AI risk are strongly counterintuitive to many people in this community who are steeped in alignment theory. His arguments are also more cursory and less fleshed-out than I would ideally like. But he’s a Turing Award winner, for God’s sake. He’s a co-inventor of deep learning. 
 

LeCun has a rough sketch of a roadmap to AGI, which includes a rough sketch of a plan for alignment and safety. Ivan Vendrov writes:

Broadly, it seems that in a world where LeCun's architecture becomes dominant, useful AI safety work looks more analogous to the kind of work that goes on now to make self-driving cars safe. It's not difficult to understand the individual components of a self-driving car or to debug them in isolation, but emergent interactions between the components and a diverse range of environments require massive and ongoing investments in testing and redundancy.

For this reason, LeCun thinks of AI safety as an engineering problem analogous to aviation safety or automotive safety. Conversely, disagreeing with LeCun on AI safety would seem to imply a different view of the technical path to developing AGI. 

In the classic movie "Bridge on the River Kwai", a new senior Allied officer (Colonel Nicholson) arrives in a Japanese POW camp and tries to restore morale among his dispirited men. This includes ordering them to stop malingering/sabotage and to build a proper bridge for the Japanese, one that will stand as a tribute to the British Army's capabilities for centuries - despite the fact that this bridge will harm the British Army and their allies.

While I always saw the benefit to prisoner morale in working toward a clear goal, I took a long time to see the Colonel Nicholson character as realistic. Yann LeCunn reads like a man who would immediately identify with Nicholson and view the colonel as a role model. LeCunn has chosen to build a proper bridge to the AI future.

And now I have that theme song stuck in my head.

Why would one think that humans could ‘escape’ from their AI overlords, over any time frame, no matter the scenario and its other details, if humans were to lose control of the future but still survive? <...> In real life, once you lose control of the future to things smarter than you are that don’t want you flourishing, you do not get that control back. You do not escape.

We could use Rhesus macaques vs. humans as a model.

Although the world has been dominated by humans for millennia, Rhesus macaques still exist and are not even threatened with extinction, thanks to their high adaptability (unlike many other non-human primates).

Humans are clearly capable of exterminating macaques, but are not bothering to do so, as humans have more important things to do.

Macaques contain some useful atoms, but there are better sources of useful atoms. 

Macaques are adaptive enough to survive in many environments created by humans.

If humans become extinct (e.g. by killing each other), Rhesus macaques will continue to flourish, unless humans take much of the biosphere with them.

This gives some substantiated hope. 

Taking it line by line:

--Rhesus macaque is definitely threatened by extinction, because the humans are doing various important things that might kill them by accident. For example, runaway climate change, possible nuclear winter, and of course, AGI. Similarly, once unaligned AIs take over the world and humans are no longer in control, they might later do various things that kill humans as a side effect, such as get into some major nuclear war with each other, or cover everything in solar panels, or disassemble the earth for materials.

--At first, humans will be a precious resource for AIs, because they won't have a self-sustaining robot economy, nor giant armies of robots. Then, humans will be a poor source of useful atoms, because there will be better sources of useful atoms e.g. the dirt, the biosphere, the oceans, the Moon. Then, humans (or more likely, whatever remains of their corpses) will be a good source of useful atoms again, because all the better sources in the solar system will have been exhausted.

--Humans are not adaptive enough to survive in many environments created by unaligned AIs, unless said unaligned AIs specifically care about humans and take care to design the environment that way. (Think "the Earth is being disassembled for materials which are used to construct space probes and giant supercomputers orbiting the Sun.") Probably. There's a lot more to say on this which I'm happy to get into if you think it might change your mind.

--If AIs become extinct, e.g. by killing each other, humans will continue to flourish, unless AIs take much of the biosphere or humansphere with them (e.g. by nuclear war, or biological war)
