To save people the click to LeCun's twitter, I'll gather what pieces I can from his recent posts:
My claim is that AI alignment will be manageable & less difficult than many have claimed. But until we have a design for Human-Level AI, it's mere speculation. link
He does seem to believe that there is in fact a problem named "AI alignment" that has to be solved for human-level AIs, it's just that he believes it will be much more manageable than AI-notkilleveryoneism people expect.
And from the responses to Julian Togelius' recent blog post:
Julian reminds us:
1. all intelligence is specialized, including human intelligence.
2. being smart in some domains makes you strong in some environments but weak in others.
3. Intelligence does not immediately cause a thing to be able to "take over"
4. Intelligence does not immediately cause an entity to want to "take over"
5. A very dumb but specialized entity can kill a smarter one, e.g. virus vs human. link
So from points 1 and 2 it looks like he fundamentally disagrees with Eliezer's gesturing at a "core of generality" that develops once you optimize deeply enough. He doesn't expect a system to suddenly "get it" and see the deep regularities that underlie most problems; in fact, I suspect he doesn't believe such regularities exist at all.
Point 3 seems to be about the difficulty of box-escapes and dominating all of humanity. I'm reading it as disagreeing with the jump lesswrong types usually make from "human-level AI" to "strictly stronger than all of humanity combined".
Point 4 seems like a disagreement with instrumental convergence.
Point 5 is the only really interesting one imo, and I think it's a very good point. Current image recognition models at all ability levels are vulnerable to adversarial attacks: imperceptible changes to an image can make the model misclassify it. I don't know if LLMs have an analogous weakness, but I think it's very likely, and I wouldn't expect adversarial attacks to stop working on the most advanced models. So we do have examples of very dumb but targeted methods defeating very general models, which implies we might actually have a chance against a superintelligence, if we've developed targeted weapons before it gets loose.
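To make the adversarial-attack point concrete, here is a toy sketch of the fast gradient sign method (FGSM) against a linear classifier. The weights and image are random stand-ins, not any real model; against deep networks the same idea works because the gradient is still cheap to compute.

```python
import numpy as np

# Toy "image classifier": a linear model over flattened pixels.
# The weights are random placeholders; real attacks target deep nets
# the same way, using the loss gradient instead of a weight row.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))       # 10 classes, 28x28 "image"
x = rng.uniform(0, 1, size=784)      # a clean input
y = int(np.argmax(W @ x))            # the model's original prediction

# FGSM: step against the sign of the gradient of the predicted-class
# score. For a linear model that gradient is just W[y].
eps = 0.05                           # tiny per-pixel budget (L-inf bound)
x_adv = np.clip(x - eps * np.sign(W[y]), 0, 1)

y_adv = int(np.argmax(W @ x_adv))
print("before:", y, "after:", y_adv)
```

Each pixel moves by at most 0.05, which is visually imperceptible, yet the predicted class flips, because the tiny per-pixel nudges all push the score in the same direction and add up across hundreds of pixels.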
Many AI safety discussions today seem as speculative as discussions about airliner safety in 1890. Before we have a basic design & basic demos of AI systems that could credibly reach human-level intelligence, arguments about their risks & safety mechanisms are premature.
So he's not impressed by GPT-4, and apparently doesn't think that LLMs in general have a shot at credibly reaching human-level.
Every new technology is developed and deployed the same way: You make a prototype, try it at a small scale, make limited deployment, fix the problems, make it safer, and then deploy it more widely. At that point, governments regulate it and establish safety standards.
He expects AI safety to not be fundamentally different from any other engineering domain, and seems to disagree that we'll only have a single shot at aligning a superintelligence.
After some more scouring of his twitter page, I actually found an argument for pessimism about LLMs that I agree with !!! (hallelujah)
This seems to be related to the "curse of behaviour cloning": learning to behave correctly from a dataset containing only correct behaviour doesn't work; the dataset also needs examples of how to recover from wrong behaviour. For example, if you make ChatGPT play chess, at some point it will make a nonsensical move, or if you make it do chess analysis it will mistakenly claim that something happened, and thereafter it will condition its output on that wrong claim! It doesn't go "ah yes 2 prompts ...
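The compounding-error argument behind this can be sketched with arithmetic: if a cloned policy derails with some small probability at each step and, never having seen corrections in training, cannot recover, then the chance of a clean rollout decays exponentially with length. The 1% per-move error rate below is an illustrative assumption, not a measured figure.

```python
# Assumed per-step probability of an unrecoverable mistake, e.g.
# one nonsensical chess move in a hundred.
eps = 0.01

# With no ability to self-correct, every step must be clean, so the
# probability of an undamaged T-step rollout is (1 - eps) ** T.
for T in (10, 40, 100):
    p_clean = (1 - eps) ** T
    print(f"{T:3d} steps: P(no derailment) = {p_clean:.2f}")
```

Even a 1% per-move slip rate leaves only about a one-in-three chance of a clean 100-move game, which is why training on corrections (or interactive data) matters and why long LLM transcripts drift.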