For more discussion on this paper, see the comment thread for it on OpenReview: https://openreview.net/forum?id=BZ5a1r-kVsf
I just realized today that there are comments on that page. There are about a dozen so far.
(Note: I jotted down these thoughts from my phone and while on a plane as I finished reading LeCun's paper. They are rough and underdeveloped, but still are points I find interesting and that I think might spark some good discussion.)
A modular architecture like this would have interpretability benefits and alignment implications. The separate "hard-wired" cost module is very significant. If this were successfully built it could effectively leapfrog us into best-case interpretability scenario 2.
How will these costs be specified? That looks like a big open question from this paper. It seems like the world model would need to be trained first, and be interpretable, so that the cost module can make use of its abstractions.
Is it possible to mix hardwired components with trained models? "Risks from Learned Optimization" mentioned hardcoded optimizers as a possible way to prevent the emergence of mesa-optimizers. "Tool-using AI" research may be relevant too; interestingly, here the cost module would be the tool.
The ultimate goal of the agent is to minimize the intrinsic cost over the long run.
The agent is an optimizer by design!
If a solution is developed for specifying the cost module, then the agent may be inner aligned.
But many naive ways of specifying the cost module (e.g. make human pain a cost) seem to lead straight to catastrophe via outer alignment failures and instrumental convergence for a sufficiently advanced system.
Could this architecture be leveraged to implement a cost module that's more likely to be outer aligned like imitative HCH or some other myopic objective?
Are the trainable modules (critic, world model, actor) subject to mesa-optimization risk?
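To make the "optimizer by design" point concrete, here is a minimal, hypothetical sketch of the kind of modular agent loop the paper describes, with a hard-wired intrinsic cost and trainable critic, world model, and actor. The names, signatures, and planning procedure are my own illustrative assumptions loosely following the paper's decomposition, not code from the paper.

```python
# A minimal, hypothetical sketch of the modular agent loop discussed above,
# loosely following the paper's decomposition (perception, world model,
# hard-wired intrinsic cost, trainable critic, actor). Names and details
# are illustrative assumptions, not the paper's actual design.
from dataclasses import dataclass
from typing import Callable, List, Sequence

State = List[float]      # latent state produced by the perception module
Action = int             # toy discrete action space

@dataclass
class Agent:
    perceive: Callable[[object], State]          # observation -> latent state
    predict: Callable[[State, Action], State]    # world model: (state, action) -> next state
    intrinsic_cost: Callable[[State], float]     # hard-wired, immutable cost module
    critic: Callable[[State], float]             # trainable estimate of future intrinsic cost
    candidate_plans: Callable[[State], Sequence[Sequence[Action]]]  # actor's proposals

    def plan(self, observation: object) -> Sequence[Action]:
        """Mode-2-style planning: pick the action sequence whose imagined
        rollout minimizes intrinsic cost plus the critic's cost-to-go."""
        s0 = self.perceive(observation)
        best_plan, best_cost = None, float("inf")
        for plan in self.candidate_plans(s0):
            s, total = s0, 0.0
            for a in plan:
                s = self.predict(s, a)           # imagine the next latent state
                total += self.intrinsic_cost(s)  # hard-wired cost of that state
            total += self.critic(s)              # estimated cost beyond the horizon
            if total < best_cost:
                best_plan, best_cost = plan, total
        return best_plan
```

Written this way, the alignment-relevant feature stands out: every behavior is downstream of whatever `intrinsic_cost` encodes, which is why specifying that module looks like the crux.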
I'm quite surprised by the lack of discussion on this paper. This is probably one of the most significant papers on AGI I've seen, as it outlines a concrete, practical path to its implementation by one of the most important researchers in the field.
There is not a lot of discussion about the paper here on LessWrong yet, but there are a dozen or so comments about it on OpenReview: https://openreview.net/forum?id=BZ5a1r-kVsf
I agree. I shared the post in a couple of AI safety Slack and Discord channels just now to try to get it more visibility.
It probably would have gotten more engagement if someone else (e.g. Gwern) had posted it. I'm a low-karma/unpopular account, so few of my posts get seen unless people go looking for new posts.
I rarely read new posts (I read particular posts on things I'm interested in and have alerts for some posters). So, it's not that surprising, I guess? I wouldn't have read this post myself had someone analogous to me posted it.
Maybe there's a way to increase discoverability of posts from low karma/unpopular users?
Currently, there isn't much modeling of video data at anything like the ImageNet scale. Static image models have become far more advanced (over years of supervised learning, e.g. from CAPTCHA labels) than what we have right now for video classification.
Still, our best ML systems are still very far from matching human reliability in real-world tasks such as driving, even after being fed with enormous amounts of supervisory data from human experts, after going through millions of reinforcement learning trials in virtual environments, and after engineers have hardwired hundreds of behaviors into them.
I imagine a large part of the world model described in the paper will come from video classification.
If you look at some examples of video classification problems, they involve labeling entities along with an action, and the action depends on the additional temporal data that videos carry. I'm not sure whether this temporal information will be used directly in this paper's world model, or whether the world model will instead rely on classification models for static images and store action information separately from the entities classified in the videos.
Of course, this example alone just shows how much work needs to be done to even begin the preliminary phase of developing the models described in this paper, but I think the autonomous aspects presented in the paper can be investigated independently, before video classification models mature.
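To make that split concrete, here is a toy sketch, entirely my own construction (assuming PyTorch; nothing here comes from the paper): a per-frame "entity" encoder standing in for a static-image backbone, plus a separate temporal head that turns the frame sequence into an action label.

```python
# A hedged toy sketch (not from the paper) of the split described above:
# per-frame entity classification reusing a static-image-style encoder,
# with the temporal "action" information handled by a separate recurrent head.
import torch
import torch.nn as nn

class VideoEntityActionModel(nn.Module):
    def __init__(self, num_entities: int, num_actions: int, feat_dim: int = 128):
        super().__init__()
        # Frame-level encoder: stands in for a pretrained static-image backbone.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        self.entity_head = nn.Linear(feat_dim, num_entities)   # per frame: "what is in it"
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.action_head = nn.Linear(feat_dim, num_actions)    # per clip: "what is happening"

    def forward(self, clip: torch.Tensor):
        # clip: (batch, time, channels, height, width)
        b, t, c, h, w = clip.shape
        feats = self.frame_encoder(clip.reshape(b * t, c, h, w)).reshape(b, t, -1)
        entity_logits = self.entity_head(feats)        # entity labels, frame by frame
        _, last_hidden = self.temporal(feats)          # temporal information across frames
        action_logits = self.action_head(last_hidden.squeeze(0))
        return entity_logits, action_logits
```

Whether the paper's world model would look anything like this is exactly the open question in the comment above; the sketch is only meant to show what keeping entity and action information in separate pathways could mean.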
The autonomous parts seem to be largely based on human brain emulation, at least at the architectural level. The current state of the art in ML autonomy has been unsupervised learning with deep neural nets, as at DeepMind. That's like using a single model (or an ensemble) for one set of tasks. This paper is more like using an ensemble of modules for different types of tasks, modeled on different parts of the human brain, whose overall interaction produces the autonomous behavior and the context that autonomy operates in.
There's a recent project from Google AI called Pathways (blog post, paper) which also aspires to produce more general AI. From the blog post:
Pathways will enable a single AI system to generalize across thousands or millions of tasks, to understand different types of data, and to do so with remarkable efficiency – advancing us from the era of single-purpose models that merely recognize patterns to one in which more general-purpose intelligent systems reflect a deeper understanding of our world and can adapt to new needs.
(Thanks to Michael Chen for making me aware of this.)
My model of Eliezer winces when a proposal for AGI design is published rather than kept secret. Part of me does too.
One upshot, though, is that it gives AI safety researchers and proponents a more tangible case to examine. Architecture-specific risks can be identified, and central concerns like inner alignment can be evaluated against the proposed architecture and (assuming they still apply) made more concrete and convincing.
I'm still reading the LeCun paper (currently on page 9). One thing it's reminding me of so far is Steve Byrnes' writing on brain-like AGI (and related safety considerations): https://www.lesswrong.com/s/HzcM2dkCq7fwXBej8
My impression is that he's trying to do GOFAI with fully differentiable neural networks. I'm also not sure he's describing a GAI — I think he's starting by aiming for parity with the capabilities of a typical mammal, not human-level, and that's why he uses self-driving cars as an example.
Personally I think a move towards GOFAI-like ideas is a good intuition, but that insisting on keeping things fully differentiable is too constraining. I believe that at some level, we are going to need to move away from doing everything with gradient descent, and use something more like approximate Bayesianism, or at least RL.
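To illustrate the contrast being drawn (not to argue for either side), here is a toy comparison, entirely my own construction: fitting the mean of a few observations once by gradient descent on a single point estimate, and once by an approximate Bayesian grid update that keeps a weighted set of hypotheses.

```python
# A toy illustration (my own, not from the paper or the comment) of the
# contrast: gradient descent nudges a single differentiable point estimate,
# while an approximate Bayesian update reweights a whole set of hypotheses
# rather than committing to one.
import math

data = [0.9, 1.1, 1.3]          # toy observations, assumed ~ Normal(theta, 1)

def neg_log_likelihood_grad(theta: float) -> float:
    # Gradient of sum_i 0.5 * (theta - x_i)^2 with respect to theta.
    return sum(theta - x for x in data)

# Gradient descent: one parameter, moved a little each step.
theta = 0.0
for _ in range(100):
    theta -= 0.05 * neg_log_likelihood_grad(theta)

# Approximate Bayes: a discrete grid of hypotheses, reweighted by likelihood.
hypotheses = [i / 10 for i in range(-20, 21)]      # candidate values of theta
log_post = {h: sum(-0.5 * (h - x) ** 2 for x in data) for h in hypotheses}
z = max(log_post.values())
post = {h: math.exp(lp - z) for h, lp in log_post.items()}
total = sum(post.values())
post = {h: p / total for h, p in post.items()}     # normalized posterior weights

print(f"gradient-descent estimate: {theta:.2f}")
print(f"posterior mode: {max(post, key=post.get):.2f}")
```

On this easy problem both land in the same place; the point of the sketch is only the difference in what is being maintained and updated, which is the kind of shift away from pure gradient descent the comment gestures at.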
I also think he's underestimating the influence of genetics in mammalian mental capabilities. He talks about the step of babies learning that the world is 3D rather than 2D; I think it's very plausible that adaptations for processing sensory data from a 3D rather than 2D world are already encoded in our genome, brain structure, and physiology in many places.
If this is going to be a GAI architecture, then I think he's massively underthinking alignment.
Great to see. As important as safety research is, if we don't get capabilities in time, most of humanity is going to be lost. Long-termism requires aiming to preserve today's archeology, or the long-term future we hoped to preserve will be lost anyway. Safety is also critical; differential acceleration of safe capabilities is important, so let's use this to try to contribute to capable safety.
I just wish LeCun saw that Facebook is catastrophically misaligned.
Abstract
Meta's Chief AI Scientist Yann LeCun lays out his vision for what an architecture for generally intelligent agents might look like.