This is a special post for quick takes by Roman Leventov. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

This is my comment about Pedro Domingos' thinking about AI alignment (in this video interview for the "Machine Learning Street Talk" channel), also sprawling into adjacent themes (in which I push "my agenda", of course):

Thank you, Tim, for consistently bringing up the theme of "alignment" in your conversations with AI scientists who are not focused on it (although I disagree with this framing of the problem, quite similarly to Pedro actually, and for this reason prefer the term "AI safety"; more on this at the end of this comment). This is very important.

It's sad to see, though, that both you and Pedro strawman the problem by reducing it to the dated ideas of utility maximisation. The LessWrong community itself considers this idea refuted, as evidenced by this highly upvoted post: "Why The Focus on Expected Utility Maximisers?" Then, when you tried to bring the conversation back into more interesting territory by asking Pedro about instrumental convergence, he probably didn't understand the question and replied that "paperclip maximisation is silly". Sure, but instrumental convergence does *not* imply utility maximisation.

Instrumental convergence emerges in any reasonable energy-minimising AI agent architecture, from Active Inference (followers of MLST could also watch two videos with Karl Friston on this channel) to Yann LeCun's autonomous machine intelligence architecture. And it's absolutely an open question: how to ensure that powerful AI doesn't kill people, or shoehorn people into some very limited mode of existence that is *predictable* to AIs and thus allows them to minimise their energy (the energy is lower when agents' predictive capacity is higher; people make the world less predictable). 

Another big mistake is thinking of AIs as "merely tools" (as Pedro said), not agents. Active Inference is the theory that helps to shift perspective here: all "things" (taken as a term from this paper by Karl Friston) in the world are agents that preserve their thingness if it minimises their own variational free energy. See also "Technological Approach to Minds Everywhere" by Michael Levin for a detailed discussion of this totally gradualistic view on agency and intelligence. And in this view, we should already see AI (and any other technology, really) existing today as things that increase our predictability. The extensive use of ChatGPT-like assistance (and extensive use of AI in education) will make the thinking (and, hence, the behaviour) of people much more homogeneous than it is today. And there is a big danger in this trend of cognitive homogenisation. I wrote about some other examples of this dynamic here. Again, this is not limited to AI: for example, it has been noted that Instagram makes people (particularly women) try to look more alike to each other, converging to some common "standard of beauty". Global technology and business kill the heterogeneity of cultures. Ok, this is off-topic now.

Finally, a note on Pedro's view of the world as "a system of interlocked (and sometimes conflicting) utility functions". Again, in the Active Inference view, it's more correct to view the world as a system of interlocked (and sometimes conflicting) *behavioural niches* (or *phenotypes*). Mathematically, this could be captured by the expected free energy (EFE) function. One could say that this is pure terminology, and that by "utility functions" Pedro meant exactly these EFE functions, but equating the two is wrong and confusing. Utility is present within EFE and is conceptualised as *pragmatic value* (as opposed to *epistemic value*, the other component of EFE), but utility != EFE as a whole.
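To make the "utility != EFE" point concrete, here is one common way the expected free energy of a policy is decomposed in the Active Inference literature. The notation is an assumption on my part and is not taken from the interview: $\pi$ is a policy, $s$ are hidden states, $o$ are observations, $Q$ is the agent's approximate posterior, and $P(o \mid C)$ encodes the agent's preferences over observations. Treat it as a sketch of the usual approximation, not the only formulation.

```latex
G(\pi) \;\approx\;
  \underbrace{-\,\mathbb{E}_{Q(o \mid \pi)}\!\Big[ D_{\mathrm{KL}}\big[\, Q(s \mid o, \pi) \,\big\|\, Q(s \mid \pi) \,\big] \Big]}_{\text{negative epistemic value (expected information gain)}}
  \;\;
  \underbrace{-\,\mathbb{E}_{Q(o \mid \pi)}\big[ \ln P(o \mid C) \big]}_{\text{negative pragmatic value (expected ``utility'')}}
```

Under this reading, a pure utility maximiser corresponds to keeping only the second term. An agent minimising $G(\pi)$ as a whole also seeks observations that reduce its uncertainty about hidden states.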

Pedro himself sort of gave away the "recipe" when he said that AI "should spend half the time optimising their utility function and half the time learning what that utility function should be, instead of just spending the entire time optimising the utility as AIs do today". On a really crude level, this is right. However, EFE is, of course, more nuanced than this, and I encourage those who are interested to learn Active Inference directly.

PS. I think "aligning AI with human values" is the wrong framing because human values are not divinely locked, eternal things, but emergent "rules of the game" in the playfield that we currently have. The "Morality as Cooperation" theory by Oliver Scott Curry is crucial for understanding this view. This is very reminiscent of the theme brought up by Pedro: humans managed to change the world in which they live for the better, but now their emotional, psychological, and ethical intuitions and heuristics are often misguided, e.g., the "discount rate for the future" is wrong. And AI will change the world once again, profoundly. Trying to take our *current* best ideas about ethics and put them inside AIs is wrong. Instead, we should take a holistic approach to *designing the future world* and think through the ethical norms that will be optimal in *that future world*. This is partially a top-down design approach to society, economy, and politics, as also advocated by John Doyle, but, of course, there should be bottom-up components to this project as well: collective sensemaking, dialogue (including with AIs!), social experiments, etc. Unfortunately, it doesn't look to me like the current economic trajectory of cognitive globalisation will give space and time for such deliberate top-down design and bottom-up experiments. Everything seems to be already "decided" by the massive economic incentives.

I believe there is a lot of discussion about singleton AI (what a singleton even is, whether a "community of agents" or a singleton is more likely, or more preferable from the safety perspective, what are the safety implications, etc.), with which I'm basically unfamiliar.

Here, I want to make an observation from the engineering/performance perspective. Suppose there is a singleton (a single model/algorithm, or a collection of models which we can treat as a singleton) that "controls everything", organised as a hierarchy: smaller agents/models operate in real time on the edge, and one or several higher layers consist of algorithm(s) that somehow control these edge agents/models. Then at least some of the models close to the top of the hierarchy must be relatively slow: order(s) of magnitude slower than the fast edge models, which are likely to "think" much faster than humans.

At least one of the higher-level models will be responsible for grasping and controlling slowly unfolding trends. It must be comparatively slow because the input data size will be enormous, and simple incremental summarisation techniques won't help to reduce this data size: with them, the model would fail to recognise deeper patterns, attempts by the edge agents to hide from or game the control, etc.

This idea comes from John Doyle, who writes that robust control must incorporate heterogeneous feedback loops.

I'm not sure that this conclusion, that at least one of the "governing models" (if there is more than one) will be slow, has any safety implications, though.

I asked GPT-4 to write a list of desiderata for a naturalistic (i.e., scientific) theory of ethics. It made some mistakes but in other regards surprised me with the quality of its philosophy of science and meta-ethics.

The mistake that jumped out for me was “6. Robustness and Flexibility”:

The ethical theory should be robust and flexible, meaning it should be able to accommodate new information and adapt to different contexts and conditions. As our scientific knowledge evolves, the theory should be able to incorporate new insights without losing its coherence or practical applicability.

According to a Popperian criterion for a good scientific theory, falsifiability, GPT-4 should have made the opposite point: the scientific theory of ethics should fit with other theories (of rationality, evolution, consciousness, etc.) in such a way that an update in one of those theories requires a revision in the theory of ethics as well. So, all theories should be mutually “brittle”, which reflects falsifiability.

When I asked GPT-4 to criticise the desiderata for a scientific theory of ethics with Popperian criteria in mind, it found this problem with item 6, as well as several other (albeit smaller) problems that I hadn’t thought about, and probably wouldn’t have come up with even if I had tried quite hard!

Then, when I asked GPT-4 to rewrite the desiderata considering the criticisms, the flip side of LLM’s self-consistency (Karpathy mentions it in the “State of GPT” talk) came up: the LLM was reluctant to turn 180 degrees upon recognising its own mistake and instead tried to “amend” all desiderata, preserving their original form (which actually made them only worse). The supposed remedy for this problem is the “Tree of Thoughts” LMCA: