Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is independent research. To make it possible for me to continue writing posts like this, please consider supporting me.

Many thanks to Professor Littman for reviewing a draft of this post.


Yesterday, at a seminar organized by the Center for Human-Compatible AI (CHAI), Professor Michael Littman gave a presentation entitled "The HCI of HAI", or "The Human-Computer Interaction of Human-Compatible Artificial Intelligence". Professor Littman is a computer science professor at Brown who has done foundational work in reinforcement learning as well as many other areas of computer science. It was a very interesting presentation and I would like to reflect a little on what was said.

The basic question Michael addressed was: "How do we get machines to do what we want?" His talk was structured around the various modalities that we’ve developed to convey our intentions to machines, starting from direct programming, through various forms of machine learning, and on to some new approaches that his lab is developing. I found his taxonomy helpful, so I’ll paraphrase some of it below and then add my own thoughts afterwards.

One way we can get machines to do what we want is by directly programming them to do what we want. But, the problem here is that direct programming is challenging for most people, and Michael talked through some studies that asked non-programmers to build simple if-then rulesets to implement programs such as "if it’s raining and the dog is outside then do such-and-such". Most participants had difficulty reasoning about the difference between "if the dog goes outside while it is raining" versus "if it starts raining while the dog is outside".
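To make the distinction concrete, here is a minimal sketch in Python (the state names and rule functions are my own, not from the studies Michael described). The point is that each rule fires on a transition rather than on the static condition "raining and dog outside", so the two phrasings trigger in different situations:

```python
# A minimal sketch, assuming a toy world state with two boolean fields.
# Each rule is triggered by a *transition* from the previous state to the
# current one, which is what made the two readings hard to tell apart.

def dog_goes_out_while_raining(prev, curr):
    # Fires at the moment the dog steps outside, but only if it is already raining.
    return curr["raining"] and curr["dog_outside"] and not prev["dog_outside"]

def starts_raining_while_dog_outside(prev, curr):
    # Fires at the moment the rain begins, but only if the dog is already outside.
    return curr["raining"] and curr["dog_outside"] and not prev["raining"]

# Example: the dog is already outside when the rain starts.
prev = {"raining": False, "dog_outside": True}
curr = {"raining": True,  "dog_outside": True}
print(dog_goes_out_while_raining(prev, curr))        # False
print(starts_raining_while_dog_outside(prev, curr))  # True
```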

I’m not sure that this is the main reason to look beyond direct programming, but I do agree that we need ways to instruct machines that are at least partially example-driven rather than entirely rule-driven, so I’m happy to accept the need for something beyond direct programming.

Next, Michael discussed reinforcement learning, in which the human provides a reward function, and it is the machine’s job to find a program that maximizes rewards. We might say that this allows the human to work at the level of states rather than behavior, since the reward function can evaluate the desirability of states of the world and leave it up to the machine to construct a set of rules for navigating towards desirable states:
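As a toy illustration of this division of labour (the states, dynamics, and rewards below are made up by me, not taken from the talk): the human contributes only a score over states, and the machine derives behaviour from it, here via a few lines of value iteration.

```python
# A minimal sketch, assuming a three-state deterministic world.
STATES = ["lawn", "path", "door"]
ACTIONS = ["stay", "advance"]

def transition(state, action):
    # Toy dynamics: "advance" moves one step toward the door.
    if action == "advance" and state != "door":
        return STATES[STATES.index(state) + 1]
    return state

def reward(state):
    # The human's contribution: a score over states, not a behavioural rule.
    return 1.0 if state == "door" else 0.0

def derive_policy(gamma=0.9, iters=50):
    # The machine's contribution: value iteration to find behaviour that
    # navigates toward the states the human scored highly.
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {s: max(reward(transition(s, a)) + gamma * V[transition(s, a)]
                    for a in ACTIONS)
             for s in STATES}
    return {s: max(ACTIONS, key=lambda a, s=s:
                   reward(transition(s, a)) + gamma * V[transition(s, a)])
            for s in STATES}

print(derive_policy())  # {'lawn': 'advance', 'path': 'advance', 'door': 'stay'}
```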

But, writing down a reward function that clearly expresses our intentions can also be very difficult, for reasons that have been discussed here and in the AI Safety literature. One way to express those reasons is Goodhart’s Law, which says that when we apply powerful optimization to any proxy measure of that-which-we-really-care-about, we tend to get something that is both unexpected and undesirable. And, expressing that-which-we-really-care-about in full generality without proxy measures seems to be exceedingly difficult in all but the simplest situations.
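One statistical facet of this can be shown in a few lines (a toy sketch of my own, not from the talk): give every option a true value and a noisy proxy score, then select hard on the proxy. The winner is reliably an option whose proxy error happens to be large and positive, so its true value falls well short of what the proxy promised.

```python
import random

# A minimal sketch of selecting on a noisy proxy for the thing we care about.
random.seed(0)
true_values = [random.gauss(0.0, 1.0) for _ in range(100_000)]
proxy_scores = [v + random.gauss(0.0, 1.0) for v in true_values]

winner = max(range(len(proxy_scores)), key=proxy_scores.__getitem__)
print(f"proxy score of winner:    {proxy_scores[winner]:.2f}")
print(f"true value of winner:     {true_values[winner]:.2f}")
print(f"best true value on offer: {max(true_values):.2f}")
# The harder we optimize the proxy, the larger the gap between the winner's
# proxy score and its true value tends to be.
```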

So we come to inverse reinforcement learning, in which we ask the machine to formulate its own reward function based on watching a human perform the task. The basic idea is to have a human demonstrate the task by taking a series of actions, then have the machine find a reward function which, had the human been choosing actions to maximize it, would explain each of those actions. The machine then takes actions in service of that same reward function. Michael gave some nice examples of how this works.
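A heavily simplified sketch of the underlying idea, with candidate reward functions and trajectories that I have made up (real IRL methods, such as maximum-entropy IRL, are far more sophisticated): search over candidate reward functions for one under which the demonstrated trajectory earns at least as much return as the alternatives the human could have chosen.

```python
# A minimal sketch: a reward function "explains" the demonstration if the
# demonstration is at least as good as the alternative trajectories under it.

demonstration = ["lawn", "path", "door"]  # states visited by the human
alternatives = [["lawn", "lawn", "lawn"], ["lawn", "path", "path"]]

candidate_rewards = {
    "likes_the_door": lambda s: 1.0 if s == "door" else 0.0,
    "likes_the_lawn": lambda s: 1.0 if s == "lawn" else 0.0,
    "indifferent":    lambda s: 0.0,
}

def discounted_return(trajectory, reward, gamma=0.9):
    return sum(gamma ** t * reward(s) for t, s in enumerate(trajectory))

def explains_demo(reward):
    return all(discounted_return(demonstration, reward) >=
               discounted_return(alt, reward) for alt in alternatives)

for name, r in candidate_rewards.items():
    print(name, explains_demo(r))
# likes_the_door True, likes_the_lawn False, indifferent True
```

Note that the degenerate "indifferent" reward function also "explains" the demonstration, which is a small preview of the underspecification problem discussed next.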

The problem with inverse reinforcement learning, however, is that in full generality it is both underspecified and computationally intractable. It is underspecified because in the absence of a good prior, any human behavior whatsoever can be explained by a reward function that rewards that exact behavior. One solution to this concern is to develop priors on reward functions, and work continues on this front, but it does mean that we have transformed the problem of writing down a good reward function into the problem of writing down a good prior on reward functions.

Next, Michael discussed direct rewards, in which a human directly provides rewards to a machine as it learns. This method skips over the need to write down a reward function or a prior on reward functions by instead having a human provide each and every reward manually, but it comes at the expense of being much slower.

Yet it turns out that providing direct rewards is subtle and difficult, too. Michael talked about a study in which some people were asked to provide rewards within a game where the goal was to teach a virtual dog to walk to a door without walking on any flowers. In the study, most participants gave the dog positive rewards for walking on the path, negative rewards for walking on flowers, and a positive reward for reaching the door. But, the optimal policy under such a reward structure is actually to walk back and forth on the path endlessly in order to collect more and more reward, never reaching the goal. The "correct" reward structure in this case is actually to provide a positive reward for reaching the door, a large negative reward for walking on the flowers, and a small negative reward for walking on the path, in order to incentivize the dog to get to the door as quickly as possible.
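A back-of-the-envelope version of why the intuitive scheme backfires (my own numbers, not the study's, and I leave out the flowers for simplicity): compare the discounted return of walking straight to the door with the return of pacing the path forever.

```python
# A minimal sketch comparing two behaviours under two reward schemes.
GAMMA = 0.99

def pace_forever(per_step_reward):
    # Geometric series: the dog collects the per-step reward on every step.
    return per_step_reward / (1 - GAMMA)

def go_to_door(per_step_reward, door_reward, steps=5):
    # Per-step rewards along the path, then the door reward, then nothing.
    return (sum(per_step_reward * GAMMA ** t for t in range(steps))
            + door_reward * GAMMA ** steps)

# Intuitive scheme: +1 per step on the path, +10 for reaching the door.
print(go_to_door(+1.0, 10.0))   # ≈ 14.4
print(pace_forever(+1.0))       # ≈ 100   <- pacing forever wins
# "Correct" scheme: -0.1 per step on the path, +10 for reaching the door.
print(go_to_door(-0.1, 10.0))   # ≈ 9.0   <- reaching the door wins
print(pace_forever(-0.1))       # ≈ -10
```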

A shortcoming of all of the learning approaches (reinforcement learning, inverse reinforcement learning, and direct rewards) is that they lack affordances for writing down rules even in cases where rules would be most natural. For example, when we teach someone how to drive a car, we might tell them not to drive through red lights. To teach a machine to drive a car using reinforcement learning we could provide a negative reward for states of the world in which the vehicle is damaged or the passengers are injured, and hope that the machine learns that driving through red lights leads on average to crashes, which on average lead to vehicle damage or passenger injury. Under inverse reinforcement learning, we could demonstrate stopping at red lights and hope that the machine correctly infers the underlying rule. Under direct rewards we could provide negative rewards for driving through red lights and hope that the machine learns to stop at red lights, and not, for example, to take a more circuitous route that avoids all traffic lights. But these all seem indirect compared to providing a simple instruction: if traffic lights are red then stop the car.

One seminar participant offered the following helpful way of seeing each of the above approaches. Imagine a space of all possible machine behaviors, and ask how we can communicate a desired behavior to a machine[1]:

The four approaches discussed above could be viewed as follows; a rough sketch of this framing as code appears after the list.

  • Under direct programming, we provide rules that eliminate parts of the behavior space:

  • Under reinforcement learning we provide a reward function that maps any point in the space to a real number (the sum of discounted rewards):

  • Under inverse reinforcement learning we provide examples of the behaviors we desire:

  • Under direct rewarding we allow the machine to explore and provide rewards on a case-by-case basis:
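One rough way to write this framing down (my own sketch, with placeholder types) is as the signature of what the human hands to the machine in each case:

```python
from typing import Callable

State = str    # placeholder types, purely illustrative
Action = str

Rule = Callable[[State], bool]                  # direct programming: carve away parts of the behavior space
RewardFunction = Callable[[State], float]       # reinforcement learning: score every state up front
Demonstrations = list[list[State]]              # inverse RL: show examples of the desired behavior
LiveFeedback = Callable[[State, Action], float] # direct rewarding: score behavior case by case as it happens
```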

My take

There was a time when computer scientists in the field of AI saw their job as devising algorithms to solve optimization problems. It’s not unreasonable to work on solving optimization problems -- that’s a valid and important pursuit in the field of computer science -- but if we assume that humans will accurately capture their intentions in an optimization problem, and few of us examine how intentions can actually be communicated from human to machine, then we will end up knowing a lot about how to construct powerful optimization systems while knowing little about how to communicate intentions, which is a dangerous situation. I see Professor Littman’s work as "popping out" a level in the nested problem structure of artificial intelligence:

The work at CHAI concerning assistance games also seems to me to be about "popping out" a level in this nested problem structure, although the specific problem addressed by assistance games is not identical to the one that Michael discussed in his talk.

But, there is further yet for us to "pop out". Underlying Michael’s talk, as well as, so far as I can tell, assistance games as formulated at CHAI, is the view that humans have a fixed intention to communicate. It seems to me that when humans are solving engineering problems, and certainly when humans are solving complex engineering or political or economic problems, it is rare that they hold to a fixed problem formulation without changing it as solutions are devised that reveal unforeseen aspects of the problem. When I worked on visual-inertial navigation systems during my first job after grad school, we started with a problem formulation in which the pixels in each video frame were assumed to have been captured by the camera at the same point in time. But, cell phones use rolling-shutter cameras that capture each row of pixels slightly after the previous row, and it turned out this mattered, so we had to change our problem formulation.

But, I’m not just talking about flaws in our explicit assumptions. When I founded a company to build autonomous food-delivery robots, we initially were not clear about the fact that the company was founded, in part, out of a love for robotics. When the three founders became clear about this, we changed some of the ways in which we pitched and hired for the company. It’s not that we overturned some key assumption, but that we discovered an assumption that we didn’t know we were making.

And, when I later worked on autonomous cars at a large corporation, we were continuously refining not just the formulation of our intention but our intentions themselves. One might respond that there must have been some higher-level fixed intention such as "make money" or "grow the company" from which our changing intentions were derived, but this was not my experience. At the very highest level -- the level of what I should do with my life and my hopes for life on this planet -- I have again and again overturned not just the way I communicate my intentions but my intentions themselves.

And, this high-level absence of fixed intentions shows up in small-scale day-to-day engineering tasks that we might want AI systems to help us with. Suppose I build a wind monitor for paragliders. I begin by building a few prototypes and deploying them at some paragliding sites, but I discover that they are easily knocked over by high winds. So, I build a more robust frame, but I discover that the cell network on which they communicate has longer outages than I was expecting. So, I change the code to wait longer for outages to pass, but I discover that paragliding hobbyists actually want to know the variability in wind speed, not just the current wind speed. So, I change the UI to present this information but I discover that I do not really want to build a company around this product, I just want a hobby project. So, I scale down my vision for the project and stop pitching it to funders. If I had worked with an AI on this project then would it really be accurate to say that I had some fixed intention all along that I was merely struggling to communicate to the AI? Perhaps we could view it this way, but it seems like a stretch. A different way to view it is that each new prototype I deployed into the world gave me new information that updated my intentions at the deepest level. If I were to collaborate with an AI on this project, then the AI’s job would not be to uncover some fixed intentions deep within me, but to participate fruitfully in the process of aligning both my intentions and that of the AI with something that is bigger than either of us.

In other words, when we look at diagrams showing the evolution of a system towards some goal state such as the ones in the first section of this post, we might try viewing ourselves as being inside the system rather than outside. That is, we might view the diagram as depicting the joint (human + machine) configuration space, and we might say that the role of an AI engineer is to build the kind of machines that have a tendency, when combined with one or more humans, to evolve towards a desirable goal state:

It might be tempting to view the question "how can we build machines that participate fruitfully in the co-discovery of our true purpose on this planet?" as too abstract or philosophical to address in a computer science department. But, remember that there was a time when the question "how can we communicate our intentions to machines?" was seen as outside the scope of core technical computer science. Once we started to unpack this question, we found not only that it was possible to unpack, but that it yielded new concrete models of artificial intelligence and new technical frontiers for graduate students to get stuck into. Perhaps we can go even further in this direction.


  1. This diagram and all subsequent diagrams in this post are my own and are not based on any in Michael’s presentation.

Comments

Thanks for a great post.

---

One nice point that this post makes (which I suppose was also prominent in the talk, but I can only guess, not being there myself) is that there's a kind of progression we can draw (simplifying a little):

- Human specifies what to do (Classical software)
- Human specifies what to achieve (RL)
- Machine infers a specification of what to achieve (IRL)
- Machine collaborates with human to infer and achieve what the human wants (Assistance games)

Towards the end, this post describes an extrapolation of this trend,

- Machine and human collaboratively figure out what the human even wants to do in the first place.

'Helping humans figure out what they want' is a deep, complex and interesting problem, and I'd love it if more folks were thinking through what solutions to it ought to look like. This seems particularly urgent because human motivations can be affected even by algorithms that were not designed to solve this problem -- for example, think of recommender systems shaping their users' habits -- and which therefore aren't doing what we'd want them to do.

---

Another nice point is the connection between ML algorithm design and HCI. I've been meaning to write something looking at RL as a 'technique for communicating and achieving human intent' (and, as a corollary, at AI safety as a kind of human-centred algorithm design), but it seems that I've been scooped by Michael :)

I note that not everyone sees RL from this frame. Some RL researchers view it as a way of understanding intelligence in the abstract, without connecting reward to human values.

---

One thing I'm a little less sure of is the conclusion you draw from your examples of changing intentions. While the examples convince me that the AI ought to have some sophistication about the human's intentions -- for example, being aware that human intentions can change -- it's not obvious that the right move is to 'pop out' further and assume there is something 'bigger' that the human's intentions should be aligned with. Could you elaborate on your vision of what you have in mind there?

Thank you for the kind words.

for example, being aware that human intentions can change -- it's not obvious that the right move is to 'pop out' further and assume there is something 'bigger' that the human's intentions should be aligned with. Could you elaborate on your vision of what you have in mind there?

Well it would definitely be a mistake to build an AI system that extracts human intentions at some fixed point in time and treats them as fixed forever, yes? So it seems to me that it would be better to build systems predicated on that which is the underlying generator of the trajectory of human intentions. When I say "something bigger that humans' intentions should be aligned with" I don't mean "physically bigger", I mean "prior to" or "the cause of".

For example, the work concerning corrigibility is about building AI systems that can be modified later, yes? But why is it good to have AI systems that can be modified later? I would say that the implicit claim underlying corrigibility research is that we believe humans have the capacity to, over time, slowly and with many detours, align our own intentions with that which is actually good. So we believe that if we align AI systems with human intentions in a way that is not locked in, then we will be aligning AI systems with that which is actually good. I'm not claiming this is true, just that this is a premise of corrigibility being good.

Another way of looking at it:

Suppose we look at a whole universe with a single human embedded in it, and we ask: where in this system should we look in order to discover the trajectory of this human's intentions as they evolve through time? We might draw a boundary around the human's left foot and ask: can we discover the trajectory of this human's intentions by examining the configuration of this part of the world? We might draw a boundary around the human's head and ask the same question, and I think some would say in this case that the answer is yes, we can discover the human's intentions by examining the configuration of the head. But this is a remarkably strong claim: it asserts that there is no information crucial to tracing the trajectory of the human's intentions over time in any part of the system outside the head. If we draw a boundary around the entire human then this is still an incredibly strong claim. We have a big physical system with constant interactions between regions inside and outside this boundary. We can see that every part of the physical configuration of the region inside the boundary is affected over time by the physical configuration of the region outside the boundary. It is not impossible that all the information relevant to discovering the trajectory of intentions is inside the boundary, but it is a very strong claim to make. On what basis might we make such a claim?

One way to defend the claim that the trajectory of intentions can be discovered by looking just at the head or just at the whole human is to postulate that intentions are fixed. In that case we could extract the human's current intentions from the physical configuration of their head, which does seem highly plausible, and then the trajectory of intentions over time would just be a constant. But I do not think it is plausible that intentions are fixed like this.

Even when talking about how humans shouldn't always be thought of as having some "true goal" that we just need to communicate, it's so difficult to avoid talking in that way :)  We naturally phrase alignment as alignment to something - and if it's not humans, well, it must be "alignment with something bigger than humans." We don't have the words to be more specific than "good" or "good for humans," without jumping straight back to aligning outcomes to something specific like "the goals endorsed by humans under reflective equilibrium" or whatever.

We need a good linguistic-science fiction story about a language with no such issues.

Yes, I agree, it's difficult to find explicit and specific language for what it is that we would really like to align AI systems with. Thank you for the reply. I would love to read such a story!