I would like to advertise a document, Systems of Services as a Paradigm for AI Alignment. I hope that it can serve as a starting point for investigating AI alignment through the lens of systems of AI services. An alternative framing for the text is that it is a collection of pieces I have found particularly helpful for having productive conversations about the topic. In this post, I briefly recap the motivation behind the document and outline its contents. I also argue for why service-systems are a useful paradigm and mention my best guesses for promising future work.
As part of a recent collaboration, we wanted to look over technical problems in Comprehensive AI Services (CAIS), then pick one of them and study it in more detail. However, it soon turned out that a lot more conceptual work was needed before we could formalize any of our thoughts in a useful way. We did study Drexler's Reframing Superintelligence prior to embarking on this project, and the text indeed provides a lot of useful insights into the nature of advanced systems of AI services. However, it puts less emphasis on how to use the service framework for technical research. So from our starting position, we weren't sure how to define the basic concepts, how to model problems in CAIS, and what these problems even are.
We also decided to focus on a slightly different framing, mostly in two ways. First, Reframing Superintelligence primarily discusses comprehensive AI services, which are expected to be on par with an artificial general intelligence. Since we can reasonably expect AI X-risk to be associated with AI that is radically transformative, we wanted to deal with a broader class of systems. Second, it is likely that even very advanced AI services will be deeply entangled with services that are "implemented on humans". (Especially since we can view human organizations and institutions as clusters of services.) For this reason, the document studies hybrid systems of services that consist of both AI and human services. Even if our interest is primarily in AI services, studying the whole system (including its non-AI parts) should make it easier to fully understand their impact on the world.
Contents of the document
While we did eventually make some progress on more specific problems, I thought it might be useful to write down a separate "introductory" text summarizing the things-that-seem-essential about the framework of service systems. Here is the document's slightly annotated table of contents:
- Introduction (motivating the approach; skippable if you have read this post)
- Basic Concepts (explaining what we mean by services, tasks, and systems of services, defining several useful concepts, giving many examples)
- Modelling Approaches (describing different types of models we might later want to come up with for service systems; introducing a "simple abstract model" for this framework)
- Research Questions ( a) how to think about, and generate lists of, problems and other research tasks for this paradigm; b) a list of research questions, not exhaustive but reasonably wide)
- Related Fields of Study (a list of existing research fields that seem particularly relevant)
- Research Suggestions (references for further reading, our guesses for what might be valuable to investigate)
Some of our conclusions
We think the paradigm of systems of services deserves further attention. In terms of technical alignment research, it might expose some new problems and provide a new point of view on the existing ones. Moreover, AI currently takes the form of services, rather than agents. As a result, the service-system paradigm might be more suitable for communicating with a wider audience (e.g., the public, less technical fields like AI policy, and AI researchers from outside of the long-termist community).
We have no definitive answers for what work needs to be done in this area, but some of the useful directions seem to be:
- Technical problems and formalization. Formalizing and making progress on technical problems in service systems. As a side product, we might also attempt to build more solid foundations for the paradigm (i.e., formalizing the basic concepts, building mathematical models).
- Building on top of Reframing Superintelligence. Drexler's text identifies and informally states many important hypotheses about the nature of AI-service systems, such as the claim that there will be no compelling incentives to replace comprehensive narrow services by an "agent-like" AGI (Section 12). We believe it would be valuable to (a) map out the different hypotheses and assumptions made in the report and (b) formalize specific hypotheses and explore them further.
- Tools or agents? Many people seem to believe that while "agent-like" AGI and “fully-automated comprehensive system of services” might have similar capabilities, there is some fundamental difference between the two types of AI. At the same time, there seems to be general confusion around this topic. Some relevant questions are:
- Can we find a framework in which these similarities and differences could be explained or dissolved?
- In particular, does there perhaps exist a formalization of “agency” that can differentiate between the two?
- If there are meaningful distinctions, how do they translate into what it means to “align” each type of AI?
- How does the effectiveness of narrow-purpose algorithms differ from the effectiveness of general-purpose algorithms? Should we expect economic pressures towards generality (and maybe even "agent-like" AGI)?
- Identifying connections to existing fields. In many cases, problems with AI services will "merely" be automated versions of problems previously studied in other fields (e.g., automating the police might seemingly push some aspects of law under the umbrella of AI). For people who already have expertise both in AI risk and in some relevant field, it might thus be valuable to clarify the connection between the two. On the one hand, clearly laying out how existing fields relate to service-systems might prevent other alignment researchers from reinventing the wheel by ignoring the existing research. On the other hand, it is important to identify the problems that arise when the existing field gets applied to service-systems and AI risk. We can then bring these new problems to the attention of the existing research communities, thus offloading some of the work (and possibly steering the field towards topics with more long-term impact). A prime example of a field whose utilization might be highly impactful is AI ethics. However, due to effects like idea inoculation, reputation costs (to AI risk), and the unilateralist’s curse, we think this task should only be attempted by people who are experienced at communicating ideas on this level and well-positioned to do so.
Acknowledgement: This document has been written by myself (Vojta), based on ideas I collected from a collaboration with Cara Selvarajah, Chris van Merwijk, Francisco Carvalho, Jan Kulveit, and Tushant Jha during the 2019/2020 AI Safety Research Program. While they gave a lot of feedback on the text and many of the ideas are originally theirs, they might not necessarily agree with all arguments and framing presented here. All mistakes are mine.
Thoughts while reading:
Sections 5 and 6, not 6 and 7
I feel like this definition is not capturing what I mean by a "task". Many "agent-like" things, such as "become supreme ruler of the world", seem like tasks according to this definition; many useless things like "twitching randomly" can be thought of as completing a "task" as defined here and so would be counted as "services".
(I don't have a much better definition, but I prefer to just use the intuitive notion of "task" rather than the definition you have.)
I'm pretty sure that's not what the frame problem is. The frame problem is that by default you ignore consequences of your actions, and so you have to arduously specify all of the things that shouldn't change.
I'm surprised this matters in this model, continuing to read... ah okay, this model isn't being used anywhere.
Looking at these, I feel like they are subquestions of "how do you design a good society that can handle technological development" -- most of it is not AI-specific or CAIS-specific.
For me this is the main point of CAIS. It reframes many AI Safety problems in terms of "make a good society" problems, but now you can consider scenarios involving only AI. We can start to answer the question of "how do we make a good society of AIs?" with the question "How did we do it with humans?". It seems like human society did not have great outcomes for everyone by default. Making human society function took a lot of work, and failed a lot of times. Can we learn from that and make AI Society fail less often or less catastrophically?
Yeah to be clear I do think it is worth it for people to be thinking about these problems from this perspective; I just don't think they need to be AI researchers.
Yeah, I understand that. My point is that the same way society didn't work by default, systems of AI won't work by default, and that the interventions that will be needed will require AI researchers. That is, it's not just about setting up laws, norms, contracts, and standards for managing these systems. It is about figuring out how to make AI systems which interact with each other in the way that humans do in the presence of laws, norms, standards and contracts. Someone who is not an AI researcher would have no hope of solving this, since they cannot understand how AI systems will interact, and cannot offer appropriate interventions.
It seems to me like you might each be imagining a slightly different situation.
Not quite certain what the difference is. But it seems like Michael is talking about properly setting up the parts of the system that are mostly/only AI. In my opinion, this requires AI researchers, in collaboration with experts from whatever-area-is-getting-automated. (So while it might not fall only under the umbrella of AI research, it critically requires it.) Whereas - it seems to me that - Rohin is talking more about ensuring that the (mostly) human parts of society do their job in the presence of automation. For example, how to deal with unemployment when parts of the industry get automated. (And I agree that I wouldn't go looking for AI researchers when tackling this.)
Fixed the wrong section numbers and frame problem description.
Could it be that the problem is not in the "task" part but in the definition of "service"? If I consider the task of building me a house that I will like, I can envision a very service-like way of doing that (ask me a bunch of routine questions, select a house-model correspondingly, then proceed to build it in a cook-book manner by calling on other services). But I can also imagine going about this in a very agent-like manner.
(Also, "twitching randomly" seems like a perfectly valid task, and a twitch-bot as a perfectly valid service. Just a very stupid one that nobody would want to build or pay for. Uhm, probably. Hopefully.)
It seems like what you're trying to get at is some notion of a difference between a service and an agent. My objection is primarily that the specific definitions you chose don't seem to point at the essential differences between a service and an agent. I don't have a strong opinion as to whether the problem is with the definition of "task" or of "service"; just that together they don't seem to point at the right thing.
It is intentional that not all the problems are technical problems - for example, I expect that not tackling unemployment due to AI might indirectly make you a lot less safe (it seems prudent to not be in a civil war or war when you are attempting to finish building AGI). However, you are right that the list might nevertheless be too broad (and too loosely tied to AI).
Anyway: As a smaller point, I feel that most of the listed problems will get magnified as you introduce more AI services, or they might gain important twists. As a larger point: Am I correct to understand you as implying that "technical AI alignment researchers should primarily focus on other problems" (modulo qualifications)? My intuition is that this doesn't follow, or at least that we might disagree on the degree to which this needs to be qualified to be true. However, I have not yet thought about this enough to be able to elaborate more right now :(. A bookmark that seems relevant is the following prompt:
Conditional on your AI system never turning into an agent-like AGI, how is "not dying and not losing some % of your potential utility because of AI" different from "how do you design a good society that can handle the process of more and more things getting automated"?
(This should go with many disclaimers, first among those the fact that this is a prompt, not an implicit statement that I fully endorse.)
Kind of? I think it's more like "these are indeed problems, and someone should focus on them, but I wouldn't call it technical AI alignment" (and as a result, I wouldn't call people working on them "technical AI alignment researchers"). For many of these problems, if I wanted to find people to work on them, I would not look for AI researchers (and instead look for economists, political theorists, etc).
Like, I kind of wish this document had been written without AI / AI safety researchers in mind.
Planned summary for the Alignment Newsletter:
I agree with your points in the suggested summary. However, I feel like they are not fully representative of the text. But, as the author, I might be imagining the version of the document in my head rather than the one I actually wrote :-).
Side-note 1: I also think that most of the classical AI safety problems appear in systems of AI services as well (either in individual services, or in "system-wide variants"). But this is only mentioned briefly in the text, since I am not yet fully clear on how to do the translation between agent-like AIs and systems of AI services. (Also, on the extent to which such translation even makes sense.)
Side-note 2: I imagine that many "non-AI problems" might become "somewhat-AI problems" or even "problems that AI researchers need to deal with" once we get enough progress in AI to automate the corresponding domains.
Sorry for not checking here before the newsletter went out :/
Hmm, I didn't mean to imply this.
That was somewhat intended -- words are at a premium in the newsletter, so I have to make decisions about what to include. However, given that you find the classification subsection equally important, I'll at least add that in.
That's a fair point, I hadn't realized that.
I've made the following changes to the LW version of the newsletter: