Introduction
Previously I wrote about what it would mean for AI to “go well”. I would like to elaborate on this and propose some details towards a “scale-free” definition of alignment. Here “scale-free alignment” means a version of alignment that does not feature sudden and rapid “phase shifts”, so that as aligned actors get more intelligent, their behaviour remains understandable to, and approved by, less intelligent actors. In other words, there should be no moment where a superintelligence looks at us and says “I understand that to you it looks like I’m about to annihilate Earth and everyone you love, but trust me, this is going to work out great. After all, which one of us has 10,000 IQ?” This is an extension of the idea that to understand something well, you should be able to explain it simply, even to a five-year-old. Similarly, a good actor should endeavour to be “good-registering” to everyone who is not actively malicious, including five-year-olds. Certainly many things will get lost in translation, but I believe that there is some core element of “good-alignedness” that can be sketched out and made consistent across scales.
Defining “the Good”
It is notoriously difficult to define “goodness”. However, humans do have rather robust intuitions around “care”, which derive from cultural ideas like motherhood, family, the relationship between a master and an apprentice, conservation of both nature and human artefacts, etc. So instead of writing down a one-line definition that will be argued to death, I will use a scale and sketch out different ideas of “care” for different kinds of entities with different levels of complexity. Taken together, these will point us towards the definition of scale-free alignment. Then, at the end, I will try to offer a shorter definition that encapsulates all of the above.
A key idea behind scale-free alignment is that what works at lower scales also works at higher scales. In other words, a more complex or intelligent creature may have additional needs compared to a less complex or intelligent entity, but it will still have the same needs as its less intelligent counterpart. This idea of simple core needs diversifying as entities become more complex is part of the intuition behind things like Maslow’s Hierarchy of Needs, the Golden Rule, and the Hippocratic Oath. We begin our scale with the simplest possible actors: things that aren’t actors at all.
Inanimate Objects
Imagine that you have been asked to take care of a priceless work of art, a family heirloom, or simply your favourite pet rock. Here the principles of art and museum conservation are clear: don’t break it. If possible, objects are to be isolated from damaging stimuli, and their original environment is to be preserved where reasonable. Thus ice sculptures need to be kept cold, while liquids need to be kept above their freezing point but below their boiling point. Normally this also means protecting objects from large amounts of blunt force, from theft, and from destruction in general.
Simple Organisms
Now imagine that you are a grad student being asked to take care of a petri dish of bacteria. The previous requirements all apply: you should probably not move it out of its accustomed temperature range, and you should definitely not crush it with a sledgehammer or burn it with fire. However, the bacteria have new needs: they need to be fed with nutrients, exposed to warmth or light, and possibly kept hydrated. Their environment may also need simple, regular maintenance to prevent contamination and death.
Complex Multicellular Organisms
Now imagine that you have been asked to take care of a loved one’s pet temporarily. First, we reuse the playbook for the simple organism and the inanimate object. Don’t hit it, keep it warm but not too warm, give it food and water, shelter it. But now we add emotional needs on top: company, socialisation and exposure to novelty. Here we see the first significant trade-off between two needs: some amount of security and some amount of liberty. It would obviously be bad to let your puppy loose in a warzone, but on the other hand confinement in a steel vault 24/7 may not be the best solution either. Of course, different multicellular organisms will have different levels of such needs: the recipe for keeping a cat happy is not the recipe for keeping a bear happy. But overall we add another layer to our definition of care.
Intelligent Organisms
One layer up again. This layer is analogous to parenting, and I will not belabour the point too much. On top of all of our previously established needs we add needs for complex social organisation, a sense of purpose, and a way to handle complex concepts like suffering and death. So far, most of what I have described is fairly obvious. But the happy outcome of scale-free alignment is that we can actually go beyond the realms of what we know instinctually and push the metaphor further. What happens when life becomes more complex than an individual human?
Social or Collective Organisms
Here we are tasked with taking care of a country or a collective group. It is notable how well our previously established definitions transfer: it would obviously be bad for the country to be physically torn apart or subjected to violence, and it would also be bad if the country were subjected to famine or natural disasters. These are analogous to the “simple needs” of inanimate objects and simple organisms. On top of that, countries need a way of defining a sense of citizenship, a method of handling social trauma, and the means to coexist peacefully both externally (in the diplomatic sense) and internally (resolving social conflict). The additional needs of this level come from the need to organise at scales beyond individual communication, to trade off individual liberty against collective security, and to pursue large-scale coordination projects for the common good. These themes are amply discussed in the works of James Scott, Ursula Le Guin and Karel Čapek.
Civilisational Organisms
Thus far, no actual attempt to organise and take care of human civilisation collectively has succeeded. However, we can again apply our rule and extrapolate from the national scale: civilisational risk is a natural escalation from national risk. At this point what is needed exceeds the capacity of individual human computation or coordination and requires a higher level of information processing capability. Therefore, we start to think about Kardashev scales and similar metrics, but here we enter the realm of speculation beyond the limits of this essay.
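The layering built up over the sections above can be sketched in a few lines of Python. This is only an illustrative toy, not a formal definition: the level names follow the section headings, while the need labels, `SCALE`, `cumulative_needs` and `is_scale_free` are hypothetical names chosen for this sketch. It treats "scale-free" as the property that each level keeps every need of the level below it.

```python
# Hypothetical encoding of the scale; need labels are loose paraphrases of
# the sections above, not a canonical taxonomy.
SCALE = [
    ("inanimate object", {"physical integrity", "stable environment"}),
    ("simple organism", {"nutrients", "regular maintenance"}),
    ("complex multicellular organism", {"company", "novelty"}),
    ("intelligent organism", {"sense of purpose", "social organisation"}),
    ("collective organism", {"citizenship", "peaceful coexistence"}),
]


def cumulative_needs(layers):
    """Accumulate per-level additions into the full need set at each level."""
    total = set()
    profiles = []
    for _name, added in layers:
        total = total | added
        profiles.append(frozenset(total))
    return profiles


def is_scale_free(need_sets):
    """True when every level keeps all the needs of the level below it."""
    return all(lower <= upper for lower, upper in zip(need_sets, need_sets[1:]))


profiles = cumulative_needs(SCALE)
print(is_scale_free(profiles))  # holds by construction

# A "phase shift": the top level quietly drops a need held by every level below.
broken = profiles[:-1] + [profiles[-1] - {"physical integrity"}]
print(is_scale_free(broken))
```

The second check fails because the top level no longer contains a need that the level beneath it still has, which is the kind of discontinuity that scale-free alignment is meant to rule out. The sketch deliberately ignores trade-offs between needs, such as liberty against security.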
Conclusion
What does this exercise tell us? To begin, it is actually quite easy to construct “smooth” ideas of care or wellbeing that carry us from one scale of complexity to the next. In the abstract, almost everyone agrees that people should be fed, housed, and free from war and suffering; the issues which divide society come from edge cases, conflicts between different needs, and the messy realities of implementation.
Furthermore, these needs actually reflect basic principles that are common across all things, from rocks to people. First, actors and objects wish to be free from harm. This harm can be physical, social, emotional, psychological, etc. Second, actors wish to develop and experience growth. This is implicit in the need for living beings to receive energy, socialisation, novelty, and positive experiences. We want to reach new and pleasing states of being, to meet new and interesting people, to uncover truths about the world, and to do it all with our friends and loved ones. The epitome of this growth is symbiogenesis, or the formation of more complex life from simple life: from cells to organisms to families to nations to civilisations. From this we obtain my attempt at defining scale-free goodness: the smooth increase in the amount of negentropy in the universe. Negentropy is the opposite of entropy, the rejection of death and decay in favour of life, ever-increasing diversity, and fruitful complexity. As Václav Havel writes in his famous letter “Dear Dr. Husák”:
Just as the constant increase of entropy is the basic law of the universe, so it is the basic law of life to be ever more highly structured and to struggle against entropy.
Life rebels against all uniformity and leveling; its aim is not sameness, but variety, the restlessness of transcendence, the adventure of novelty and rebellion against the status quo. An essential condition for its enhancement is the secret constantly made manifest.
This work has been carried out as part of the Human Inductive Bias Project.