Advanced agents"Sufficiently advanced Artificial Intelligences" are the subjects of AI alignment theory; machine intelligences potent enough that:
As an example: Current machine learning algorithms are nowhere near the point that they'd try to resist if somebody pressed the off-switch. That would require,happen given, e.g.:
The above reasoning also suggests e.g. that general intelligence is an advanced agent property, because a general ability to learn new domains could eventually lead the AI to understand that it has an off switch.
One reason to keep the term 'advanced' on an informal basis is that in an intuitive sense we want it to mean "AI we need to take seriously" in a way independent of particular architectures or particular accomplishments. To the philosophy undergrad who 'proves' that AI can never be "truly intelligent" because it is "merely deterministic and mechanical", one possible reply is, "Look, if it's building a Dyson Sphere, I don't care if you define it as 'intelligent' or not." Any particular advanced agent property should be understood in a background context of "Look, if"If a computer program is doing X, it doesn't matter if we define that as 'intelligent' or 'general' or even as 'agenty', what matters is that it's doing X." Likewise the notion of 'sufficiently advanced''sufficiently advanced AI' in general.
The goal of defining advanced agent properties is not to have neat definitions, but to correctly predict and carve at the natural joints for which cognitive thresholds in AI development could lead to which real-world powers,abilities, corresponding to which alignment issues.
An alignment issue may need to have been already been solved at the time an AI first acquires an advanced agent property; the notion is not that we are defining observational thresholds for society first needing to think about a problem.
Human-relativeRelative-threshold advanced agent properties (those whose key thresholdlines are determinedrelated to some extent byvarious human levels of capability):
you need to have explicit goals, reason in a general way from actions to goals, have enough big-picture strategic awareness to realize that you are a computer and have an off switch, and connect the dots
Some possible advanced agent properties:
Artificial General Intelligence, or the ability to learn new domains, potentially leading into other properties.
The AI can learn new domains that humans don't understand.
An example of a more formal property is "epistemic efficiency": the AI's estimates are always at least as good as our own estimates.]
(For the general concept of an overview of other agent properties besides the advanced agent properties,agent, see standard agent properties.)
Since there's apparently multiple avenues we can imagine for how an AI could start to be this powerful, "advanced agent" doesn't have a neat necessary-and-sufficient definition. Similarly, some of the advanced agent properties are easier to formalize or pseudoformalize than others.
One example of a relatively definable property is relative efficiency within a domain. For example, an agent appears 'epistemically efficient' to us if we can't predict any directional error in its estimates. E.g., we can't expect a superintelligence to precisely estimate the exact number of hydrogen atoms in the Sun, but it would be very odd if we could predict in advance that the superintelligence would overestimate this number by 10%. It seems very reasonable to expect that sufficiently advanced superintelligences would have this particular property, relative to humans, over all domains (even human stock markets have this property in the short run for the relative prices of highly liquid assets). An agent that was efficient at, say, social manipulation of humans, would definitely be advanced enough to be pivotal and potentially dangerous, even if it wasn't efficient across all domains.
Another example of a relatively definable property is cognitive uncontainability within a domain - the agent searches a broad-enough space of options that we can't predict what its best option will look like or how much of the agent's expected utility will be available to it. This kind of uncontainability is impossible in narrow, perfectly known spaces like logical Tic-Tac-Toe,Toe, but can start to manifest as early as the domain of Go, possiblyGo - AlphaGo...
Other material domains besides nanotechnology might be pivotal. E.g., self-replicating ordinary manufacturing could potentially be pivotal given enough lead time; molecular nanotechnology is distinguished by its small timescale of mechanical operations and by the world containing an infinitelyinfinite stock of perfectly machined spare parts (aka atoms). Any form of cognitive adeptness that can lead up to rapid infrastructure or other ways of quickly gaining a decisive real-world technological advantage would qualify.
One example ofAs an example: Current machine learning algorithms are nowhere near the point that they'd try to resist if somebody pressed the off-switch. That would require, e.g.:
So the threshold at which you might need to start thinking about 'advanced safetyshutdownability' or 'abortability' or corrigibility considerations might startas it relates to become relevant if the AGIhaving an off-switch, is searching 'weird' parts of solution space and hence is cognitively uncontainable on the 'real-world big-picture strategic awareness' plus 'cross-domain consequentialism'. This would already start to bring in considerations like edge instantiations, unforeseen maximums, nearest unblocked strategies, and context-change disastersThese two cognitive thresholds can thus be termed 'advanced agent properties'.
An example ofThe above reasoning also suggests e.g. that general intelligence is an advanced agent property, because a less crisp advanced-agent property might be 'generality', thegeneral ability to learn and interrelate a broad variety of new domains rather than needingcould eventually lead the AI to be explicitly preprogrammed with them. This in turn would lead into safety-relevant properties like "Can learn domains unknown to the programmers", or "Can learn about human psychology", or "Can learn about and understand the strategic bigger picture."that it has an off switch.
One reason to keep the term 'advanced' on an informal basis is that in an intuitive sense we want it to mean "AI we need to take seriously" in a way independent of particular architectures or particular accomplishments. To the...
There'Some examples of properties that might make an agent this powerful:
Since there's apparently multiple avenues we can imagine for how an AI could start to be that powerful in an informal sense, even without being a superintelligence, and some of these informal types of power don't yet have neat formal descriptions. Thus,this powerful, "advanced agent" doesn't have a neat necessary-and-sufficient definition. Similarly, some of the advanced agent properties are easier to formalize than others.
you need to have explicit goals, reason in a general way from actions to goals, have enough big-picture strategic awareness to realize that you are a computer and have an off switch, and connect the dots
Some possible advanced agent properties:
Artificial General Intelligence, or the ability to learn new domains, potentially leading into other properties.
The AI can learn new domains that humans don't understand.
An example of a more formal property is "epistemic efficiency": the AI's estimates are always at least as good as our own estimates.]
Advanced agents are the subjects of AI alignment theory; machine intelligences potent enough that (a) thethat:
Some examples ofexample properties that might make an agent this powerful:sufficiently powerful for 1 and/or 2:
Since there's multiple avenues we can imagine for how an AI could start to be this powerful, "advanced agent"sufficiently powerful along various dimensions, 'advanced agent' doesn't have a neat necessary-and-sufficient definition. Similarly, some of the advanced agent properties are easier to formalize or pseudoformalize than others.
One example of a relatively definable property is cognitive uncontainability within a domain - the agent searches a broad-enough space of options that we can't predict what its best option will look like or how much of the agent's expected utility will be available to it. This kind of uncontainability is impossible in narrow, perfectly known spaces like logical Tic-Tac-Toe, but can start to manifest as early as the domain of Go - AlphaGo played moves that human champions initially found puzzling and unexpected, because the logical Go rules encompass sufficient game complexity that yougood moves can start to haveappear "weird" moves. Real-world domains, wherefrom a falling leaf (physics and botany) can be nudged by a flying bee (biology) and both are far more complicated than a Go board and without completely-known-to-humans axiomatized rules, would be even richer than Go.
Cognitive uncontainability can potentially happen when an AI searches a different style of solution, not just when an AI searches a strictly larger set of solutions.human perspective. Even if an AGI is,is still in some sense, stillgeneral infrahuman, advanced-advanced safety considerations might start to bebecome relevant if the AGI is searching 'weird' parts of solution space and hence is cognitively uncontainable on the real-world domain.domain. This would already start to bring in considerations like edge instantiations, unforeseen maximums, nearest unblocked strategies, and context-change disasters.
An example of a less crisp advanced-agent property might be "'generality" and its correlates: "Can', the ability to learn and interrelate manya broad variety of new domains rather than needing to be programmed for them",explicitly preprogrammed with them. This in turn would lead into safety-relevant properties like "Can learn subjectsdomains unknown to the programmers", or "Can start to learn about human psychology", or "Can learn about and understand the strategic bigger picture."
One reason to keep the term 'advanced' on an informal basis, or even as something of a placeholder,basis is that in an intuitive sense we want it to mean "AI we need to take seriously" in a way independent of particular architectures or particular accomplishments. To the philosophy undergrad who 'proves' that AI can...