Prior to the era of superintelligent actors, we’re likely to see a brief era of superagentic actors—actors who are capable of setting and achieving goals in the pursuit of a given end with significantly greater efficiency and reliability than any single human. Superagents may in certain restricted senses act superintelligently—see principles 8, 9—but this isn’t strictly necessary. A superagent may be constructed from a well-scaffolded cluster of artificial intelligences, but in the near-term it’s far more likely that superagents will consist of one or more humans, aided by well-scaffolded AIs, since humans still have a few properties vital to agency that AIs haven’t fully acquired (yet).
As with ‘superintelligence’, there’s no canonical demarcation between superagentic actors and non-superagentic actors; there is only a cluster of different properties which are likely to be strongly correlated at large scale, but which may come uncoupled in particular cases (especially transitional ones), producing a jagged frontier of agency.
Here’s a list of possible properties by virtue of which an actor may achieve superagency.
Principle 1 (Directedness)
A superagent may have vastly improved self-monitoring, introspection, and control.
In most intellectual tasks, humans spend the overwhelming majority of their time in predictably unproductive patterns: they are caught up in minutiae, overpolishing what ought to be discarded, failing to filter distractors. They generally fail to notice, or are unwilling to acknowledge, when they’re taking the wrong direction entirely even when they could easily recognize this, and are resistant to change once they’ve invested a lot of their time or ego in a particular approach. Even though they can often easily diagnose these mistakes when other people are making them, they can’t easily avoid these mistakes themselves.
A superagent, on the other hand, may be able to plot a reasonable route to their goal and directly take it without distractions, quickly noticing and correcting unproductive patterns and directions.
Principle 2 (Alignment)
A superagent may consistently keep track of whether its behaviors are aligned towards and optimal for a given end.
Humans rarely step back from their efforts to orient themselves, and don’t systematically ask themselves key questions concerning alignment:
Towards what end am I doing this?
Are my efforts here aligned with and productive towards that end?
Is this really the best thing I could be doing to achieve that end?
The ability to answer such questions consistently (and recursively, up the chain of “why”s) is very rare, often for ego-based or otherwise emotional reasons—one is unwilling to find out that their pet project is actually unimportant, or that they should scrap their present efforts—and comes unnaturally to humans, who hardly ever do this in their lives.
Example: Protests are almost never the most useful way to spend x units of time, effort, or resources on a given cause, and this is obvious—but people do them anyway, because they’re conditioned to think that’s the sort of thing you should do when you strongly support a cause. We follow these culturally autocompleted behavioral patterns in every part of our lives, and cannot reliably step back to explicitly think about optimizing our actions for our given ends. But all it would take to fix this is correct (programmatic) prompting and consistent habits, as sketched below.
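To make the “correct (programmatic) prompting and consistent habits” above concrete, here is a minimal sketch of a scheduled alignment check. The activity and goal strings, the cadence, and the idea of printing the prompt rather than sending it to an LLM or surfacing it to the human are all illustrative assumptions, not anything specified above.

```python
# Minimal sketch of programmatic alignment prompting; the activity/goal
# strings and the schedule are invented for illustration.
import time

ALIGNMENT_QUESTIONS = [
    "Towards what end am I doing this?",
    "Are my efforts here aligned with and productive towards that end?",
    "Is this really the best thing I could be doing to achieve that end?",
]

def alignment_check(current_activity: str, stated_goal: str) -> str:
    # In a real setup this prompt would go to an LLM or to the human;
    # here we just build and return it.
    return (
        f"Current activity: {current_activity}\n"
        f"Stated goal: {stated_goal}\n" + "\n".join(ALIGNMENT_QUESTIONS)
    )

if __name__ == "__main__":
    for _ in range(3):  # a consistent habit: re-orient on a fixed schedule
        print(alignment_check("polishing slide 14", "get the grant funded"))
        time.sleep(1)   # stand-in for an hourly timer or cron job
```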
Principle 3 (Uninhibitedness)
A superagent may not be restricted by felt senses of courtesy, fairness, guilt, or honor.
Almost all humans have ingrained social norms / cultural patterns that encourage them to treat other humans as ends in themselves, and to avoid harming others, especially in ways that would look dishonorable or unfair. Most of us hesitate before taking, and while taking, actions that violate social norms or feel “illegitimate”. Sending cold emails, pirating software and media, steroid use, social media sockpuppeting, lying on forms, and many other easy and beneficial actions are inhibited this way. We justify these inhibitions with cognitions like “what if I get caught?”, or vague appeals to notions of reputation and trust, ignoring our ability to rationally assess the actual risks of getting caught, losing reputation, or losing trust (which are often low, or easily mitigated).
To a superagent, humans and their institutions may be treated as any other system, with assessable inputs and outputs, and managed costs and risks; concepts like “fairness” and “courtesy” are only relevant insofar as they’re constructs within the system that can affect expected outcomes. Appearing discourteous can have repercussions. But to be discourteous is a meaningless concept outside of the social context that humans alone cannot escape.
This doesn’t necessarily make the superagent dangerous; its goals or restraints may respect things like prosociality, and it may pay this respect more efficiently and effectively than ordinary human agents. It just may not be constrained by an internalized superego in the way that humans are.
Example: Many humans have to fight against themselves to send cold emails to potential collaborators. They might feel it’s forward or presumptuous, agonize over the phrasing of a rough draft for hours, and then decide not to send it at all. Five minutes of cognitive effort, directed well, would produce a better result.
A superagent may direct their cognitive efforts that well—for everything—and simply not hesitate to do things for social reasons as humans do. Are there ten other important people that could just as easily be sent slight variants on this cold email? (Yes). Are they all sitting together and comparing their cold emails in order to ignore anyone who sends too many? (Almost always no; where plausible, the risk has already been assessed and priced in). Then ten slight variants may be drafted and sent with five additional minutes of effort, not ten additional hours of nervous self-inhibition.
(If this principle seems to entail psychopathy, that’s because it does. If you’re thinking that agency ought to be an unqualified good, or that ‘superagentic’ ought to be a compliment, that’s your problem).
Principle 4 (Foresight)
A superagent may not make foreseeable mistakes.
When humans make mistakes, it’s often because we didn’t spend enough cognitive resources thinking through the potential failure modes of our actions. Maybe we have a narrow focus and miss the bigger picture, or fail to weigh the perspectives of people with different interests. We fail to perform obvious sanity checks, or have mental blocks that prevent us from seeing a problem that’s directly in front of us.
So we make foreseeable mistakes: mistakes which, at the time of making them, could have been avoided by thinking a few steps ahead or running a quick check. Things like:
Not backing up data, and then losing it (when’s the last time you backed up your computer?)
Failing to consider how your behavior looks from the perspective of other important actors
Failing to notice that a given subtask is not going as expected, and intervening early
In general, whenever a mistake causes us to say “I should’ve seen that coming, that was preventable, ...”, a superagent may be designed that does see it coming, and does prevent it. They may still make novel mistakes: mistakes that arise because the world has hidden structure or causality that they didn’t know about; they may also make mistakes when embedded in large, complex systems whose behavior is just not tractably predictable. But superhuman ability at foreseeing and avoiding foreseeable and avoidable obstacles seems readily achievable. This may be because the agent has a constant background process of looking for potential errors, or runs simulations of adversarial critics to find flaws in its own plan before it executes it (wargaming against intelligent and stochastic red teams), or builds internal infrastructure and carefully-followed checklists to make errors impossible, and so on.
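As a sketch of the “carefully-followed checklists” idea, a pre-execution gate might look like the following; each check is a hypothetical stand-in for a real test of a foreseeable failure mode, not a prescribed set.

```python
# Minimal sketch of a preflight checklist gate; every check below is a
# hypothetical placeholder for a real verification step.
from typing import Callable

def backups_are_recent() -> bool:
    return True   # stand-in: verify actual backup timestamps here

def stakeholders_considered() -> bool:
    return True   # stand-in: confirm other actors' perspectives were weighed

def plan_survives_red_team() -> bool:
    return True   # stand-in: wargame the plan against an adversarial critic

PREFLIGHT: list[tuple[str, Callable[[], bool]]] = [
    ("data is backed up", backups_are_recent),
    ("other actors' perspectives weighed", stakeholders_considered),
    ("plan wargamed against a red team", plan_survives_red_team),
]

def execute(plan: Callable[[], None]) -> None:
    # Refuse to act until every listed foreseeable mistake is ruled out.
    failures = [name for name, check in PREFLIGHT if not check()]
    if failures:
        raise RuntimeError(f"foreseeable mistakes not ruled out: {failures}")
    plan()

execute(lambda: print("plan executed"))
```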
Principle 5 (Parallelization)
A superagent may have vastly improved ability to manage and coordinate across many threads of attention.
Humans are limited to a single thread of serial cognition. We can juggle a few things in the background, but at best we can only have one continuous line of reasoning at a time, and we parallelize poorly due to our inability to effectively context switch. As the number of tasks we have to juggle grows, the cognitive load quickly exceeds our capacity.
A superagent may be able to maintain many threads of cognition, each processing a different subproblem, potentially delegating tasks to subagents, as in a distributed system. It may spawn off a subagent to do some data entry, another to draft a response to a letter, another to plan a schedule, another to work on a technical problem, and then monitor all of these virtual threads. Alternatively, a superagent may just have the ability to process multiple thought streams in parallel, though the multithreading approach is more compatible with the present architectures from which superagents might be built. There are situations where humans can natively parallelize, to some small extent—we can walk and talk at the same time, usually—but we can’t simultaneously read, listen, talk, and type different things. We have one face and two hands, but there’s no reason an agent can’t have thousands of faces and thousands of hands with which it speaks, listens, and acts simultaneously.
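A minimal sketch of this delegate-and-monitor pattern, assuming a hypothetical run_subagent() worker and made-up tasks (this is not any real agent framework, just the shape of the coordination):

```python
# Minimal sketch: spawn one "thread of attention" per subproblem and
# monitor them all. run_subagent() and the task list are hypothetical.
import asyncio

async def run_subagent(name: str, task: str) -> str:
    await asyncio.sleep(0.1)          # stand-in for delegated work
    return f"{name} finished: {task}"

async def supervisor() -> None:
    jobs = {
        "data-entry":  "enter this week's receipts",
        "drafting":    "draft a reply to the letter",
        "scheduling":  "plan next month's schedule",
        "engineering": "investigate the failing build",
    }
    tasks = [asyncio.create_task(run_subagent(n, t)) for n, t in jobs.items()]
    for finished in asyncio.as_completed(tasks):
        print(await finished)         # react to each result as it arrives

asyncio.run(supervisor())
```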
Principle 6 (Planning)
A superagent may have the capacity for extremely large-scale, long-term planning.
To humans, distance through time and space has an appreciable weight to it; it makes things feel hazy and unclear. We find it hard to conceptualize time spans of more than a few years, and more generally we conceptualize the future as a sort of abstract situation: we hold its possible states in our mind as “long-term outcomes” to be hoped for or avoided, not as concrete things we can influence through present actions. We don’t naturally have the ability to create and execute plans over long time horizons.
Long-term planning is especially difficult if the steps depend on information that we don’t yet have (e.g., the outcome of a process that takes place a few months down the line), even if we can clearly articulate in advance which actions we would take upon each outcome, since we get overwhelmed by the complexity of reasoning over possible decision trees instead of individual decisions—even when we can use tools like whiteboards to keep from having to hold the entire tree in our head at once.
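A minimal sketch of what reasoning over a decision tree, rather than over individual decisions, amounts to; the payoffs and probabilities below are invented for illustration:

```python
# Minimal sketch of evaluating a small contingency tree: decision nodes
# pick the best branch, chance nodes average over outcomes. All numbers
# are made up for illustration.
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    value: float                           # payoff of ending up here

@dataclass
class Chance:
    branches: list[tuple[float, "Node"]]   # (probability, subtree)

@dataclass
class Decision:
    options: dict[str, "Node"]             # action name -> subtree

Node = Union[Leaf, Chance, Decision]

def expected_value(node: Node) -> float:
    if isinstance(node, Leaf):
        return node.value
    if isinstance(node, Chance):
        return sum(p * expected_value(sub) for p, sub in node.branches)
    # Decision node: commit now to whichever branch is best in expectation.
    return max(expected_value(sub) for sub in node.options.values())

# "If the outcome three months out is favorable, we're well positioned;
#  otherwise we fall back" -- both branches are planned for in advance.
tree = Decision({
    "prepare now":  Chance([(0.6, Leaf(10.0)), (0.4, Leaf(2.0))]),
    "wait and see": Chance([(0.6, Leaf(7.0)),  (0.4, Leaf(3.0))]),
})
print(expected_value(tree))   # 6.8: preparing now beats waiting (5.4)
```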
A superagent may treat the future with the same clarity of attention as the present. They may naturally think about the future as something concrete, something they can directly (if probabilistically) manipulate, and easily create multistep plans that extend years into the future. If they have a goal that is distant in time, they can just start working on it now, lacking any feeling of its being “very far away” to distract them. They may readily perform preliminary tasks today that they expect to free up additional options in a certain decision several months from now, since the only real difference between today and any given future date is that one can be seen more clearly.
Example: We notably use proximity as a crutch for our attention. If I need to make some slides for a conference two months away, I’ll always—as though it were a law of physics, always—wait until the week before to start preparing, even though I could prepare them at any time. We justify this indecision with cognitions like “but things might change; my influence over the future is weaker, so I shouldn’t act on it just yet”, which are more often posited as excuses to give up than as issues that can be analyzed and mitigated. (Do chess grandmasters whine about how there’s no point in long-term calculations because anything could happen in the next twenty moves? No. They calculate what they can, prepare alternatives for different contingencies, and take actions to open up or secure certain future movement options).
Principle 7 (Flow)
A superagent may have vastly reduced transaction costs to thinking and acting.
Humans tend to have a cognitive inertia that makes us hesitant to think through any particular question we can subsume under a cached conclusion for a general case, and that diverts us from mentally ‘shopping around’ for alternative solutions once our cognition has secured for us a first solution we’re satisfied with. We underperform at tasks that we don’t like, or don’t consider ourselves good at, since we conceptually identify ourselves as confused, some task as beyond our abilities, etc., rendering ourselves inefficient or slow at tasks we could just do right if our mindset were corrected.
Nor can we work indefinitely—we get maybe a few hours per day of peak intellectual performance—or context switch effectively; if interrupted, we take a while to become performant at our original task again. We need lots of breaks, and generally have to allocate a finite and rather small amount of willpower per day as though it were a currency.
A superagent may not face questions of willpower, and may be in a permanent flow state; they may decide upon what needs to be accomplished, and perform the required steps to accomplish it with the uninterrupted automaticity of clockwork, while remaining open to alternative solutions. A sole human that perfectly possessed this property would still need x hours of sleep a night, would still act more sluggishly when they’re jet-lagged or have low blood sugar, etc., but would never decide to eat a chocolate bar or play a video game, unless circumstances rationally incentivized such actions. These wouldn’t be actions that they have to mentally resist; they’re just understood to be pointless. More generally, mental moves that ordinarily challenge humans, like changing to a new, alternative strategy even after spending lots of effort on a current strategy, could similarly be understood as optimal and performed without mental resistance.
Example: When we have a complicated task in front of us, we often do a quick mental scan for a starting point, and, if no obvious one comes to mind, we get overwhelmed. Then we start going in circles, trying random things, or give up. If we run into an obstacle that our intuition doesn’t immediately tell us how to solve, the same thing happens—we get stressed, try random things, and get more stressed when they don’t work.
Even when simply pausing to think deeply about it for a set time would help, we don’t do this, because we’re averse to making cognitive efforts. A superagent may systematically enumerate possible starting points according to heuristics, pick one that looks good, and start; and if they hit a wall, they may quickly pause to think through and solve the problem, as we could, if our cognitive efforts were frictionless.
Principle 8 (Deduction)
A superagent may deduce facts from large amounts of seemingly unrelated data.
A generic fact about the internal state of some system A is constantly being broadcast in innumerable subtle ways through A’s interactions with other systems. Whenever these other systems are changed by such interactions in ways that depend on this internal state in known ways, and those changes are observable, they form channels through which A’s internal state can be probabilistically inferred. The fact that someone didn’t reply to an email for 24 hours, for instance, is probabilistically informative of their internal state insofar as there are some possible internal states of theirs that make this more likely to happen than others—which there are. It’s not much evidence, but it is at least some evidence, and there are so many different side-channels that are all simultaneously leaking at least some evidence concerning questions of interest to us.
Humans sometimes pick up on these things, but to a very small extent. We can notice that someone is angry by their facial expression or tone of voice, but we can’t simultaneously process and cross-reference everything they’ve said and done, all their subtle shifts in behavior over time, and correlate those with their known goals and incentives to produce a high-resolution picture of their internal state. But if you have access to enough such data, and good models of how the state of A affects these observable channels, you could in principle infer this internal state with high confidence, as a matter of consistent (read: computerized) tracking and calculation. I claim that we are very often given the data required to figure out so many of our most important questions, and we have the theories of probabilistic inference required to reconstruct the answers to these questions from our observations—but we don't, since the data is so scattered and subtle, and we don’t have the consistency and unity of will required to collect and process it.
But a superagent capable of collecting and processing all of the tiny morsels of data given off by a system of interest may be able to deduce these humanly-inaccessible facts about the system’s internal state, as if by ESP or magic—though it’s really just mathematics and consistency. In general, this should be achievable when some possible internal states are vastly more compatible with the data in aggregate than other possible states; a superagent may develop a much more explicit model of the state space than humans naturally do, and use each bit of data it obtains about the system—each output of the system that could’ve turned out differently—to locate its internal state much more efficiently.
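A minimal sketch of the bookkeeping this kind of inference would take, with invented states, side channels, and likelihoods standing in for a real model of how A’s internal state drives its observable outputs:

```python
# Minimal sketch of accumulating weak side-channel evidence about a
# system's internal state; all states, observations, and probabilities
# are invented for illustration.
import math

# Candidate internal states of system A, with prior beliefs.
prior = {"busy": 0.5, "uninterested": 0.3, "offended": 0.2}

# P(observation | state) for each weak side channel we can observe.
likelihood = {
    "no_reply_24h":      {"busy": 0.6, "uninterested": 0.7, "offended": 0.8},
    "still_posting":     {"busy": 0.3, "uninterested": 0.8, "offended": 0.7},
    "short_reply_later": {"busy": 0.7, "uninterested": 0.4, "offended": 0.1},
}

def posterior(observations: list[str]) -> dict[str, float]:
    # Accumulate log-likelihoods so many tiny pieces of evidence add up.
    logp = {s: math.log(p) for s, p in prior.items()}
    for obs in observations:
        for s in logp:
            logp[s] += math.log(likelihood[obs][s])
    z = sum(math.exp(v) for v in logp.values())      # normalize
    return {s: math.exp(v) / z for s, v in logp.items()}

print(posterior(["no_reply_24h", "still_posting", "short_reply_later"]))
```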
Principle 9 (Experimentation)
A superagent may perform actions designed to subtly influence and gain information about a system, or several systems at once.
This is an extension of the previous principle. When you can elicit information from a system—that is, you can perform an action that causes the system to respond in a way that predictably depends on its internal state—you have incredible room to optimize this action for informativeness (the entropy of your model’s prior over possible outputs of the acted-upon system) at the same time that you’re using it to alter the system’s state in a way that benefits you (not just by making it itself act in ways that directly benefit you, but by increasing the options you’ll have to influence it in the future, or by making it easier to elicit information about the system through future actions).
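As a toy sketch of scoring candidate actions this way, with the actions, predicted response distributions, weighting, and benefit numbers all invented:

```python
# Minimal sketch: score each candidate action by how informative its
# response is expected to be plus its direct benefit. All values invented.
import math

def entropy(dist: dict[str, float]) -> float:
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# For each candidate action: our predicted distribution over the system's
# response, plus a rough estimate of the action's direct benefit.
candidates = {
    "ask_directly": {"responses": {"yes": 0.5, "no": 0.5},        "benefit": 0.2},
    "float_a_hint": {"responses": {"bites": 0.3, "ignores": 0.7}, "benefit": 0.5},
    "do_nothing":   {"responses": {"nothing": 1.0},               "benefit": 0.0},
}

def score(action: dict, info_weight: float = 1.0) -> float:
    # Informativeness is proxied by the entropy of the predicted response:
    # an action whose outcome we can already predict teaches us nothing.
    return info_weight * entropy(action["responses"]) + action["benefit"]

best = max(candidates, key=lambda a: score(candidates[a]))
print(best)   # trades off information elicited against direct benefit
```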
Insofar as a superagent can learn about a system from much subtler information than a human, they should also be able to act on the system in much subtler ways than a human in order to cause it to make changes to its environment that are predictably dependent upon (and therefore encode) the information they need. Because they can explicitly keep track of many more desiderata at once concerning their planned actions, they may optimize their actions for many more desiderata at once as well: these include the extent of influence and usefulness of information elicited from a system as a result of the action, as well as its ability to effect the preservation or expansion of future options for both influence and useful elicitation of information from that system. In complex environments with multiple parties to influence and learn about, humans often explicitly restrict themselves to binary interactions, interacting with one party at a time, since it’s difficult to reliably mentally process higher-order relations and interactions. A superagent that can expand its equivalent ‘mental workspace’ may not have such problems.
In practice, being on the receiving end of superagentic action may look like seeing claims that, through certain choices of wording, seem to almost connote certain interesting things, or actions that seem to rely on interesting implicit assumptions, that put you on the threshold of wanting to respond to certain things in certain ways, that appear as Schrödinger’s dogwhistles for a variety of topics that you have some emotional investment and personal experience in.
(Note that it will never really feel like “I’m being influenced to share my opinion about xyz”: it just feels like you really ought to tell them about xyz. Maybe because you want to refute some claim or implicit assumption they seem to have about xyz, or because you want to tell them something that critically informs some closely-held belief of yours that they appear to be undermining. This aspect of human psychology is what makes it so easy to troll people online: they never think “this person is making me angry by acting like they don’t understand xyz”, they just feel angry because this person is refusing to understand xyz).
As with the previous principle, this is totally possible for ordinary humans to do in theory, but in practice it relies on levels of effort, coordination, and precision we cannot reliably bring to our tasks.
Principle 10 (Meta-Agency)
A superagent may have an explicit conception of themselves as a system to be optimized, and a process for improving their own capabilities.
Humans rarely think about themselves as cognitive systems that can be refined to better achieve their given ends. Certain methods of improving our memory, motivation, thought patterns, etc. might come to us every once in a while, but it’s rare that we make systematic instrumental efforts to improve these things.
A superagent, not being entirely human, is not subject to the same psychological biases and hang-ups that divert us from discovering and executing cognitive self-improvements, and their structure may admit much clearer, more robust avenues for improvement than ours. They may autonomously identify bottlenecks and inefficiencies, model the effects of different interventions on their own performance, simulate different versions of themselves, and then actually execute on knowledge thereby gained in order to make systematic, instrumental self-improvements.
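A minimal sketch of the measure-and-adopt loop this describes, where the interventions and the performance() measurement are hypothetical placeholders rather than anything prescribed above:

```python
# Minimal sketch of instrumental self-improvement: measure the effect of
# each candidate intervention on performance, then adopt the best.
# Everything here is a hypothetical placeholder for real measurements.
import random
import statistics

def performance(intervention: str) -> float:
    # Stand-in for a real metric: throughput, error rate, time to completion.
    baseline = {"none": 1.0, "checklists": 1.2, "longer_context": 1.1}
    return random.gauss(baseline[intervention], 0.1)

def best_intervention(candidates: list[str], trials: int = 30) -> str:
    means = {c: statistics.mean(performance(c) for _ in range(trials))
             for c in candidates}
    return max(means, key=means.get)   # adopt whichever measurably helps most

print(best_intervention(["none", "checklists", "longer_context"]))
```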
Humans vary widely with respect to, and can explicitly improve at, most of these capabilities; some rare individuals (e.g. psychopaths) may fully exemplify a few of them. Large groups of coordinated humans often act superagentically in many of these ways simultaneously—no one human, or group of uncoordinated humans, could match the agentic capacity of Google or the Mossad. But coordination with other humans is slow and expensive, and can only go so far so long as all the thinking of these groups has to happen among human minds. Hybrid human-AI systems will likely enable individuals to approach superagency in many more of these ways.
(also posted on Substack)