This is the third and final post in a sequence on control theory. In the first post I introduced the subject of control theory and stepped through some basics. In the second post I outlined Powers's model, as presented in Behavior: The Control of Perception. This post is a collection of comments on the subject that are only somewhat related, and so I'll use section headings to separate them. I'll also explicitly note the absence of a section on the design of control systems, which is where most of the effort used in talking about them in industrial settings goes, and is probably relevant to philosophical discussions surrounding them.


From Wikipedia's Cybernetics page:

Artificial intelligence (AI) was founded as a distinct discipline at a 1956 conference. After some uneasy coexistence, AI gained funding and prominence. Consequently, cybernetic sciences such as the study of neural networks were downplayed; the discipline shifted into the world of social sciences and therapy.

I'm no historian of science, and it's not clear to me why this split happened. It seems likely that control theory was simply not a useful approach for many of the early problems researchers associated with AI, like natural language processing: Powers describes how neural circuits, as he models them, could solve the phoneme parsing problem (which seems very compatible with sophisticated approaches that use Hidden Markov Models), but it's not quite clear how one would go from parsing sounds into words to parsing words into concepts. It seems like there might be some difference in kind between the required circuitry, but perhaps not: one of the recent advances in machine learning is "deep learning," the ultra-simplified explanation of which is "neural nets, just dialed up to 11." It seems possible (certain, if you count NNs as a 'cybernetic' thing) that AI is moving back in the direction of cybernetics/control theory/etc., but possibly without much intellectual continuity. Did backpropagation spread from controls to AI, or was it independently invented? As mentioned before, I'm not a historian of science. People working in robotics, as my limited understanding goes, have always maintained a connection to engineering and cybernetics and so on, but the 'hardware' and 'software' fields diverged: the roboticists sought to move from the first level up, and the AI researchers sometimes sought to move from the top down, perhaps without the hierarchical view.

This article on Walter Pitts (an important early figure in cybernetics) describes the split thus:

Von Neumann was the first to see the problem. He expressed his concern to Wiener in a letter that anticipated the coming split between artificial intelligence on one side and neuroscience on the other. “After the great positive contribution of Turing-cum-Pitts-and-McCulloch is assimilated,” he wrote, “the situation is rather worse than better than before. Indeed these authors have demonstrated in absolute and hopeless generality that anything and everything … can be done by an appropriate mechanism, and specifically by a neural mechanism—and that even one, definite mechanism can be ‘universal.’ Inverting the argument: Nothing that we may know or learn about the functioning of the organism can give, without ‘microscopic,’ cytological work any clues regarding the further details of the neural mechanism.”

Utility Theory

Utility theory is the mathematically correct way to behave in an uncertain world if you have preferences over consequences that can be collapsed onto the real line and can solve the maximization problem. So long as your preferences satisfy four desirable rules (the von Neumann-Morgenstern axioms), a utility function describes them. If we express those preferences as such a score, then we can entirely separate the module that expresses preferences over consequences from the module that predicts the probabilities of consequences given a particular action, and this is a huge boon to mathematical decision-making.
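As a toy sketch of that separation (all names and numbers here are mine, purely illustrative), neither module needs to know anything about the other:

```python
# Hypothetical sketch (names and numbers invented): the preference
# module and the prediction module are fully separate.

def utility(outcome):
    # Preference module: collapses consequences onto the real line.
    return {"win": 10.0, "draw": 2.0, "lose": -5.0}[outcome]

def predict(action):
    # Prediction module: P(outcome | action), knows nothing of preferences.
    table = {
        "aggressive": {"win": 0.5, "draw": 0.1, "lose": 0.4},
        "cautious":   {"win": 0.2, "draw": 0.7, "lose": 0.1},
    }
    return table[action]

def expected_utility(action):
    # The only place the two modules meet.
    return sum(p * utility(o) for o, p in predict(action).items())

best = max(["aggressive", "cautious"], key=expected_utility)  # "aggressive"
```

You can swap out either module without touching the other, which is exactly the boon in question.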

But it turns out that decision-making under certainty can often be a hard problem for humans. This is a black mark for the descriptive application of utility theory to humans, but is explained by the control theory paradigm as multiple goals (i.e. control systems) conflicting. I don't see this as a challenge to the prescriptive usefulness of utility theory: when presented with a choice, it is often better to make one than not make one--or, if one is delaying until additional information arrives, to know exactly what impact possible information could have on the decision through a value of information (VoI) calculation. Even if you've identified the two terminal goals that are conflicting, it is probably better to explicitly short-circuit one of those desires, determine the right tradeoff, or pull in a third direction rather than remain locked in conflict.
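A minimal VoI calculation looks like this (the two-state decision problem and all its numbers are made up for illustration): compare the expected utility of deciding now against the expected utility of learning the state first and then deciding.

```python
# Made-up numbers throughout: a two-state, two-action decision, and the
# value of learning the state before choosing.

prior = {"good": 0.6, "bad": 0.4}
payoff = {
    ("invest", "good"): 100, ("invest", "bad"): -80,
    ("hold", "good"): 0, ("hold", "bad"): 0,
}

def best_eu(belief):
    # Expected utility of the best available action under a given belief.
    return max(
        sum(belief[s] * payoff[(a, s)] for s in belief)
        for a in ("invest", "hold")
    )

eu_now = best_eu(prior)  # decide immediately, under the prior
eu_informed = sum(        # learn the true state (a perfect signal), then decide
    prior[s] * best_eu({t: (1.0 if t == s else 0.0) for t in prior})
    for s in prior
)
voi = eu_informed - eu_now  # what perfect information is worth, at most
```

If the cost of waiting exceeds `voi`, just decide now.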

It also seems that utility maximization is mostly appropriate for an agent in the LW sense--a unitary entity that has a single preference function and plans how to achieve that (potentially complex) preference as well as possible. This requires a potentially immense amount of computing power, and it's not at all obvious that many of the "systems" with "intelligence" that we might be worried about will be described that way. When we look at, say, trading algorithms causing problems in the financial markets, utility maximization doesn't appear to be the right viewpoint for understanding why those algorithms behave the way they do, and game theory doesn't seem like the right approach to try to determine the correct reaction to their algorithms and their use.

It may also be helpful to separate out "strong" and "weak" senses in which an agent maximizes utility. The strong sense is that they actually have a known function that they use to value consequences, and simulate the future to determine the action that gets them the most value. The weak sense is that we can describe any agent as behaving as though it is maximizing some utility function, by observing what it does and calling that the utility-maximizing action. As the names suggest, the strong sense is useful for predicting how an agent will behave, and the weak sense isn't.

As mentioned earlier, I don't think it's easy (or desirable) to dethrone utility as the premier prescriptive decision-making approach, if you have the self-editing ability to change your decision-making style and the computing power to solve the maximization problems it poses. But we may need to figure out where we're coming from to figure out how to get there. (In some sense, that's the premise of the Heuristics and Biases literature.)

Previous Discussion on LW

It's not quite fair or reasonable to respond to comments and posts made years ago (and before I even found LW), especially in light of Yvain's roundup that had PCT listed with ideas that seemed outlandish before being partly absorbed into the LW consensus. One of the reasons why I bring the subject up again, with a longer treatment, is that I think I see a specific hole in the LW consensus that I might be well-suited to fill. So let's look at the list of links from the first post again: this book and Powers's Perceptual Control Theory have been discussed on LW here, here, and here, as well as mentioned in Yvain's roundup of 5 years (and a week) of LW.

I feel like the primary criticisms (see SilasBarta's comment as an example) were about the presentation and the unexplained enthusiasm, rather than the invalidity or inappropriateness of the model, and the primary defenses were enthusiasm (as I recall, this comment by pjeby prompted me to buy and read the book, but I'm only familiar with one of the six things that he says it explains, which impairs my ability to understand why he thinks it's impressive!). I don't mean to fault people involved in that conversation on the PCT side for not explaining--even I see my two thousand words in the last post as an argument to read Powers's three-hundred-page book rather than a full explanation (just like my two thousand words spent on the basics of controls wouldn't get you through an undergraduate-level class on the subject, and are more of an argument to take that class).

Since you can make an arbitrary function out of enough control loops, saying that human minds run on control loops doesn't constrain the possible behavior of humans much by itself, just like saying that a program is written in a high-level language doesn't constrain the possible behavior of that program much. I view PCT as describing the inherent modularity of the code, rather than what's possible to code, which helps quite a bit in figuring out how the code functions and where bugs might be hiding or how to edit it. Any model built in the controls framework will have to be very complicated to be fully functional--I feel like it's easier to understand the mechanics of a person's arm than the mechanics of their personality, but if we want to explain the arm at any meaningful level of detail we need a meaningful number of details!

And, in terms of models, I think the way to think about PCT is as a competitor for utility. I don't know many LWers who model themselves or other humans as utility maximizers, but it seems like that remains the default model for describing intelligent agents whenever we step up a level of abstraction (like when, say, we start talking about ethics or meta-ethics). As part of writing this post, I reread Yvain's sequence on the Blue-Minimizing Robot. At parts, it seems to me to present a dilemma between either modeling intelligence as utility-optimization or arbitrary code, where the former can't be implemented and the latter can't be generalized. A control system framework seems like it finds a middle ground that can be both feasibly implemented and generalized. (It pays for that, of course, by not being easily implemented or easily generalized. Roboticists are finding that walking is hard, and that's only level 5 in the hierarchy! On the input side, computer vision folks don't seem to be doing all that much better.)

Ideally, this is where I would exhibit some example that demonstrates the utility of thinking this way: an ethical problem that utilitarianism can't answer well but a control theory approach can, or a self-help or educational problem that other methods couldn't resolve and this method can. But I don't have such an example ready to go; I'm not convinced that such an example even exists; and even if one exists and I have it, it's not clear to me that it would be convincing to others. Perhaps the closest thing I have to an example is my experience training in the Alexander Technique, which I see as being easy to describe from a control theory perspective, but which is highly related to my internal experience and methodology of moving through the world, both of which are difficult to describe through a text-based medium. Further, even if it does become obvious that positive change is taking place, determining how much that positive change validates a control system-based explanation of what's going on underneath is its own difficult task!

A model fits the situation when easy problems are easy in the model and hard problems are hard in the model. A thermostat is simple, and a person is complex. The utilitarian approach says "find the thing defined as the thing to be maximized, and then maximize it," and in some sense the utilitarian model for a thermostat is 'as simple' as the utilitarian model for a person--they've swept all the hard bits into the utility function, and give little guidance on how to actually go about finding that function. The control system approach says "find the negative feedback loops, and edit them or their reference levels so they do what you want them to do," and the number of feedback loops involved in the thermostat is rightly far lower than the number of feedback loops involved in the person. If I could exhibit a simple model that solves a complex problem, then it seems to me that my model doesn't quite fit.1
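For concreteness, here is about how small the thermostat's control-system model is (all the constants are invented for illustration):

```python
# A thermostat as a single negative feedback loop, with invented constants.

def simulate(reference, temp, gain=0.3, leak=0.1, outside=5.0, steps=50):
    for _ in range(steps):
        error = reference - temp
        effort = gain * error                      # effort proportional to error
        temp += effort - leak * (temp - outside)   # heater minus heat leak
    return temp

final = simulate(reference=20.0, temp=10.0)  # settles near 16.25, not 20
```

Note that the room settles below the reference: a steady-state offset is the classic signature of pure proportional control fighting a constant disturbance (the heat leak), and adding an integral term is the standard fix. The person's model would need vastly more loops, which is exactly the point.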

Intelligent Action without Environmental Simulation

This is mostly covered by RichardKennaway's post here, but is important enough to repeat. Typically, when we think about optimization, we have some solution space (say, possible actions to take) and some objective function (over solutions, i.e. actions), and go through the solution space applying the objective function to points until we're satisfied that we have a point that's good enough. (If the relationship between the solution space and objective function is kind enough, we'll actually search for a proof that no better solutions exist than the one we picked.)
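In code, that "scan until satisfied" loop might look like this hypothetical one-dimensional example (objective and threshold invented for illustration):

```python
# One-dimensional "search until satisfied" sketch.

def objective(x):
    return -(x - 3.7) ** 2       # the best possible solution is x = 3.7

candidates = [i / 10 for i in range(100)]   # coarse grid over [0, 10)

good_enough = -0.05              # satisfaction threshold
solution = next(x for x in candidates if objective(x) >= good_enough)
# Stops at 3.5: the first good-enough point, before ever seeing 3.7.
```

The search terminates as soon as a point clears the threshold, which is the "satisfied" half; only with a kind enough structure would we instead prove 3.7 is optimal.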

A common mathematical approach is to model the world as having states, with the transition probability from state to state depending on the actions the robot takes (see Markov Decision Processes). Typically, we want to find an optimal policy, i.e. a mapping from states of the world to actions that lead to the maximum possible accrual of value.
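A standard way to find such a policy is value iteration. Here's a sketch on a made-up two-state, two-action problem (every state name, transition probability, and reward below is invented for illustration):

```python
# Value iteration on a toy MDP. All numbers are invented.

states = ["cool", "hot"]
actions = ["wait", "run"]
# transition[s][a] = [(next_state, probability), ...]
transition = {
    "cool": {"wait": [("cool", 0.9), ("hot", 0.1)],
             "run":  [("cool", 0.5), ("hot", 0.5)]},
    "hot":  {"wait": [("cool", 0.2), ("hot", 0.8)],
             "run":  [("hot", 1.0)]},
}
reward = {"cool": {"wait": 1.0, "run": 3.0},
          "hot":  {"wait": 0.0, "run": -1.0}}
gamma = 0.9  # discount factor

def q(s, a, V):
    # Expected value of taking action a in state s, then following V.
    return reward[s][a] + gamma * sum(p * V[t] for t, p in transition[s][a])

V = {s: 0.0 for s in states}
for _ in range(200):  # plenty of sweeps to converge on a toy this small
    V = {s: max(q(s, a, V) for a in actions) for s in states}

# The optimal policy: a mapping from states to actions.
policy = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
```

Even on two states this takes hundreds of sweeps; the cost of this approach grows quickly with the fidelity of the world model, which is the point of the next paragraph.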

But the computational cost of modeling reality in that level of depth may not be worth it. To give a concrete example, there's a human movement neuroscience community that studies the questions of how muscles and joints and brains work (i.e. fleshing out the first five levels of the model we talked about in the last post), and one of the basic ideas in that field is that there's a well-behaved function that maps from the position of the joints in the arm to where the tip of the finger is. Suppose you want to press a button with your finger. You now have to solve the inverse problem, where I give you a position for the tip of the finger (where the button is) and you figure out what position to put the joints in so that you touch the button. Even harder, you want to find the change in joint positions that represents the least amount of effort. This is a computationally hard problem, and one of the questions the community is debating is how the human nervous system solves this hard problem.

My favorite answer is "it doesn't solve the hard problem." (Why would it? Effort spent in the muscles is measured in calories, and so is effort spent in the nerves.) Instead of actually inverting the function and picking the best possible action out of all actions, there might be either stored approaches in memory or the brain might do some sort of gradient descent (easily implemented in control systems using the structure described in the last post), where the brain knows the difference between where the finger is and where it should be, moves each joint in a way that'll bring the finger closer to where it should be, and then corrects its approach as it gets closer. This path is not guaranteed to be globally optimal, i.e. it does not solve the hard problem, but is locally optimal in muscular effort and probably optimal in combined muscular and nervous calorie expenditure.
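Here's an illustrative version of that gradient-descent story on a two-joint planar arm (link lengths, step size, starting posture, and target are all invented, and this is my sketch rather than anyone's model of the nervous system): instead of inverting the kinematics, nudge each joint downhill on the fingertip error.

```python
import math

# Reaching by descending the fingertip error, not by solving
# the inverse kinematics. All constants invented for illustration.

L1, L2 = 1.0, 1.0  # link lengths

def fingertip(q1, q2):
    # Forward kinematics: joint angles -> fingertip position.
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y

def error(q, target):
    x, y = fingertip(*q)
    return (x - target[0]) ** 2 + (y - target[1]) ** 2

def reach(target, q=(0.3, 0.3), step=0.05, eps=1e-4, iters=2000):
    q = list(q)
    for _ in range(iters):
        for i in range(2):
            # Numerical gradient for one joint: nudge, measure, restore.
            base = error(q, target)
            q[i] += eps
            grad = (error(q, target) - base) / eps
            q[i] -= eps
            q[i] -= step * grad  # move this joint downhill on the error
    return q

q = reach(target=(1.2, 0.8))  # fingertip ends up very close to the target
```

This is locally, not globally, optimal: the arm settles into whichever elbow configuration the descent path happens to lead to, which is exactly the "it doesn't solve the hard problem" answer.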

Preference Modeling and Conflicts

I'm not familiar with enough Internal Family Systems Therapy to speak to how closely it aligns with the control systems view, but I get the impression that the two share many deep similarities.

But it seems to me that if one hopes to preserve human values, it would help to work in the value space that humans have--and we can easily imagine control systems that compare the relative position or rate of change of various factors to some references. I recall a conversation with another rationalist about the end of Mass Effect 3, where (if I recall their position correctly, and it's been years so I'm not very confident that I do) they preferred a galactic restart to a stagnant 'maximally happy' galaxy, because the former offered opportunities for growth and the latter did not, and they saw life without growth as not worth living. From a utility maximization or efficiency point of view, this seems strange--why want the present to be worse than it could be? But this is a common preference (one that shows up in the Heuristics and Biases literature): people often prefer an increasing series of payments to a decreasing series, even though by exponential discounting they should prefer the decreasing series (where you get the largest payment first) if the amounts are the same with only the order reversed.
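The discounting arithmetic is easy to check (the payment amounts and per-period discount factor are illustrative):

```python
# Same three payments, order reversed: exponential discounting favors
# getting the big payment first. Numbers invented for illustration.

def present_value(payments, d=0.95):
    # d is the per-period discount factor.
    return sum(p * d ** t for t, p in enumerate(payments))

pv_increasing = present_value([100, 200, 300])  # 560.75
pv_decreasing = present_value([300, 200, 100])  # 580.25
# pv_decreasing > pv_increasing, yet many people prefer the increasing series.
```

So the common preference for the increasing series is a real departure from the discounting model, not a rounding error.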

Reasoning About AI

I don't see all that much relevance to solving the general problem of value preservation (it seems way easier to prove results about utility functions), but, as mentioned in the conflicts section, it does seem relevant to human value preservation if it's a good description of human values. There is the obvious caveat that we might not want to preserve the fragmentary shattering of human values; a potential future person who wants the same things at the same strength as we do, but has their desires unified into a single introspectively accessible function with known tradeoffs between all values, will definitely be more efficient than current humans--potentially more human than the humans! But when I ask myself for a fictional example of immediate, unswerving confidence in one's values, the best example that comes to mind is the Randian hero (which is perhaps an argument for keeping the fragmentation around). As Roark says to Peter (emphasis mine):

If you want my advice, Peter, you've made a mistake already. By asking me. By asking anyone. Never ask people. Not about your work. Don't you know what you want? How can you stand it, not to know?

But leaving aside values, there's the question of predicting behavior. It seems to me that there are two facets--what sort of intensity of change we would expect, and how we should organize predictions of the future. It seems likely that local controllers will have, generally, local effects. I recall a conversation, many years ago, where someone suggested that an AI in charge of illuminating the streets might decide to destroy the sun in order to prevent the street from being too bright. Or, I suggested, it might invest in some umbrellas, since that's a much more proportionate response. I italicize proportionate because under a linear negative feedback controller that would be literally true--the more lumens between the sensor and the reference, the more control effort would be expended, in a one-to-one fashion. Controllers are a third class of intentional agents, different from both satisficers and maximizers, and a potentially useful one to have in one's mental toolkit.
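In code, the "proportionate" claim is almost a tautology (the gain and the lumen numbers are invented):

```python
# A linear negative feedback controller's effort is literally
# proportionate to the error. Gain and numbers invented.

def control_effort(lumens_sensed, lumens_reference, gain=2.0):
    error = lumens_sensed - lumens_reference
    return gain * error  # twice the excess brightness, twice the effort

small = control_effort(110, 100)
large = control_effort(120, 100)  # exactly twice `small`
```

Nothing in this structure rewards overshooting the disturbance, which is why "destroy the sun" is not the natural output of a local lumen controller.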

If we know a system is an intelligent optimizer, and we have a range of possible futures whose probability we're trying to estimate, we can expect that futures higher in the preference ordering of the optimizer are more likely. But if we also have an idea of what actuators the system has, we might expect futures where those actuators are used, or where the direct effect of those actuators leads to a higher preference ordering, to be more likely, and this might be a better way of reasoning about those problems. I'm not sure how far I would take this argument, though; examples abound of genetic algorithms and other metaheuristic optimization methods being clever in surprising ways, using their ability to simulate the future to find areas of the solution space that did not look especially promising to their human creators but turned out to hold proverbial gold. It seems likely that superhuman intelligence will rely heavily on numerical optimization, and even if the objective function is determined by control systems,2 as soon as optimizers are in the mix (perhaps determining what control to apply to reduce an error) it makes sense to break out the conservative assumptions about their power. And actuators that might seem simple, like sending plaintext through a cable to be read by people, are often in fact very complex.

1. gwern recently posted an Asimov essay called Forget It!, which discusses how an arithmetic textbook from 1797 managed to require over 500 pages to teach the subject. One might compare today's simple arithmetic model to their complex arithmetic model, apply my argument in this paragraph, and say "but if you've managed to explain a long subject in a short amount of time, clearly you've lost a lot of the inherent complexity of the subject!" I'd counter with Asimov's counter, that the arithmetic of today really is simpler than the arithmetic they were doing then, and that the difference is not so much that the models of today are better, but that the reality is simpler today and thus simpler models suffice for simpler problems. But perhaps this is a dodge because it depends on the definitions of "model" and "reality" that I'm using.

2. A few years ago, I saw an optimization algorithm that designed the targeting of an ion beam (I believe?) used to deliver maximum radiation to a tumor while delivering minimum radiation to the surrounding tissue. The human-readable output was a dose probability curve, basically showing the radiation distribution that the tumor received and that the surrounding tissue received. The doctor would look at the curve, decide whether or not they liked it, and play with the meta-parameters of the optimization until the optimizer spat out dosage distributions that they were happy with. I thought this was terribly inefficient- even if the doctors thought they were optimizing a complex function of the distribution, they were probably doing something simple and easy to learn like area under the curve in particular regions or a simple integration, and then that could be optimized directly. The presenter disagreed, though I suspect they might have been disagreeing on the practicality of getting doctors to accept such a system rather than building one. As the fable goes, "instant" cake mix requires that the customer break an egg because customers prefer to do at least one thing as part of making the cake.

Comments

It is 11:30 by my clock, so published on Wednesday, though perhaps not at the time I had in mind ;)

Thanks again to Ari Rabkin, Peter McCluskey, Christian Kleineidam, Carlos Serrano, Daniel Light, Harsh Pareek, and others for helpful comments on drafts.

An amusing typo I discovered while proofreading one of the new parts of this post: one sentence originally read "when presented with a choice, it is often better to make out than not make one." Words to live by.

Another typo: first paragraph, “effect used” → “effort used”.

Fixed, thanks!

FWIW, my enthusiasm over PCT has cooled considerably. Not because it's not true, just because it's gone from "OMG this explains everything" to just "how things work". It's a useful intuition pump for lots of things, not the least of which is the reason humans are primarily satisficers, and make pretty crappy maximizers. (To maximize, we generally need external positive feedback loops, like competition.)

(It's also a useful tool for understanding the difference between what's intuitive to a human and intuitive to an AI. When you tell a human, "solve this problem", they implicitly leave all their mental thermostats at "and don't change anything else". Whereas a generic planning API that's not based on a homeostatic control model implicitly considers everything up for grabs, the human has a heterarchy of controlled values that are always being kept within acceptable parameters, so we don't e.g. go around murdering people to use their body parts for computronium to solve the problem with. This behavioral difference falls naturally out of the PCT model.)

At the same time, I have seen that not everything is a negative-feedback control loop, despite the prevalence of them. Sometimes, a human is only one part of a larger set of interactions, that can create either positive or negative feedback loops, even if individual humans are mostly composed of negative-feedback control loops.

Notably, for many biological processes, nature doesn't bother to evolve negative control loops for things that didn't need them in the ancestral environment, due to resource limitations or competition, etc. If this weren't true, superstimuli couldn't exist, because we'd experience error as the stimulus increased past the intended design range. And then we wouldn't e.g. get hooked on fast food.

That being said, here's an example of something "self-help useful" about the PCT model that is not (AFAIK) predicted by any other psychological or neurological model: PCT says that a stable control system requires that higher-level controls operate on longer time scales than lower ones. More precisely, a higher-level perceptual value must always be a function of lower-level perceptions sampled over a longer time period than the one those lower-level perceptions are sampled on. (Which means, for example, that if you base your happiness on perceptions that are moment-to-moment rather than measured over longer periods, you're gonna have a bad time.)
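A quick illustration of the time-scale point (my own toy construction, not Powers's): let the higher-level perception be an average of the lower-level signal over a longer window.

```python
from collections import deque

# Toy construction: a higher-level perception as a moving average of a
# lower-level signal over a longer window. Everything here is invented.

def perceive(stream, window):
    buf = deque(maxlen=window)  # keeps only the last `window` samples
    out = []
    for sample in stream:
        buf.append(sample)
        out.append(sum(buf) / len(buf))
    return out

# A lower-level signal oscillating around a steady baseline of 5.
signal = [5 + (1 if i % 2 else -1) for i in range(100)]

fast = perceive(signal, window=1)   # moment-to-moment: swings 4, 6, 4, 6...
slow = perceive(signal, window=20)  # longer period: settles right at 5.0
```

A controller fed the fast perception would chase every swing; one fed the slow perception sees a stable world, which is the "bad time" claim in miniature.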

Another idea, stated in a lot of "traditional" self-help, is that you can't get something until you can perceive what it is. Some schools treat this as a hierarchical process, and a few even treat this as a formalism, ie., that your goal is not well-formed until you can describe it in terms of what sensory evidence you would observe when the goal is reached. And even my own "desk-cleaning trick", developed before I learned about PCT, is built on a perceptual contrast.

And speaking of contrast, the skill of "mental contrasting" is all the rage these days, and it's also quite similar to what PCT says about perceptual contrast. (Not to mention being similar to the desk-cleaning trick.)

However, there's a slight difference between what PCT would predict as optimal contrasting, and what "mental contrasting" is. I believe that PCT would emphasize contrasting not with anticipated difficulties, but rather, with whatever the current state of reality is. As it happens, Robert Fritz's books and creativity training workshops (developed, AFAICT independently of PCT) take this latter approach, and indeed the desk-cleaning trick was the result of me noticing that Fritz's approach could be applied in an instantaneous manner to something rather less creative than making art or a business. (Again, prior to PCT exposure on my part.)

I would be interested to see experiments comparing "mental contrasting" as currently taught, with "structural tension" as taught by Fritz and company. I suspect that they're not terribly different, though, because one byproduct of contrasting the goal state and current state is a sudden awareness of obstacles and/or required subgoals. So, being told to look for problems may in fact require people to implicitly perform this same comparison, and being told to do it the other way around might therefore only make a small difference.

[PCT]'s gone from "OMG this explains everything" to just "how things work".

This is high praise.

FWIW, my enthusiasm over PCT has cooled considerably. Not because it's not true, just because it's gone from "OMG this explains everything" to just "how things work".

I'm agreed with Kennaway on this.

It's a useful intuition pump for lots of things, not the least of which is the reason humans are primarily satisficers, and make pretty crappy maximizers.

Technically, I disagree, because I want 'satisficer' to keep the original intended sense of "get X to at least this particular threshold value, and then don't worry about getting it any higher." I think controls point at... something I don't have a good word for yet, but 'proportioners' that try to put in effort appropriate to the level of error.

(An aside: I was at the AAAI workshop on AI and Ethics yesterday, and someone shared the story of telling people about their simulated system which proved statements like "if a person is about to fall in the hole, and the robot can find a plan that saves them, then the person never falls into the hole," and had their previous audience respond to this with "well, why doesn't the robot try to save someone even if they know they won't succeed?". This is ridiculous in the 'maximizer' model and the 'satisficer' model, but makes sense in the 'proportioner' model--if something needs to be done, then you need to try, because the effort is more important than the effect.)

want 'satisficer' to keep the original intended sense of "get X to at least this particular threshold value, and then don't worry about getting it any higher." I think controls point at... something I don't have a good word for yet, but 'proportioners' that try to put in effort appropriate to the level of error.

And yet, that's what they do. I mean, get X to a threshold value. It's just that X is the "distance to desired value", and we're trying to reduce X rather than increase it. Where things get interesting is that the system is simultaneously doing this for a lot of different perceptions, like keeping effort expenditure proportionate to reward.

if something needs to be done, then you need to try, because the effort is more important than the effect.

I don't understand this. People put forth effort in such a situation for various reasons, such as:

  • Lack of absolute certainty the attempt will fail
  • Embarrassment at not being seen to try
  • Belief they would be bad if they don't try

etc. It's not about "effort" or "effect" or maximizing or satisficing per se. It's just acting to reduce disturbances in current and predicted perceptions. Creating a new "proportioner" concept doesn't make sense to me, as there don't seem to be any leftover things to explain. It's enough to consider that living beings are simultaneously seeking homeostasis across a wide variety of present and predicted perceptual variables. (Including very abstract ones like "self-esteem" or "social status".)

Thinking about it more, maybe I should just use "controller" to point at what I want to point at, but the issue is that is a normal English word with many more implications than I want.

Creating a new "proportioner" concept doesn't make sense to me, as there don't seem to be any leftover things to explain.

Mathematically, there definitely is. That is, consider the following descriptions of one-dimensional systems (all of which are a bit too short to be formal, but I don't feel like doing all the TeX necessary to make it formal and pretty):

  1. max x s.t. x=f(u)

  2. min u s.t. x≥x_min, x=f(u)

  3. u=-k*e, e=x-x_ref, y=f(u,x)

The first is a maximizer that tries to get x as high as possible, the second is a lazy satisficer that tries to do as little as possible while getting x above some threshold (in general, a satisficer just cares about hitting the threshold and not effort spent), the third is a simple negative feedback controller and it behaves differently from the maximizer and from the satisficer (approaching the reference asymptotically, reducing the control effort as the disturbance decreases).
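To make the difference concrete, here's a toy implementation of all three steering the same one-dimensional integrator plant (the plant, constants, and thresholds are invented for illustration):

```python
# Toy versions of systems 1-3 above, on a plant where x integrates
# the control input. All constants invented.

def maximizer(x, u_max=10.0):
    return u_max                    # 1: push x up as hard as possible

def satisficer(x, x_min=20.0):
    return max(0.0, x_min - x)      # 2: least effort that clears the threshold

def controller(x, x_ref=20.0, k=0.5):
    return -k * (x - x_ref)         # 3: effort proportional to the error

def run(policy, x=0.0, steps=60):
    for _ in range(steps):
        x += policy(x)              # plant: x simply integrates the input
    return x
```

On this plant the maximizer pushes x up without bound, the satisficer jumps to the threshold in one step and then does nothing, and the controller approaches the reference asymptotically, easing off as the error shrinks--three visibly different behaviors from three different specifications.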

My suspicion is that typically, when people talk about satisficers, they have something closer to 3 than 2 in mind. That is...

It's just acting to reduce disturbances in current and predicted perceptions.

Agreed. But that's not what a satisficer does (in the original meaning of the term).

humans are primarily satisficers, and make pretty crappy maximizers

Is there some reason they should be maximisers?

And then we wouldn't e.g. get hooked on fast food.

"What do you mean by 'we', paleface?" :)

How do you explain why some people do not get hooked on fast food? To me, what McDonalds and similar places serve does not even count as food. It is simply not my inclination to eat such things. I don't play computer games much either, to name another "superstimulus". I do not click on any link entitled "10 things you must..." This isn't the wisdom of age; the same is true of my younger adult selves (mutatis mutandis -- some of those things had not been invented in those days).

Ok, that's just me, but it's an example I'm very familiar with, and it always feels odd to see people going on about superstimuli and losing weight and the ancestral environment and the latest pop sci fads and observe that I am mysteriously absent, despite not being any sort of alien in disguise.

I'm quite happy to see PCT is a real thing. I always had trouble explaining my own mental model of behavior in traditional psychological terms and now I only need to point to PCT.

What I am missing is a treatment of the formation of the control loops. For the lower levels it is quite clear: these evolved. But what about the higher levels? I don't think the whole hierarchy is fixed. It is fixed only at the lower levels (and I hear that even there, there is some plasticity in the weights). The higher you go, the more variable the control loops become. Surely there must be some main controls meshing in desires and values, but how do these attach to the higher control loops? I mean, it's not like the top-level control is some one-dimensional reward channel controlling fixed control loops for very abstract behaviors. This is only partly addressed by the treatment of conflict, which implies multiple high-level control signals.

We can look at a particular person and behavior and try to describe the control loops involved. But that doesn't answer how these came about.

Consider habits. Apparently it is possible to establish very complex habits. Habits are basically control loops on a level above sequential actions. But the control loop comes into being without being pre-wired. It realizes some more or less successful behavior sequence that results in the satisfaction of some higher level control.

For example, how does automatically locking the door when leaving the house come into being? Sure, it results in a feeling of security, which is a higher-level control - but that is not how the behavior arose. The behavior (the control loop causing the locking of the door) initially doesn't exist as such; it started off as a chaining of individual acts. But the brain is good at finding patterns in its own behaviors too, and I see this locking of the door as an aggregate that is pattern-matched against the feeling of security after locking the door, and confirmed again upon finding the door locked on return, thus reinforcing as a whole a control loop that didn't exist before.

Thus it appears to me that the complex mesh of behaviors may well be a deep hierarchy of nested control loops, but the higher levels of the hierarchy especially are to a large degree ad-hoc instantiations of recognized patterns in one's own behaviors, acquired sooner or later. Many primary behaviors originate during child development, and many of these are very strongly related to necessary developmental steps, coming into being in mostly comparable, or at least recognizable, form for most people. This surely results partly from the way some controls are pre-wired (hard-wired curiosity for certain stimuli surely causes lots of early parallel development; I can definitely confirm this from experience with my own children).

But I often see strange behaviors, and then I wonder how these strange control loops came into being and how one might modify them for everyone's benefit. The hierarchy really is a big messy graph: lots of local control loops working hard to reduce their error signal, only to be abandoned (going silent) when their applicability pattern no longer matches.

There can be many control loops active at a time (conflict), and the steering effect of one loop can cause another loop to become active. Depending on how sensitive these loops are to circumstances, the result may not be a chain of successor loops (which could be picked up and become a larger pattern) but a random or chaotic sequence of actions.

Looking back, I see this a lot in arguments in relationships (my own included). In a pair, one incompatible habit leads to a (delayed) reaction by the other, say a kind of dissatisfaction response, which then leads to one of multiple counter-responses (multiple because the situation is aversive, and if one control loop fails, higher-level control suggests that others might work). Some may work in some circumstances. Each may be followed by multiple return responses. If enough compatibility exists, the joint system will ultimately converge back. (OK, it will almost always converge, because in the end joint exhaustion becomes the dominant (and joint) control error.)

I think one dark art lurking here is to find the patterns people's control loops match on and use them. This is related to framing. People behave differently in different contexts; PCT-wise, this means different control loops are active. Change the context and the behavior will follow.

Two examples:

  • My brother-in-law once applied geek-fu to deflect a threatening guy by saying: "cool sticker on your jacket, where did you buy it?", totally changing the frame and immediately relaxing the guy.

  • I often have trouble preventing my recalcitrant son from wreaking havoc. Sometimes I succeed in changing the frame by pointing to some new thing, retreating and playing an interesting game with his brothers, or asking him a question totally unrelated to the situation but inherently interesting to him.

Ideally, this is where I would exhibit some example that demonstrates the utility of thinking this way: an ethical problem that utilitarianism can't answer well but a control theory approach can, or a self-help or educational problem that other methods couldn't resolve and this method can.

So I'm not entirely sure whether this is actually correct, and I could be entirely off, but could the control theory approach be relevant for problems like:

  1. If you have an unbounded utility function, it won't converge
  2. If you have a bounded utility function, you may consider a universe with (say) 10^18 tortured people to be equally bad as a universe with any higher number of tortured people
  3. Conversely, if you have a bounded utility function, you may consider a universe with (say) 10^18 units of positive utility to be equally good as a universe with any higher number of good things
  4. If you do have some clear specific goal (e.g. build a single paperclip factory), then after that goal has been fulfilled, you may keep building more paperclip factories just in case there was something wrong with the first factory, or your sense data is mistaken and you haven't actually built a factory, etc.

Intuitively it seems to me that the way human goal-directed behavior works is by some mechanism bringing either desirable or undesirable things into our mental awareness, with the achievement or elimination of that thing then becoming the reference towards which feedback is applied. This kind of architecture might then help fix problems 2-3, in that if an AI becomes aware of more bad things existing, or of the potential for more good things, it would begin to move towards fixing that, independent of how many other good things already existed. Problem 4 is trickier, but might be related to there being some set of criteria governing whether or not possibilities are brought into mental awareness.

Does this make sense?

Does this make sense?

This does look like a fruitful place to look, but one of the main problems here with demonstrating superiority is that the systems can emulate each other pretty well. Claims of superiority typically take the form of "X seems more intuitive" or "I can encode X in less space using this structure" rather than "X comes to a different, better conclusion." For example:

If you have a bounded utility function, you may consider a universe with (say) 10^18 tortured people to be equally bad as a universe with any higher number of tortured people

You can have asymptotic bounds that "mostly" solve this problem, or at least they solve this problem about as well as a controller would.

For example, suppose my utility based on the number of people that are alive is the logistic function (with x0 set to, say, 1,000 or 1,000,000). Then I will always prefer a world where X1 people are alive to a world where X2 people are alive iff X1>X2, but the utility is bounded above by 1, and has nice global properties.

Basically, it smooths together the "I would like more people to be alive" desire and the "I would like humanity to continue" desire in a continuous fashion, such that a 50-50 flip that doubles the human population (and wealth and so on) on heads and eliminates them on tails looks like a terrible idea (despite being neutral if your utility function is linear in the number of humans alive). I'm not sure that the logistic has the local behavior that we would want at any particular population size, but something like it probably does.
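A small Python sketch of this argument, with an assumed midpoint x0 = 1,000 and an assumed scale of 1,000 (the comment only loosely fixes x0 and doesn't specify a scale at all):

```python
import math

def logistic_utility(people_alive, x0=1000.0, scale=1000.0):
    # Numerically safe logistic: 1 / (1 + exp(-(x - x0) / scale)).
    # Strictly increasing, but bounded above by 1.
    z = (people_alive - x0) / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

population = 1_000_000
u_now = logistic_utility(population)

# The 50-50 flip: double the population on heads, wipe it out on tails.
u_flip = 0.5 * logistic_utility(2 * population) + 0.5 * logistic_utility(0)
# u_flip comes out well below u_now, even though a utility function linear
# in population would call the flip exactly neutral.
```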

The solution a controller typically applies here is an "upper bound on control effort." That is, the error can be arbitrarily large, but at some point you simply don't have any more ability to adjust the system, and so having 1e18 more people tortured than you want is "just as bad" as having 1e6 more people tortured than you want, because both situations are bad enough to employ your maximal effort trying to reduce the number. One feature of this approach is that the bound is determined by your ability to affect the world rather than your capacity to care, though it's not clear to me whether that actually makes much of a difference, either mathematically or physically.
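A minimal sketch of that saturation effect, with a made-up gain and effort bound:

```python
def saturated_effort(error, k=0.5, u_max=10.0):
    # Proportional response u = -k * error, clipped at the actuator's limit.
    u = -k * error
    return max(-u_max, min(u_max, u))

# An error of 1e6 and an error of 1e18 both saturate the actuator, so the
# controller's response is identical even though the errors differ enormously.
small_overshoot = saturated_effort(4.0)     # within the linear region
huge_error = saturated_effort(1e6)
vast_error = saturated_effort(1e18)
```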

Thanks, that makes sense.

On the topic of comparing controllers to utility functions - how does a controller decide what kinds of probabilistic tradeoffs are worth making? For instance, if you have a utility function, it's straightforward to determine whether you prefer, say, a choice that creates X1 new lives with probability P_x1 and kills Y1 people with probability P_y1, versus a choice that creates X2 new lives with probability P_x2 and kills Y2 people with probability P_y2. How does one model that choice in a control theory framework?

How does one model that choice in a control theory framework?

I see two main challenges. First, we need to somehow encode distributions, and second, we need to look ahead. Both of those are doable, but it's worth mentioning explicitly that the bread and butter of utility maximization (considering probabilistic gambles, and looking ahead to the future) are things that need to be built into the control theory framework, and can be built in a number of different ways. (If we do have a scenario where it's easy to enumerate the choice set, or at least the rules that generate the choice set, and it's also easy to express the preference function, then utility is the right approach to take.)

The closest analogue to the utility framework is likely to treat the probability distributions over outcomes as the states; the 'error' is then a measure of how much one distribution differs from the distribution we're shooting for. Possible actions are probably fed into a simulator circuit that spits out the expected distribution. We could basically express this problem as "minimize opportunity cost while pursuing many options": if we ever simulate a plan and think it's better than our current best plan, we replace the current best plan, but if we simulate a plan and it's not better than our current best plan, we look for a new plan to simulate. (You'd also likely bake in some stopping criterion.)

So it would probably look at choice 1, encode its discrete pmf as the reference state, then look at choice 2, and decide whether the error is positive (responding by switching to choice 2) or negative (responding by acting on choice 1). But in order to compare pmfs and get a sense of positive or negative, I need some mathematical function, which would be the utility function in the utility framework.
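A hypothetical sketch of that plan-switching loop, where each "choice" is a discrete pmf over net lives, given as (probability, outcome) pairs. The comparison function that turns two pmfs into a signed error is exactly where a utility function sneaks back in; expected net lives is one arbitrary pick:

```python
def expected_value(pmf):
    # pmf is a list of (probability, net_lives) pairs.
    return sum(p * outcome for p, outcome in pmf)

def error(reference_pmf, candidate_pmf):
    # Signed error: positive means the candidate beats the current reference.
    return expected_value(candidate_pmf) - expected_value(reference_pmf)

def choose(plans):
    best = plans[0]              # start with the first plan encountered
    for candidate in plans[1:]:
        if error(best, candidate) > 0:
            best = candidate     # switch reference when the error is positive
    return best                  # ties keep whichever plan came first

# X new lives with probability P, -Y lives with probability 1-P:
choice1 = [(0.9, 100), (0.1, -50)]
choice2 = [(0.5, 300), (0.5, -20)]
best = choose([choice1, choice2])
```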

We might also notice that this makes it easy for endowment-effect problems to creep in: if none of the options is obviously better than any of the others, it defaults to whichever one came first. On the flip side, it makes it easy to start working with the first mediocre plan we come across, and then abandon that plan if a better one shows up. That is, this is more suited to operating in continuous time than a "plan, then act" utility maximization framework.

Also, controllers are more robust than utility agents. Utility agents tend to go haywire upon discovering that some term in their utility function isn't actually quite well-defined. Keep in mind that it's impossible to predict future discoveries ahead of time, or what their implications for the well-definedness of those terms might be.