Funded by the Advanced Research + Invention Agency (ARIA) through project code MSAI-SE01-P005
This post was written during the Dovetail Research Fellowship. Thanks to Alex and Alfred for help during the writing process and to all the fellows for the fruitful discussions.
I plan to share an extended and more formal version of this post as an arXiv preprint; it will be linked here when it's finished.
Overview
The Abstract Internal Model Principle, introduced by Wonham in Towards Abstract IMP as a generalization of a tool from control theory, has received some attention from various fields.
@Alfred and @Jose discussed it in the context of AI Safety/Agency in their posts. The last of those posts, together with Imran Thobani's paper A triviality worry for the internal model principle, raises issues with applying IMP to agents. The paper General agents contain world models was an inspiration for this work; the end goal is to obtain similar results for a wide class of systems.
In this post I explain the Abstract IMP and its issues, and then present my alternative version of IMP that deals with some of the issues.
I will be using basic notions from dynamical systems theory. Everything needed will be defined in the post, but for a more complete treatment one might consult Chapter 1.1 of Introduction to Dynamical Systems by Brin and Stuck (or any other textbook).
Explanation
Dynamical systems intro
A discrete-time dynamical system \((X, f)\) consists of a set \(X\) and a function \(f: X \to X\). Typically this function is applied many times, represented by the iterates \(f^n = f \circ \dots \circ f\), with \(n\) representing time moving in discrete steps.
If \((X, f)\) and \((Y, g)\) are dynamical systems, a morphism is a map \(h: X \to Y\) satisfying \(h \circ f = g \circ h\). The map \(h\) is called a semiconjugacy if it is surjective; in this case we say that \((Y, g)\) is a factor of \((X, f)\) and \((X, f)\) is an extension of \((Y, g)\), and \(h\) is also called a factor map. If \(h\) is bijective then it is an isomorphism (conjugacy) of dynamical systems, meaning that those systems are essentially the same.
For \(x \in X\) we denote its orbit by \(\mathcal{O}(x) = (x, f(x), f^2(x), \ldots)\). For a function (observation) \(\gamma: X \to C\), we write the observed orbit as \(\gamma(\mathcal{O}(x)) = (\gamma(x), \gamma(f(x)), \gamma(f^2(x)), \ldots)\).
A set \(A \subseteq X\) is called invariant in the dynamical system \((X, f)\) if \(f(A) \subseteq A\), meaning that points from this set won't ever leave it. The dynamical system can then be restricted to \((A, f|_A)\).
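For finite state sets these definitions can be checked mechanically. A minimal Python sketch (the concrete sets and maps below are my own illustration, not from the post):

```python
# A finite discrete-time dynamical system: a state set and a map, as dict/set.
X = {0, 1, 2, 3}
f = {0: 1, 1: 2, 2: 3, 3: 0}          # f cycles through all four states

def orbit(x, f, steps):
    """First `steps` points of the orbit x, f(x), f(f(x)), ..."""
    out = [x]
    for _ in range(steps - 1):
        out.append(f[out[-1]])
    return out

# A morphism h: (X, f) -> (Y, g) must satisfy h(f(x)) = g(h(x)) for all x.
Y = {0, 1}
g = {0: 1, 1: 0}
h = {0: 0, 1: 1, 2: 0, 3: 1}          # "x mod 2": collapses the 4-cycle to a 2-cycle

is_morphism = all(h[f[x]] == g[h[x]] for x in X)   # h is a (surjective) semiconjugacy

# A set A is invariant if f(A) is contained in A.
def is_invariant(A, f):
    return all(f[a] in A for a in A)
```

Here \(h\) is surjective but not injective, so \((Y, g)\) is a factor of \((X, f)\) rather than a conjugate copy.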
Examples of dynamical systems
Finite state systems
The simplest examples have \(X\) finite. For instance, let \(X = \{1, 2, 3\}\) with \(f(1) = 2\), \(f(2) = 1\) and \(f(3) = 3\). The orbits are \((1, 2, 1, 2, \ldots)\) and \((3, 3, \ldots)\), which are periodic.
Linear maps
Consider \(X = \mathbb{R}^n\) and \(f(x) = Ax\) for a matrix \(A\). The orbit of any \(x\) is \((x, Ax, A^2x, \ldots)\). The long-term behaviour depends on the eigenvalues of \(A\): trajectories converge to zero if all eigenvalues have modulus less than 1, diverge if any have modulus greater than 1, and exhibit mixed behaviour otherwise.
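The eigenvalue claim is easy to test numerically; a small sketch with a hand-picked contracting matrix (my example, not from the post):

```python
# A 2x2 linear map f(x) = Ax with both eigenvalues of modulus < 1:
# every orbit should contract towards the origin.
A = [[0.5, 0.1],
     [0.0, 0.5]]                      # upper triangular: eigenvalues 0.5, 0.5

def apply(A, x):
    """One step of the linear dynamics x -> Ax."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

x = [1.0, 1.0]
for _ in range(50):
    x = apply(A, x)                   # the orbit point A^50 x

contracted = abs(x[0]) + abs(x[1]) < 1e-6
```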
Rotations on the circle
Let \(X = S^1\) (the unit circle) and let \(f(x) = x + \alpha \pmod{1}\) for some \(\alpha\). If \(\alpha\) is rational, all orbits are periodic. If \(\alpha\) is irrational, orbits are dense in the circle and equidistributed.
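A quick numerical illustration of both regimes, representing the circle as \([0, 1)\) (the concrete angles are my choice):

```python
import math

def rotate(x, alpha):
    """One step of the circle rotation, with the circle represented as [0, 1)."""
    return (x + alpha) % 1.0

# Rational angle: the orbit is periodic.
x = 0.0
for _ in range(4):
    x = rotate(x, 0.25)               # alpha = 1/4 gives period 4
periodic = math.isclose(x, 0.0, abs_tol=1e-12)

# Irrational angle: orbit points spread evenly over the circle.
alpha = math.sqrt(2) - 1
pts = []
y = 0.0
for _ in range(10_000):
    pts.append(y)
    y = rotate(y, alpha)
frac_in_first_tenth = sum(p < 0.1 for p in pts) / len(pts)   # near 0.1
```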
Symbolic dynamics
Given a finite alphabet \(\mathcal{A}\), the full shift is the dynamical system \((\mathcal{A}^{\mathbb{N}}, \sigma)\) on all infinite sequences of alphabet symbols, with \(\sigma\) shifting each sequence left by one position. Subshifts are closed, shift-invariant subsets of \(\mathcal{A}^{\mathbb{N}}\), which arise naturally when imposing constraints on allowed symbol sequences. They are examples of restrictions of a dynamical system.
Symbolic dynamics occurs naturally if instead of considering the dynamical system \((X, f)\) one considers the system of observed orbits \(\gamma(\mathcal{O}(x))\), on which \(f\) acts as the shift.
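This correspondence can be demonstrated directly: observing an orbit through \(\gamma\) yields a symbol sequence, and advancing the system one step shifts that sequence left. A sketch on an illustrative three-state system of my own:

```python
# Observing a dynamical system through gamma turns orbits into symbol
# sequences, and turns f into the left shift on those sequences.
f = {0: 1, 1: 2, 2: 0}
gamma = {0: "a", 1: "b", 2: "a"}      # two-letter observation

def observed_orbit(x, f, gamma, n):
    """First n symbols of the observed orbit gamma(x), gamma(f(x)), ..."""
    out = []
    for _ in range(n):
        out.append(gamma[x])
        x = f[x]
    return "".join(out)

s = observed_orbit(0, f, gamma, 6)    # "abaaba"
t = observed_orbit(f[0], f, gamma, 5) # observing from f(x) ...
shift_commutes = s[1:] == t           # ... equals shifting s left by one
```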
Abstract IMP
Here is a very high-level summary of what IMP is trying to say:
If a controller can keep a system in good states, then it must contain a model of the system.
A more specific phrasing of what it actually says:
If a controller has some structural properties and can keep the system in good states, then one can reconstruct the dynamics of the good part of the system on the controller states.
Now a formal statement. Assume that there exist:
A dynamical system \((X, f)\) representing joint environment-controller states,
A mapping \(\gamma: X \to C\) from each joint state to the corresponding controller state,
A set \(K\) of desirable environment-controller states which is a subset of \(X\); so we have \(K \subseteq X\),
An invariant subset \(A \subseteq K\) of these desirable states where the controller will keep the system, i.e. \(f(A) \subseteq A\).
Additionally, assume that:
The set of desired joint environment-controller states involves the full set of controller states \(C\). This can be expressed by writing \(\gamma(K) = C\).
Feedback Structure on \(K\): for \(x, y \in K\), \(\gamma(x) = \gamma(y)\) implies \(\gamma(f(x)) = \gamma(f(y))\).
\(f\) is detectable relative to \(A\), meaning \(x \mapsto \gamma(\mathcal{O}(x))\) (the image of the orbit via \(\gamma\)) is injective when restricted to \(A\).
Then:
\(\gamma\) restricted to \(A\) (denoted \(\gamma|_A\)) is injective.
There exists a unique map \(\widehat{f}: C \to C\) such that:
(a) \(\widehat{f} \circ \gamma = \gamma \circ f\) on \(K\), i.e. \(\widehat{f}(\gamma(x)) = \gamma(f(x))\) for all \(x \in K\),
(b) restricted to \(A\), this can be rewritten as \(\widehat{f} \circ \gamma|_A = \gamma|_A \circ f|_A\).
I split 2a and 2b despite them being almost the same to highlight the fact that 1 and 2b show an isomorphism (conjugacy) of the dynamical systems \((A, f|_A)\) and \((C, \widehat{f})\), while \((K, f|_K)\) fails to be a dynamical system because \(K\) is not invariant (one of the examples will expand on it).
We can visualize the isomorphism between \((A, f|_A)\) and \((C, \widehat{f})\) on a commutative diagram:
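To make the statement concrete, here is a toy finite system on which the assumptions can be verified and \(\widehat{f}\) reconstructed; all concrete values are my own illustration, not from the post:

```python
# A toy check of the Abstract IMP conditions on a finite system.
f = {0: 1, 1: 0, 2: 0}                # joint environment-controller dynamics
K = {0, 1}                            # desirable states
A = {0, 1}                            # invariant subset (here A = K)
gamma = {0: "p", 1: "q", 2: "p"}      # observation of the controller state

C = {gamma[x] for x in K}             # gamma(K) = C by construction

# Feedback structure on K: equal observations imply equal next observations.
feedback = all(gamma[f[x]] == gamma[f[y]]
               for x in K for y in K if gamma[x] == gamma[y])

# Conclusion 1: gamma restricted to A is injective.
injective_on_A = len({gamma[a] for a in A}) == len(A)

# Conclusion 2: f_hat is well defined on C via any representative in K,
# and f_hat o gamma = gamma o f holds on K.
f_hat = {gamma[x]: gamma[f[x]] for x in K}
commutes_on_K = all(f_hat[gamma[x]] == gamma[f[x]] for x in K)
```

Note that state 2 shares its observation with state 0 but lies outside \(K\), so it does not disturb the reconstruction.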
Examples
Triviality
Take any dynamical system \((X, f)\) with \(K = A = X\). Then any injective map \(\gamma\) from \(X\), for example the identity or a map that just renames each state, would satisfy the assumptions and allow for reconstruction of the dynamics as \(\widehat{f} = \gamma \circ f \circ \gamma^{-1}\). Since we used \(f\) to construct \(\widehat{f}\), this would mean that a map that renames the states of the system "contains a model" of the system.
Issues with \(K \setminus A\)
The second example shows that Feedback on \(A\) does not imply Feedback on \(K\); this is what prevents IMP from giving meaningful results for the whole \(K\).
Let \(X\) be the pictured set of states, labelled by letters. The values of \(\gamma\) are shown next to the letters.
Some points are in \(K \setminus A\) but fail to satisfy the Feedback Structure condition: they share a value of \(\gamma\) while their images do not. So even if we can say what the values of \(\widehat{f}\) are outside of \(\gamma(A)\), it is not usable information. The invariance assumption on \(A\) guarantees that leaving \(K \setminus A\) won't cause too much of an issue, but it feels artificial in the abstract context. We get an isomorphism between singletons, but if you assume that a system has a stable point then there is no ambiguity in its dynamics anyway.
Simplified Abstract IMP
As mentioned above, the parts of Abstract IMP that refer to the set \(K\) raise some concerns. The interesting part is finding isomorphic systems, and that happens only for \(A\). In part III of Wonham's Towards Abstract IMP, which applies it to linear systems (the original setting of IMP), both \(K\) and \(A\) have natural definitions that give meaning to their desirability, and both the Feedback and Detectability conditions are common in control theory. We can restate Abstract IMP with \(K = A\), using two observations:
Feedback on \(K\) implies Feedback on \(A\),
Feedback + Detectability + Invariance on a set imply injectivity of \(\gamma\) on this set.
The first is trivial; the second is an exercise (it is discussed in Jose's post and is a part of the Abstract IMP proof).
Assume there exist:
A dynamical system \((X, f)\) representing joint environment-controller states,
A mapping \(\gamma: X \to C\) from each joint state to the corresponding controller state,
A subset \(A \subseteq X\) of "desirable" states where the controller will keep the system, i.e. \(f(A) \subseteq A\), such that \(\gamma\) restricted to \(A\) (denoted \(\gamma|_A\)) is injective.
Then:
There exists a unique map \(\widehat{f}: \gamma(A) \to \gamma(A)\) such that \(\widehat{f} \circ \gamma|_A = \gamma|_A \circ f|_A\). This map is uniquely determined by \(\widehat{f}(c) = \gamma(f(\gamma|_A^{-1}(c)))\).
When we assume that \(\gamma|_A\) is injective instead of stating it indirectly with Feedback and Detectability, the theorem looks far less impressive. It is equivalent to saying that if there is an injective function from a dynamical system, then one can reproduce the dynamics on its image.
Perhaps instead of thinking about IMP as a theorem, one could consider it a definition: \((\gamma(A), \widehat{f})\) would then be the internal model that the controller has of the system \((A, f|_A)\). That definition still has the same issues, but it is easier to see them.
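Read as a definition, the construction is a one-liner: transport \(f\) through the injective observation. A sketch on a finite example of my choosing:

```python
# Simplified IMP in code: an injective observation on an invariant set A
# lets us transport the dynamics onto the image gamma(A).
f = {0: 1, 1: 2, 2: 0, 3: 3}
A = {0, 1, 2}                          # invariant: f(A) is contained in A
gamma = {0: "a", 1: "b", 2: "c", 3: "a"}   # injective on A (not on all of X)

gamma_A_inv = {gamma[a]: a for a in A} # well defined since gamma|_A is injective
f_hat = {c: gamma[f[gamma_A_inv[c]]] for c in gamma_A_inv}

# The "internal model" (gamma(A), f_hat) is conjugate to (A, f|_A):
conjugate = all(f_hat[gamma[a]] == gamma[f[a]] for a in A)
```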
Idea of improvement
I will change the assumptions to explicitly make the controller a part of the dynamics, remove any other assumptions on controller properties, and change the goal to instead reconstruct the dynamics of the whole system from the dynamics of the controller. This is similar to the approach from General agents contain world models, but with two differences:
Different setting: abstract deterministic dynamical systems instead of Controlled Markov processes.
Effectiveness of controller: there is no explicit assumption that the controller achieves anything. It can be taken into account, and I will discuss the specifics after the formal statement.
Modified IMP
Abstract summary:
Imagine we are looking at the internals of a controller: we can take a state of the system and ask the controller how it would act in this state. From this we want to deduce something about the dynamics of the system. The information about the dynamics that we can get from the controller is, intuitively, the amount of what the controller "knows" about the dynamics.
Assume there exist:
A dynamical system \((X, f)\) representing joint environment-controller states; \(f\) is "unknown" and we are trying to deduce its properties,
A set of controller states \(C\),
A mapping \(\gamma: X \to C\) from each joint state to the corresponding controller state, representing the current controller state,
A mapping \(\pi: X \to C\) from each joint state to a controller state, representing the controller plan/policy/algorithm.
The controller action is a part of the dynamics, i.e. \(\gamma \circ f = \pi\), which means that a controller plan becomes the controller state after one step of evolution of the system.
Then:
The information that can be deduced from the controller, i.e. from \(\gamma\) and \(\pi\), is captured by the relation \(\{(\gamma(x), \pi(x)) : x \in X\} \subseteq C \times C\).
If this relation is a well-defined function \(C \to C\), then the controller allows for exact reconstruction of the original dynamics on \(C\); put differently, the controller is an exact model of the original dynamics.
There is nothing to prove here, since all that is done is restating the last assumption \(\gamma \circ f = \pi\). Therefore, similarly to Simplified IMP, one can think of this as a definition: for a dynamical system and controller satisfying the assumptions above, the Internal Model of the controller is the relation \(\{(\gamma(x), \pi(x)) : x \in X\}\).
This can be interpreted as the controller "modeling" itself as part of the dynamics, which is still a trivially sounding statement.
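The internal-model relation is easy to compute on a toy system; the example below (my own) is chosen so that the relation fails to be a function, illustrating the ambiguity that restriction to invariant sets can later remove:

```python
# The internal model is the relation between the current controller state
# gamma(x) and the planned one pi(x); gamma o f = pi ties the plan to f.
f = {0: 1, 1: 2, 2: 0}
gamma = {0: "a", 1: "b", 2: "b"}       # gamma is not injective
pi = {x: gamma[f[x]] for x in f}       # satisfies gamma o f = pi by construction

internal_model = {(gamma[x], pi[x]) for x in f}

# The relation fails to be a function: state "b" has two possible plans,
# so the controller alone does not pin down the dynamics on C.
is_function = len({c for c, _ in internal_model}) == len(internal_model)
```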
Effectiveness of controller
To get interesting results one needs a controller that actually achieves something. In Abstract IMP this was keeping the system in the set \(A\), a subset of desirable states. In General agents it was being close to optimal in satisfying goals stated in LTL (for example: get from state \(a\) to state \(b\), stay there for the 3 next steps and then move back to \(a\)). Verifying these conditions requires knowledge of the system dynamics.
If we assume that there is some invariant set \(A \subseteq X\), then we can just repeat this approach for the restriction of the system to \(A\). All the assumptions still hold after restriction, and the relation might become a well-defined function when restricted to smaller subsets (or at least be less ambiguous).
Examples
Let's analyse some simple examples to get a better intuition.
Clock example
Take a system consisting of two clocks, i.e. \(X = \mathbb{Z}_n \times \mathbb{Z}_n\) with the action \(f(x, y) = (x + 1, y + 1)\).
This system has two disjoint invariant subsystems:
clocks are aligned: \(A_{=} = \{(x, x) : x \in \mathbb{Z}_n\}\),
clocks are not aligned: \(A_{\neq} = \{(x, y) : x \neq y\}\).
We can try to consider the first clock a controller of the whole system, using both simplified IMP and modified IMP. Both will mostly tell us the same thing, but will get there in a different way. \(\gamma\) will be the projection to the first coordinate (the first clock): \(\gamma(x, y) = x\).
Simplified IMP:
The assumptions are satisfied on either of the invariant subsets; let's say the clocks are aligned. Using knowledge of \(f|_{A_=}\) we define \(\widehat{f}(x) = x + 1\), which is exactly the dynamics of the first clock.
Modified IMP
We assume \(\gamma(x, y) = x\) and \(\pi(x, y) = x + 1\), which is the same as it would be for either \(A_{=}\) or \(A_{\neq}\), but we do it before choosing. Knowing the behaviour of the first clock restricts the possible dynamics of the whole system to choosing between the pairs of arrows with the same letters in the picture.
With that knowledge, the additional information about the set \(A_{=}\) (or \(A_{\neq}\)) being invariant (clocks being aligned or not) is enough to determine the dynamics on that set.
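The clock example can be checked in a few lines; the clock size \(n\) below is my arbitrary choice:

```python
# Two clocks of size n, advancing together.
n = 12
X = [(x, y) for x in range(n) for y in range(n)]
f = {(x, y): ((x + 1) % n, (y + 1) % n) for (x, y) in X}

gamma = {s: s[0] for s in X}           # controller = the first clock
pi = {s: gamma[f[s]] for s in X}       # plan = next first-clock reading

# The internal model is the first clock ticking forward by one.
internal_model = {(gamma[s], pi[s]) for s in X}

# Both the aligned and the non-aligned set are invariant, so the modified
# IMP can be re-run on either restriction.
aligned = {(x, x) for x in range(n)}
not_aligned = {s for s in X if s[0] != s[1]}
both_invariant = (all(f[s] in aligned for s in aligned)
                  and all(f[s] in not_aligned for s in not_aligned))
```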
Perfect controller
Let's try a bit more complicated setting that might give some intuition for dealing with the goals of a controller. There will be two controllers in the same setting that differ in effectiveness.
Set:
\(X = \mathbb{Z} \times \{-1, 0, 1\}\), where the first coordinate is the position (\(p\)) and the second is the controller state (\(c\)).
Dynamics are defined as:
\(f(p, c) = (p + c,\ G(p + c, c))\),
where \(G\) is some algorithm of the controller that uses the current state and the previous controller state to determine the current controller state. (There will be two controllers for this setting.)
This means that in each step the controller can change the plant position by at most 1, and use the current state of itself and the plant to plan its next move. Without the controller (meaning the controller state is always \(0\)) nothing moves. The controller state function is \(\gamma(p, c) = c\).
Restating the assumptions for this setting:
the set \(X = \mathbb{Z} \times \{-1, 0, 1\}\),
the dynamics \(f\) of the system, with \(G\) unknown; we will try to deduce it from the controller,
controller states \(C = \{-1, 0, 1\}\),
\(\gamma(p, c) = c\), representing the current controller state,
\(\pi(p, c) = G(p + c, c)\), representing the controller plan/policy/algorithm.
The controller is a part of the dynamics, i.e. \(\gamma \circ f = \pi\); this is always satisfied if \(f\) is as above.
There will be two examples of a controller for this system; they were designed to get the system close to the state \((0, 0)\). I mention this to hint at possible future directions of this approach, since it should be visible in the examples even if it is not worked out yet.
It is easy to design the perfect controller for getting to \((0, 0)\) by taking
\(G(p, c) = -\operatorname{sign}(p)\) (minus the sign of the next position), that is, the controller always plans to move one step towards \(0\).
The following graph shows the behaviour of the system around \(0\). Arrows represent \(f\). Notice that columns contain points with the same \(p\) and rows contain points with the same \(c\).
The system is split into parts by the value of the plan \(\pi\), marked by colours: for red points the controller plans \(-1\), for blue points \(0\), and for green points \(1\). Now consider the case where we additionally know that the blue set is invariant. The picture of the actual dynamics restricted to this set looks like this:
We can now fully deduce the dynamics of \(f\) on the invariant subset just from the controller behaviour: the image of every blue point must have controller state equal to the plan \(0\) and, by invariance, must again be blue; the only such point is \((0, 0)\), so all arrows must go to \((0, 0)\).
For a slightly more interesting example, let's assume that the invariant set is as in the picture.
Then we can deduce all arrows leaving red points, but arrows leaving blue points could go to either of two targets, shown by bold arrows; there are four possibilities for \(f\) that are compatible with this colouring. The following picture has different arrows (and therefore very different behaviour) but would still result in the same colouring. The set considered in the picture above is still invariant here; only the bold arrows were chosen differently.
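A sketch of this setting with a perfect controller that always plans to move one step towards \(0\); the precise form of \(G\) and the ranges below are my reading of the construction, not a verbatim copy of it:

```python
# Position p and controller state c in {-1, 0, 1}; one step moves p by c,
# then the controller plans a new state.
def sign(n):
    return (n > 0) - (n < 0)

def G(p, c):
    return -sign(p)                    # plan: minus the sign of the position

def f(p, c):
    return (p + c, G(p + c, c))        # dynamics: move, then re-plan

def gamma(p, c):
    return c                           # current controller state

def pi(p, c):
    return G(p + c, c)                 # the plan; gamma(f(x)) == pi(x)

# Points where the controller plans 0 are exactly those with p + c == 0,
# and every one of them steps straight to (0, 0).
plan_zero = [(p, c) for p in range(-5, 6) for c in (-1, 0, 1) if pi(p, c) == 0]
all_to_origin = all(f(p, c) == (0, 0) for (p, c) in plan_zero)

# From any nearby start the system reaches (0, 0) and stays there.
state = (5, 0)
for _ in range(10):
    state = f(*state)
```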
Imperfect controller
For a more interesting example, let's modify the controller a bit. It will still be decent at achieving the goal of staying near \(0\), but it won't be perfect. Intuitively, this controller avoids changing its state unless it's necessary.
Again the picture shows the dynamics around \(0\), with a colouring that changed slightly.
Notice that this controller cannot achieve \((0, 0)\) if it doesn't start there. It does, however, do a decent job of getting close to \(0\): every point except \((0, 0)\) is attracted to a 4-state loop around \(0\).
Once again, while the colouring gives some information about the possible dynamics, more can be deduced if we get information about invariant subsets. Knowing that the whole pictured set is invariant wouldn't allow for any more deductions, while if a singleton is invariant we already know everything about it.
Note, however, that if we knew that a set \(L\) is invariant, then a priori there are \(|L|^{|L|}\) possible dynamical systems on \(L\). Taking into account the information from the controller, we get a diagram of possibilities:
All possible dynamics can be obtained by choosing one of the two arrows coming out of each node, so \(2^{|L|}\) in total. If we were to additionally know that \(f\) is a permutation on \(L\), then there would be only 4 possible choices of compatible dynamics out of that total.
See below for a few examples of such choices.
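The counting argument can be sketched generically: give each node of an invariant set two candidate successors (as the controller information does) and enumerate. The candidate sets below are my own illustration, not the post's diagram:

```python
from itertools import product

# Enumerate dynamics on an invariant set L compatible with controller
# information that leaves two candidate successors per node.
L = ["u", "v", "w"]
candidates = {"u": ["v", "w"], "v": ["u", "w"], "w": ["u", "v"]}

compatible = [
    dict(zip(L, choice))
    for choice in product(*(candidates[node] for node in L))
]
n_compatible = len(compatible)         # 2 ** |L| = 8, out of |L| ** |L| = 27 total

# Extra knowledge, e.g. that the dynamics is a permutation of L, cuts this down.
perms = [g for g in compatible if len(set(g.values())) == len(L)]
```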
Effectiveness
Both the perfect and the imperfect controller are pretty good at directing all the points of the system to near \(0\). The perfect controller is uniquely defined by getting all points to the state \((0, 0)\) as fast as possible. Intuitively this gives even more information about the system, for example: "a positive controller state means that \(p\) will increase and a negative one means that it will decrease". Formalizing such intuitions to work in a general setting, and not just for specific toy examples, is a crucial continuation of this work.
Conclusion
Improvements
Alfred listed Seven ways to Improve the Internal Model Principle; all except 5 and 6 are at least partially addressed here:
Introduce a causal link between World and World Model: the World Model is now deduced from the behaviour of the controller in the World.
Make it harder to Satisfy the Conditions: the controller now needs to be a part of the system. The conditions are still easy to satisfy, but I hope that the triviality issues are at least clearer with this approach.
Show that the controller must model the whole world, not just the 'good' states: the controller contains some information about the whole world; restricting to 'good' states potentially gives more information.
Introduce a notion of Optimization
Introduce Randomness
Explicitly include Controller 'Actions': there are no explicit Controller Actions, but the controller is embedded in the system, and if it doesn't affect the rest of the world (it always has the same state), then it doesn't convey new information.
Continuation
This is still very preliminary work; directions for continuation include:
Optimization/Effectiveness: it was somewhat present in the examples I gave, and it was a main motivation for what was presented here. An ideal theorem would be something similar to General agents contain world models, but for a wider class of systems.
Different categories of dynamical systems:
Random, not only Controlled Markov processes
Continuous time
Additional structure of the system: this would allow for more specific results, especially with Optimization/defining goals in mind:
Metric or Topology (appeared informally with "being near \(0\)")
Measure
Linearity
etc.
Imperfect Observations: currently both the points in \(X\) and the controller states are observed perfectly. This can be generalized to seeing the space and the controller states with only limited precision or with some random noise. A trivial example: when only the controller states of points in \(X\) are known, we can get a factor of \((X, f)\) corresponding to the controller states.