Thanks for your comment. I am glad the post helped you!
Good questions. The short answer is that you are correct and I was sloppy in that section.
Could the evolution of the joint states form a cycle?
Yes! In fact, if S and W are finite sets then the evolution must eventually form a cycle. (If a finite set S has cardinality n, you can only apply a function at most n times before you return to a state you have visited before). I meant this to be implicit in the '... etc.' part but I didn't make it clear. I have added the following sentence to the post which hopefully clarifies things:
If the sets S and W are finite with cardinality n then the evolution would eventually cycle around, so we would have to specify that the evolution will eventually come full circle ie. s_{n+1} = s_1 and w_{n+1} = w_1.
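The 'must eventually cycle' claim is easy to check numerically. A minimal sketch (the transition map below is invented purely for illustration, not taken from the post):

```python
# Sketch: iterating any function on a finite joint state space must
# eventually revisit a state, after which the evolution cycles forever.
# The transition rule is an arbitrary map on a 5x5 joint state space.

def evolve(joint):
    s, w = joint
    return ((s + w) % 5, (w + 1) % 5)

seen = {}          # joint state -> step at which it was first visited
joint = (0, 0)
step = 0
while joint not in seen:
    seen[joint] = step
    joint = evolve(joint)
    step += 1

cycle_start = seen[joint]
print(f"revisited {joint}: cycle of length {step - cycle_start} "
      f"starting at step {cycle_start}")
```

With at most 25 joint states, the loop is guaranteed to terminate within 25 steps, which is the pigeonhole argument in code.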
Could multiple joint states evolve to the same joint state?
This is a good question involving a subtlety that I skipped over. The answer is 'yes, sometimes'. But when it does happen it's a little weird and worth thinking about. There are a few ways in which multiple joint states could evolve to the same joint state.
1) Two different environment states with the same controller state evolving to the same environment state, eg. (s_1, w_1) → (s_2, w_2) and (s_3, w_1) → (s_2, w_2)
In this case, the Detectability condition is violated, since the controller will do the same thing regardless of whether the environment is in s_1 or s_3. The Detectability condition would tell us that s_1 and s_3 are (from the point of view of the controller) identical, so we should coarse grain them so that they are both labelled the same. This means that we wouldn't expect this kind of joint evolution.
2) Two different controller states with the same environment state evolving to the same joint state, eg. (s_1, w_1) → (s_2, w_2) and (s_1, w_3) → (s_2, w_2)
In this case, Detectability is satisfied. As far as I can tell, this kind of evolution does not violate any of the conditions for the IMP so it is valid. However, notice what this would imply. There are two controller states (w_1 and w_3) which both do the same thing to the system (cause it to evolve to s_2). After either of these states, the controller then evolves to w_2 and from then on behaves identically forever. It seems to me that for our purposes w_1 and w_3 are 'the same' controller state so I would be inclined to coarse grain them and label them as the same, removing this kind of evolution. However, since there are no assumptions which explicitly require this kind of coarse graining over the controller states, this kind of evolution is technically allowed within the IMP.
3) Two joint states with different environment and controller states evolve to the same joint state, eg. (s_1, w_1) → (s_2, w_2) and (s_3, w_3) → (s_2, w_2).
Again, I think that this is allowed within the IMP. But notice that after the evolution both trajectories will behave the same. This means that if the environment and controller state sets are finite then at most one of these joint states will be involved in any kind of repeating cycle. The other will be 'transient' ie. it will occur once and never again.
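To illustrate case 3, here is a toy sketch (the transition table is my own invention, not from the post) in which two joint states merge into the same successor and one of them turns out to be transient:

```python
# Case 3 sketch: (s1, w1) and (s3, w3) both evolve to (s2, w2).
# The transition table is invented purely for illustration.
T = {
    ("s1", "w1"): ("s2", "w2"),
    ("s3", "w3"): ("s2", "w2"),   # merges with the trajectory above
    ("s2", "w2"): ("s1", "w1"),   # closes a cycle through (s1, w1)
}

def recurrent_states(start, settle_steps=10):
    """Run long enough for the trajectory to settle, then collect
    the states that are still being visited (the cycle)."""
    state = start
    for _ in range(settle_steps):
        state = T[state]
    cycle = set()
    while state not in cycle:
        cycle.add(state)
        state = T[state]
    return cycle

cycle = recurrent_states(("s3", "w3"))
print(sorted(cycle))             # only (s1, w1) and (s2, w2) recur
print(("s3", "w3") in cycle)     # (s3, w3) is transient: visited once, never again
```

At most one of the two merging joint states can sit on the repeating cycle; the other occurs once and is never revisited, matching the 'transient' observation above.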
I think that reading the whole soliloquy makes my reading clearer and your reading less plausible. I can maybe see that if you:
Then maybe you would come to the conclusion that Juliet has an objection to specifically Romeo's first name and not the fact that his name more generally links him to his family. But if you don't ignore those things, it seems clear to me that Juliet is lamenting the fact that the man she loves has a name ('Romeo Montague') which links him to a family who she is not allowed to love.
I strongly agree with the general sentiment of ‘don’t be afraid to say something you think is true, even if you are worried it might seem stupid’. Having said that:
I don't agree with your analysis of the line. She’s not upset that he’s named Romeo. She is asking “Why does Romeo (the man I have fallen in love with) have to be the same person as Romeo (the son of Lord Montague with whom my family has a feud)?”. The next line is ‘Deny thy father and refuse thy name’ which I think makes this interpretation pretty clear (ie. if only you told me you were not Romeo, the son of Lord Montague, then things would be ok). The line seems like a perfectly fine (albeit poetic and archaic) way to express this.
This works with your modern translation ("Romeo, why you gotta be Romeo?"). Imagine an actor delivering that line and emphasising the ‘you’ (‘Romeo, why do you have to be Romeo?’) and I think it makes sense. Given the context and delivery, it feels clear that it should be interpreted as 'Romeo (man I've just met) why do you have to be Romeo (Montague)?'. It seems unfair to declare that the line taken out of context doesn’t make sense just because she doesn’t explicitly mention that her issue is with his family name. Especially when the very next line (and indeed the whole rest of the play) clarifies that the issue is with his family.
Sure, the line is poetic and archaic and relies on context, which makes it less clear. But these things are to be expected reading Shakespeare!
It is also fairly common for directors/writers to use a book as inspiration but not care about the specific details, because they want to express their own artistic vision. Hitchcock refused to adapt books that he considered 'masterpieces', since he saw no point in trying to improve them. When he adapted books (such as Daphne du Maurier’s The Birds) he used the source material as loose inspiration and made the films his own.
François Truffaut: Your own works include a great many adaptations, but mostly they are popular or light entertainment novels, which are so freely refashioned in your own manner that they ultimately become a Hitchcock creation. Many of your admirers would like to see you undertake the screen version of such a major classic as Dostoyevsky’s Crime and Punishment, for instance.
Alfred Hitchcock: Well, I shall never do that, precisely because Crime and Punishment is somebody else’s achievement. There’s been a lot of talk about the way in which Hollywood directors distort literary masterpieces. I’ll have no part of that! What I do is to read a story only once, and if I like the basic idea, I just forget all about the book and start to create cinema. Today I would be unable to tell you the story of Daphne du Maurier’s The Birds. I read it only once, and very quickly at that. An author takes three or four years to write a fine novel; it’s his whole life. Then other people take it over completely. Craftsmen and technicians fiddle around with it and eventually someone winds up as a candidate for an Oscar, while the author is entirely forgotten. I simply can’t see that.
FT: I take it then that you’ll never do a screen version of Crime and Punishment.
AH: Even if I did, it probably wouldn’t be any good.
FT: Why not?
AH: Well, in Dostoyevsky’s novel there are many, many words and all of them have a function.
FT: That’s right. Theoretically, a masterpiece is something that has already found its perfection of form, its definitive form.
AH: Exactly, and to really convey that in cinematic terms, substituting the language of the camera for the written word, one would have to make a six- to ten-hour film. Otherwise, it won’t be any good.
(From Hitchcock/Truffaut, quoted here).
Alfonso Cuaron also liked the idea of Children of Men (the book) but disliked almost all the specific details, so he used his film as a chance to make all of the changes he wanted to see.
In the post 'Can economics change your mind?' he has a list of examples where he has changed his mind due to evidence:
1. Before 1982-1984, and the Swiss experience, I thought fixed money growth rules were a good idea. One problem (not the only problem) is that the implied interest rate volatility is too high, or exchange rate volatility in the Swiss case.
2. Before witnessing China vs. Eastern Europe, I thought more rapid privatizations were almost always better. The correct answer depends on circumstance, and we are due to learn yet more about this as China attempts to reform its SOEs over the next five to ten years. I don’t consider this settled in the other direction either.
3. The elasticity of investment with respect to real interest rates turns out to be fairly low in most situations and across most typical parameter values.
4. In the 1990s, I thought information technology would be a definitely liberating, democratizing, and pro-liberty force. It seemed that more competition for resources, across borders, would improve economic policy around the entire world. Now this is far from clear.
5. Given the greater ease of converting labor income into capital income, I no longer am so convinced that a zero rate of taxation on capital income is best.
6. The social marginal value of health care is often quite low, much lower than I used to realize. By the way, hardly anyone takes this on consistently to guide their policy views, no matter how evidence-driven they may claim to be.
7. Mormonism, and other relatively strict religions, can have big anti-poverty effects. I wouldn’t say I ever believed the contrary, but for a long time I simply didn’t give the question much attention. I now think that Mormonism has a better anti-poverty agenda than does the Progressive Left.
8. There are positive excess returns to some momentum investment strategies.
I don't know enough about economics to tell how much these meet your criteria for 'I was wrong' rather than 'revised estimates' or something else (he doesn't use the exact phrase 'I was wrong') but it seems in the spirit of what you are looking for.
I know you asked for other people (presumably not me) to confirm this, but I can point you to the statement of the theorem, as written by Conant and Ashby in the original paper:
Theorem: The simplest optimal regulator R of a reguland S produces events R which are related to the events S by a mapping h: S → R.
Restated somewhat less rigorously, the theorem says that the best regulator of a system is one which is a model of that system in the sense that the regulator’s actions are merely the system’s actions as seen through a mapping h.
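As a toy illustration of what the theorem actually says (my own construction, not from the paper): if the regulated variable is z = s + r, then the optimal regulator's "events" are just the system's events pushed through a fixed mapping h:

```python
# Toy regulation problem: the outcome is z = s + r, and the regulator
# wants z held at 0. The optimal regulator's action is a deterministic
# function of the system event: r = h(s). This h is the "mapping" in the
# theorem statement; nothing about it requires an internal model in any
# rich sense.

def h(s):
    return -s          # the optimal regulator simply cancels s

disturbances = [3, -1, 4, 0, -2]
outcomes = [s + h(s) for s in disturbances]
print(outcomes)        # z is held at 0 for every disturbance
```

Which, as the comment says, shows how thin the actual content is: the regulator's outputs factor through a function of the system's events, nothing more.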
I agree that it has nothing to do with modelling and is not very interesting! But the simple theorem is surrounded by so much mysticism (both in the paper and in discussions about it) that it is often not obvious what the theorem actually says.
The diagram is a causal Bayes net which is a DAG so it can't contain cycles. Your diagram contains a cycle between R and Z. The diagram I had in mind when writing the post was something like:
which is a thermostat over a single timestep.
If you wanted to have a feedback loop over multiple timesteps, you could conjoin several of these diagrams:
Each node along the top row is the temperature at successive times. Each node along the bottom row is the controller state at different times.
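A minimal simulation of the kind of dynamics the conjoined diagram describes (the update rules and constants below are invented for illustration):

```python
# Each temperature T_t depends on (T_{t-1}, controller state C_{t-1});
# each controller state C_t depends on T_t. This is the multi-timestep
# feedback loop from the conjoined diagrams, with made-up dynamics.

TARGET = 20.0

def controller(temp):            # bottom row: controller reads temperature
    return "heat" if temp < TARGET else "off"

def environment(temp, action):   # top row: room cools, heater pushes up
    drift = -0.5                 # room loses half a degree per step
    boost = 1.5 if action == "heat" else 0.0
    return temp + drift + boost

temp, action = 15.0, "off"
for _ in range(20):
    temp = environment(temp, action)   # T_t from (T_{t-1}, C_{t-1})
    action = controller(temp)          # C_t from T_t
print(round(temp, 1))                  # settles near TARGET
```

Unrolled this way there is feedback over time, but the graph of one timestep is still acyclic, which is the point about DAGs above.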
Thanks for the clarifications, that all makes sense. I will keep thinking about this!
This seems like an interesting problem! I've been thinking about it a little bit but wanted to make sure I understood before diving in too deep. Can I see if I understand this by going through the biased coin example?
Suppose I have 2^5 coins and each one is given a unique 5-bit string label covering all binary strings from 00000 to 11111. Call the string on the label λ.
The label given to the coin indicates its 'true' bias. The string 00000 indicates that the coin with that label has p(heads)=0. The coin labelled 11111 has p(heads)=1. The ‘true’ p(heads) increases in equal steps going up from 00000 to 00001 to 00010 etc. Suppose I randomly pick a coin from this collection, toss it 200 times and call the number of heads X_1. Then I toss it another 200 times and call the number of heads X_2.
Now, if I tell you what the label on the coin was (which tells us the true bias of the coin), telling you X_1 would not give you any more information to help you guess X_2 (and vice versa). This is the first Natural Latent condition (λ induces independence between X_1 and X_2). Alternatively, if I didn’t tell you the label, you could estimate it from either X_1 or X_2 equally well. This is what the other two diagrams say.
I think that the full label λ will be an approximate stochastic natural latent. But if we consider only the first bit[1] of the label (which roughly tells us whether the bias is above or below 50% heads) then this bit will be a deterministic natural latent, because with reasonably high certainty you can guess the first bit of λ from X_1 or X_2. This is because the conditional entropy H(first bit of λ | X_1) is low. On the other hand, H(λ | X_1) will be high. If I get only 23 heads out of 200 tosses, I can be reasonably certain that the first bit of λ is a 0 (ie. the coin has a less than 50% chance of coming up heads) but can't be as certain what the last bit of λ is. Just because λ satisfies the Natural Latent conditions within ε, this doesn’t imply that the first bit of λ satisfies them within ε. We can use X_1 to find a 5-bit estimate of λ, but most of the useful information in that estimate is contained in the first bit. The second bit might be somewhat useful, but it's less certain than the first. The last bit of the estimate will largely be noise. This means that going from using λ to using ‘first bit of λ’ doesn’t decrease the usefulness of the latent very much, since the stuff we are throwing out is largely random. As a result, the ‘first bit of λ’ will still satisfy the natural latent conditions almost as well as λ does. By throwing out the later bits, we threw away the most 'stochastic' bits, while keeping the most 'latenty' bits.
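This intuition is easy to check numerically. A rough sketch (the sample counts and maximum-likelihood guesses are my own choices, just to illustrate the asymmetry): estimate both the full label and its first bit from X_1 alone, and compare how often each guess is right:

```python
import random

# 32 coins labelled 00000..11111; label k means p(heads) = k/31.
# Compare: how often can we recover the FIRST BIT of the label from
# X_1 (200 tosses), versus recovering the FULL 5-bit label?
random.seed(0)

def trial():
    k = random.randrange(32)                  # pick a coin uniformly
    p = k / 31
    x1 = sum(random.random() < p for _ in range(200))
    k_hat = round(31 * x1 / 200)              # ML-style full-label guess
    bit_hat = 1 if x1 >= 100 else 0           # first bit: is p >= 1/2?
    return (bit_hat == (k >= 16), k_hat == k)

results = [trial() for _ in range(2000)]
bit_acc = sum(b for b, _ in results) / len(results)
label_acc = sum(f for _, f in results) / len(results)
print(bit_acc, label_acc)   # first bit is recovered far more reliably
```

The first bit is guessed correctly the vast majority of the time (errors only occur for the couple of coins with bias very near 1/2), while the full label guess fails often, which is the H(first bit of λ | X_1) ≪ H(λ | X_1) claim above.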
So in this case, we have started from a stochastic natural latent and used it to construct a deterministic natural latent which is almost as good. I haven’t done the calculation, but hopefully we could say something like ‘if λ satisfies the natural latent conditions within ε then the first bit of λ satisfies the natural latent conditions within 2ε (or √ε, or something else)’. Would an explicit proof of a statement like this for this case be a special case of the general problem?
The problem could be framed as something like: “Is there some standard process we can apply to every stochastic natural latent in order to obtain a deterministic natural latent which is almost as good (in terms of ε)?”. This process would be analogous to the ‘throwing away the less useful/more random bits of λ’ which we did in the example above. Does this sound right?
Also, can all stochastic natural latents be thought of as 'more approximate' deterministic latents? If a latent satisfies the three natural latent conditions within ε, we can always find a (potentially much bigger) ε′ such that this latent also satisfies the deterministic latent condition, right? This is why you need to specify that the problem is showing that a deterministic natural latent exists with 'almost the same' ε. Does this sound right?
I'm going to talk about the 'first bit' but an equivalent argument might also hold for the 'first two bits' or something. I haven't actually checked the maths.
That sounds about right. The extra thing that they are claiming is that these assumptions are things that naturally apply in real life, when a controller is doing its job (ie. they are not just contrived/chosen to get the result). So (Wonham et al claim) the interesting thing is that you can say that these isomorphisms hold in actual systems. Obviously there are a bunch of issues with this. I intentionally avoided too much discussion and criticism in this post and put it in a separate post.