Sep 19, 2016
I had a really interesting conversation with a guy about modelling information. At one point I insisted that his model be made simpler, because the extra variation in it was unhelpful. At another point in the same conversation I insisted that his model be made more complicated, to account for available information that didn't fit it.
On reflection I realised that I had applied two opposing forces to models of information and could only vaguely explain why. With that in mind I decided to work out what was going on. The following is obvious, but that's why I am writing it out, so that no one else has to do the obvious thing.
This all comes down to what you are measuring or describing. If you are trying to describe something rather general, like "what impact does the number of beach-goers have on the pollution at the beach?", it's probably not important what gender, age, race, time spent at the beach or socioeconomic status the beach-goers have (with the possible exception of the socioeconomic status of the surrounding geopolitical territory). What is important is maybe two pieces of information: how many people were at the beach, and what state the beach was left in.
That's it. This would be a case for reducing the survey of beach-goers down to a counter of beach-goers and a daily photo of the state of the beach at the end of the day (which could be compared to other similar photos). Or even just three photos: one at 9am (start), one at 1pm (peak) and one at 5pm (end). This model needs no more moving parts. The day you want to start using historical information to decide how many beach cleaners to employ, you can do that from the limited but effective data you have gathered.
Let's continue the same example. You have three photos of each day, but sometimes the 1pm photo shows a deserted beach. Nearly no one is there, and you wonder why. It's also messing with your predictions, because there is still a bit of rubbish at 5pm even though very few people were at the beach. The model no longer explains the state of the world. The map is wrong. But that's okay. We can fix it by adding more information. You notice that on most days the model is good, so there might be something going on for the other days which needs an extra + k term in the equation. (Chemistry tends to write the added term as + k; in algebra it's sometimes a + c, turning y = mx + b into y = mx + b + c; physics might write + x. Either way, adding a variable to an equation is common to all science fields.) Some new variable.
Let's say that, being omniscient about our own made-up example, we know that the cause is the weather. On stormy, windy, rainy days no one goes to the beach, but some rubbish washes up. Does this match the data? Almost perfectly. Does this help explain the map? Yes. Is it necessary? That depends on what you are doing with the information. Maybe it's significant enough in this scenario that it is necessary.
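To make the "+ k" idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the numbers are invented, the names (days, fit_line, k) are mine, and the two-step fit (fit a line on fair days, then take the average leftover on stormy days as k) is just one simple way to add the term, not the only or best one.

```python
# Illustrative only: these numbers are invented, not real beach data.
# Each record: (visitors counted at 1pm, stormy day?, rubbish items at 5pm)
days = [
    (100, False, 20),
    (200, False, 40),
    (300, False, 60),
    (0, True, 15),
    (10, True, 17),
]


def fit_line(points):
    """Ordinary least-squares fit of y = m*x + b to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def mean_abs_error(predict):
    """Average absolute gap between predicted and observed rubbish."""
    return sum(abs(predict(v, s) - r) for v, s, r in days) / len(days)


# Simple model: rubbish depends only on the visitor count.
m1, b1 = fit_line([(v, r) for v, _, r in days])


def simple(visitors, stormy):
    return m1 * visitors + b1


# Extended model: fit the line on fair days only, then estimate k as the
# average amount the line under-predicts on stormy days.
fair = [(v, r) for v, s, r in days if not s]
storm = [(v, r) for v, s, r in days if s]
m2, b2 = fit_line(fair)
k = sum(r - (m2 * v + b2) for v, r in storm) / len(storm)


def extended(visitors, stormy):
    return m2 * visitors + b2 + (k if stormy else 0.0)


simple_error = mean_abs_error(simple)
extended_error = mean_abs_error(extended)
print(f"simple model error:   {simple_error:.2f}")
print(f"extended model error: {extended_error:.2f}")
```

On this toy data the extended model fits better, which is all the sketch is meant to show. Whether the extra variable is worth carrying still depends on what you are doing with the information.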
The example that came up in conversation was his own internal model that there is something fundamentally different between someone who does exercise and someone who doesn't. I challenged this model for having too much complexity. I argued that a model in which there is a hidden and secret moving part between "does exercise" and "doesn't exercise" does not describe the world any better than a model without that moving part.
The model does something else (and found its way into existence for this reason). If you find yourself on one side of the model (i.e. the "I don't exercise" side), then you can protect yourself from attributing the failure to exercise to your own inability to do it, by declaring that there is a hidden and secret moving part that prevents you from being in the other observable group. This preserves your not changing and lets you get away with it for a longer time. I know this model because that is what I did. I held this model very strongly. And then I went out and searched for the hidden and secret moving part that I could change in order to move myself into the other group. There was no hidden and secret moving part. Or if there was, I couldn't find it. However, I did manage to stop holding the model that there was some hidden and secret moving part, and instead just started exercising more.
In figuring out whether this model is real or a made-up model to protect your own brain from being critical of itself, start by thinking about what the world would look like if it were true. If there were some fundamental difference between people who do exercise and people who do not, we might see people clustered in observable groups and never able to change between them. (This is not what we see: we regularly see people publishing their weight loss journeys, and we also regularly see people getting fatter and unhealthier, suggesting that travel in either direction is entirely possible and happens all the time.) If there were something describable, it would be as obvious as different species. In fact, thinking evolutionarily, if such a thing existed it would likely have already shaped the state of the world into something completely different... Given that we can't know for sure, this might not be a very strong argument.
If you got this far and wondered, as I did, "so why can't I be in the other group?", I have news for you. You can.
Meta: this took an hour to write. If I were to spend more time on it, it would probably be to tighten up the examples and maybe provide more of them. I am not sure that such time would be useful to you, and I am interested in whether you think it would be.