If it has an integral gain, it will notice this and try to add more and more heat until it stops being wrong. If it can't, it's going to keep asking for more and more output, and keep expecting that this time it'll get there. And because it lacks the control authority to do it, it will keep being wrong, and maybe damage its heating element by asking for more than it can safely do. Sound familiar yet?
From tone and context, I am guessing that you intend for this to sound like motivated reasoning, even though it doesn't particularly remind me of motivated reasoning. (I am annoyed that you are forcing me to guess what your intended point is.)
I think the key characteristic of motivated reasoning is that you ignore some knowledge or model that you would ordinarily employ while under less pressure. If you stay up late playing Civ because you simply never had a model saying that you need a certain amount of sleep in order to feel rested, then that's not motivated reasoning; it's just ignorance. It only counts as motivated reasoning if you yourself would ordinarily reason that you need a certain amount of sleep in order to feel rested, but you are temporarily suspending that ordinary reasoning because you dislike its current consequences.
(And I think this is how most people use the term.)
So, imagine a scenario where you need 100J to reach your desired temp but your heating element can only safely output 50J.
If you were to choose to intentionally output only 50J, while predicting that this would somehow reach the desired temperature (contrary to the model you regularly employ in more tractable situations), then I would consider that a central example of motivated reasoning. But your model does not seem to me to explain how this strategy arises.
Rather, you seem to be describing a reaction where you try to output 100J, meaning you are choosing an action that is actually powerful enough to accomplish your goal, but which will have undesirable side-effects. This strikes me as a different failure mode, which I might describe as "tunnel vision" or "obsession".
I suppose if your heating element is in fact incapable of outputting 100J (even if you allow side-effects), and you are aware of this limitation, and you choose to ask for 100J anyway, while expecting this to somehow generate 100J (directly contra the knowledge we just assumed you have), then that would count as motivated reasoning. But I don't think your analogy is capable of representing a scenario like this, because you are inferring the controller's "expectations" purely from its actions, and this type of inference doesn't allow you to distinguish "the controller is unaware that its heating element can't output 100J" from "the controller is aware, but choosing to pretend otherwise". (At least, not without greatly complicating the example and considering controllers with incoherent strategies.)
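For concreteness, here is roughly how I picture the dynamic your analogy describes, as a toy simulation. Everything in it (the gains, the leak rate, the plant model) is invented by me purely for illustration, not taken from your comment: the heater is clamped at 50J per step while holding the setpoint would require more, so the integral term keeps accumulating and the demanded output grows without bound.

```python
# Toy PI-style controller driving a saturating heater (all numbers assumed).
SETPOINT = 70.0       # desired temperature
AMBIENT = 20.0        # the room leaks heat back toward this
LEAK = 0.02           # fraction of the gap to ambient lost each step
KP, KI = 2.0, 0.5     # assumed proportional and integral gains
MAX_OUTPUT = 50.0     # the heater's safe limit (the "50J")

temp = AMBIENT
integral = 0.0
for step in range(201):
    error = SETPOINT - temp
    integral += error                       # keeps growing while the error persists
    demanded = KP * error + KI * integral   # what the controller "asks for"
    applied = min(demanded, MAX_OUTPUT)     # what the heater can actually deliver
    temp += 0.01 * applied - LEAK * (temp - AMBIENT)
    if step % 50 == 0:
        print(f"step {step}: temp={temp:.1f}, demanded={demanded:.1f}, applied={applied:.1f}")

# The temperature plateaus well below the setpoint, the applied output stays
# pinned at the limit, and the demanded output just keeps climbing.
```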
Meta-level feedback: I feel like your very long comment has wasted a lot of my time in order to show off your mastery of your own field in ways that weren't important to the conversation; e.g. the stuff about needing to react faster than the thermometer never went anywhere that I could see, and I think your 5-paragraph clarification that you are interpreting the controller's actions as implied predictions could have been condensed to about 3 sentences. If your comments continue to give me similar feelings, then I will stop reading them.
At some point, a temperature control system needs to take actions to control the temperature. Choosing the correct action depends on responding to what the temperature actually is, not what you want it to be, or what you expect it to be after you take the (not-yet-determined) correct action.
If you are picking your action based on predictions, you need to make conditional predictions based on different actions you might take, so that you can pick the action whose conditional prediction is closer to the target. And this means your conditional predictions can't all be "it will be the target temperature", because that wouldn't let you differentiate good actions from bad actions.
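To illustrate what I mean, here is a minimal sketch. The predictive model in it is a toy I made up for this comment, not a real controller design; the point is only that the predictions have to differ by action, so that comparing them against the target can select an action.

```python
TARGET = 70.0

def predicted_temp(current_temp, heater_on):
    """Toy conditional prediction: the temperature I expect next,
    GIVEN that I take this action (assumed dynamics, for illustration)."""
    drift = -1.0                        # the room cools a little on its own
    heating = 3.0 if heater_on else 0.0
    return current_temp + drift + heating

def choose_action(current_temp):
    # Compare the conditional predictions; none of them is simply
    # "it will be the target temperature".
    candidates = [True, False]
    return min(candidates,
               key=lambda action: abs(predicted_temp(current_temp, action) - TARGET))

print(choose_action(60.0))  # True:  predicting 62 with the heater on beats 59 without
print(choose_action(75.0))  # False: predicting 74 with the heater off beats 77 with it on
```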
It is possible to build an effective temperature control system that doesn't involve predictions at all; you can precompute a strategy (like "turn heater on below X temp, turn it off above Y temp") and program the control system to execute that strategy without it understanding how the strategy was generated, in which case it might not have models or make predictions at all. But if you were going to rely on predictions to pick the correct action, it would be necessary to make some (conditional) predictions that are not simply "I will succeed".
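And here is the precomputed-strategy version, for contrast: pure hysteresis, with thresholds I picked arbitrarily. It executes the rule without representing any prediction about what will happen.

```python
LOW, HIGH = 68.0, 72.0   # "turn heater on below X temp, turn it off above Y temp"

def update_heater(current_temp, heater_is_on):
    if current_temp < LOW:
        return True       # too cold: heater on
    if current_temp > HIGH:
        return False      # too hot: heater off
    return heater_is_on   # in between: leave it as it is

print(update_heater(65.0, False))  # True
print(update_heater(75.0, True))   # False
print(update_heater(70.0, True))   # True (inside the dead band, so no change)
```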
Your explanation about the short-term planner optimizing against the long-term planner seems to suggest we should only see motivated reasoning in cases where there is a short-term reward for it.
It seems to me that motivated reasoning also occurs in cases like gamblers thinking their next lottery ticket has positive expected value, or competitors overestimating their chances of winning a competition, where there doesn't appear to be a short-term benefit (unless the belief itself somehow counts as a benefit). Do you posit a different mechanism for these cases?
I've been thinking for a while that motivated reasoning sort of rhymes with reward hacking, and might arise any time you have a generator-part Goodharting an evaluator-part. Your short-term and long-term planners might be considered one example of this pattern?
I've also wondered if children covering their eyes when they get scared might be an example of the same sort of reward hacking (instead of eliminating the danger, they just eliminate the warning signal from the danger-detecting part of themselves by denying it input).
... except that you have a natural immunity (well, aversion) to adopting complex generators, and a natural affinity for simple explanations. Or at least I think both of those are true of most people.
It seems pretty important to me to distinguish between "heuristic X is worse than its inverse" and "heuristic X is better than its inverse, but less good than you think it is".
Your top-level comment seemed to me like it was saying that a given simple explanation is less likely to be true than a given complex explanation. Here, you seem to be saying that a simple explanation is more likely to be true, but that people have a preference for simple explanations that is stronger than the actual effect, and so you want to push people back to having a preference that is weaker but still in the original direction.
"Possible" is a subtle word that means different things in different contexts. For example, if I say "it is possible that Angelica attended the concert last Saturday," that (probably) means possible relative to my own knowledge, and is not intended to be a claim about whether or not you possess knowledge that would rule it out.
If someone says "I can(not) imagine it, therefore it's (not) possible", I think that is valid IF they mean "possible relative to my understanding": that is, "it's possible" meaning "I cannot think of an obstacle that I don't see any way to overcome", and "it's not possible" meaning "I can think of such an obstacle".
(Note that "I cannot think of a way of doing it that I believe would work" is a weaker claim, and should not be regarded as proof that the thing is impossible even just relative to your own knowledge.)
If that is what they mean, then I think the way to move forward is for the person who imagines it impossible to point out an obstacle that seems insurmountable to them, and then the person who imagines it possible to explain how they imagine solving it, and repeat.
If someone is trying to claim that their (in)ability to imagine something means that the laws of the universe (dis)allow it, then I think the person imagining it impossible had better be able to point out a specific conflict between the proposal and known law, and the person imagining it possible had better be able to draw a blueprint describing the thing's composition and write down the equations governing its function. Otherwise I call bullshit. (Yes, I'm aware I am calling bullshit on a number of philosophers, here.)
I interpreted the name as meaning "performed free association until the faculty of free association was exhausted". It is, of course, very important that exhausting the faculty does not guarantee that you have exhausted the possibility space.
Alas, unlike in cryptography, it's rarely possible to come up with "clean attacks" that clearly show that a philosophical idea is wrong or broken.
I think the state of philosophy is much worse than that. On my model, most philosophers don't even know what "clean attacks" are, and will not be impressed if you show them one.
Example: Once in a philosophy class I took in college, we learned about a philosophical argument that there are no abstract ideas. We read an essay where it was claimed that if you try to imagine an abstract idea (say, the concept of a dog), and then pay close attention to what you are imagining, you will find you are actually imagining some particular example of a dog, not an abstraction. The essay went on to say that people can have "general" ideas where that example stands for a group of related objects rather than just for a single dog that exactly matches it, but that true "abstract" ideas don't exist.[1]
After we learned about this, I approached the professor and said: This doesn't work for the idea of abstract ideas. If you apply the same explanation, it would say: "Aha, you think you're thinking of abstract ideas in the abstract, but you're not! You're actually thinking of some particular example of an abstract idea!" But if I'm thinking of a particular example, then there must be at least one example to think of, right? So that would prove there is at least one member of the class of abstract ideas (whatever "abstract ideas" means to me, inside my own head). Conversely, if I'm not thinking of an example, then the paper's proposed explanation is wrong for the idea of abstract ideas itself. So either way, there must be at least one idea that isn't correctly explained by the paper.
The professor did not care about this argument. He shrugged and brushed it off. He did not express agreement, he did not express a reason for disagreement, he was not interested in discussing it, and he did not encourage me to continue thinking about the class material.
On my model, the STEM fields usually have faith in their own ideas, in a way where they actually believe those ideas are entangled with the Great Web. They expect ideas to have logical implications, and expect the implications of true ideas to be true. They expect to be able to build machines in real life and have those machines actually work. It's something like taking ideas seriously, and something like taking logic seriously, and taking the concept of truth seriously, and seriously believing that we can learn truth if we work hard. I'm not sure if I've named it correctly, but I do think there's a certain mental motion of genuine truth-seeking that is critical to the health of these fields and that is much less common in many other fields.
Also on my model, the field of philosophy has even less of this kind of faith than most fields. Many philosophers think they have it, but actually they mostly have the kind of faith where your subconscious mind chooses to make your conscious mind believe a thing for non-epistemic reasons (like it being high-status, or convenient for you). And thus, much of philosophy (though not quite all of it) is more like culture war than truth-seeking (both among amateurs and among academics).
I think if I had made an analogous argument in any of my STEM classes, the professor would have at least taken it seriously. If they didn't believe the conclusion but also couldn't point out a specific invalid step, that would have bothered them.
I suspect my philosophy professor tagged my argument as being from the genre of math, rather than the genre of philosophy, then concluded he would not lose status for ignoring it.
I think this paper was clumsily pointing to a true and useful insight about how human minds naturally tend to use categories, which is that those categories are, by default, more like fuzzy bubbles around central examples than they are like formal definitions. I suspect the author then over-focused on visual imagination, checked a couple of examples, and extrapolated irresponsibly to arrive at a conclusion that I hope is obviously false to most people with STEM backgrounds.
An awful lot of people, probably a majority of the population, sure do feel deep yearning to either inflict or receive pain, to take total control over another or give total control to another, to take or be taken by force, to abandon propriety and just be a total slut, to give or receive humiliation, etc.
This is rather tangential to the main thrust of the post, but a couple of people used a react to request a citation for this claim.
One noteworthy source is Aella's surveys on fetish popularity and tabooness. Here is an older one that gives the % of people reporting interest, and here is a newer one showing the average amount of reported interest on a scale from 0 (none) to 5 (extreme), both with tens of thousands of respondents.
Very approximate numbers that I'm informally reading off the graphs:
Note that a 3/5 average interest could mean either that 60% of people are extremely into it or that nearly everyone is moderately into it (or anything in between). That seems to imply the survey used in the more recent graph got significantly kinkier answers overall, unless I'm misunderstanding something. (I'm fairly certain that people with zero interest ARE being included in the average, because several other fetishes have average interest below 1, which would be impossible otherwise.)
If we believe this data, it seems pretty safe to guess that a majority of people are into at least one of these things (unless there is near-total overlap between them). The claim that a majority "feel a deep yearning" is not strongly supported but seems plausible.
(I was previously aware that BDSM interest was pretty common, for an extremely silly reason: I saw some people arguing about whether or not Eliezer Yudkowsky was secretly the author of The Erogamer; one of them cited the presence of BDSM in the story as evidence in favor, and I wanted to know the base rate to determine how to weigh that evidence.
I made an off-the-cuff guess of "between 1% and 10%" and then did a Google search with only mild hope that this statistic would be available. I wasn't able to re-find those pages today, but as I recall, my first search result was a page describing a survey of ~1k people claiming a ~75% rate of interest in BDSM, and my second search result was a page describing a survey of ~10k people claiming ~40% had participated in some form of BDSM and an additional ~40% were interested in trying it. I was also surprised to read (on the second page) that submission was more popular than dominance, masochism was more popular than sadism, and masochism remained more popular than sadism even if you only looked at males. Also, bisexuality was reportedly something like 5x higher within the BDSM-interested group than outside of it.)
If you're a moral realist, you can just say "Goodness" instead of "Human Values".
I notice I am confused. If "Goodness is an objective quality that doesn't depend on your feelings/mental state", then why would the things humans actually value necessarily be the same as Goodness?
Sure, give me meta-level feedback.