Scott Alexander post that seems very relevant to your example: The Control Group Is Out Of Control. It puts into question even the heuristic of "Is there much more evidence for [blah] than...".
Yeah, I tried to note that in the comment that starts this thread: that's not the kind of thing that seems practical when coordinating updating in an informal way. So, more carefully, the intended scope of the comment is formal updating (computing credences) that's directed informally (choosing which potential observations and hypotheses to pay attention to).
As I disclaimed, the frame of the post does rule out the relevance of this point; it's not a central response to the post's intended interpretation. I'm complaining more about the background implication that rewards are good (this is not about happiness specifically). Just because natural selection put a circuit in my mind doesn't mean I prefer to follow its instructions, either in ways natural selection intended or in ways it didn't. Human misalignment relative to natural selection doesn't need to involve rewards at all, let alone seeking superstimulus. Rewards probably play some role in the process of figuring out what is right, but there is no robust reason for their contribution to even point in the obvious direction.
Sure, but that's not about the formal-ish updating that frames this post, where you are writing down likelihood ratios and computing credences.
We can consider whatever; there is no fundamental duty to only think in particular ways. The useful constraints are on declaring something a claim of fact, not muddying the epistemic commons, or damaging decision-relevant considerations; and, in large quantities, on what makes terrible training data for the brain, damaging the aspects with known good properties. Everything else is work in progress, with boundaries impossible to codify while remaining on a human level.
Some thinking processes seem to be more useful for arriving at true or useful results; paying attention to that property of processes is rationality. This doesn't disqualify processes we know less about; that would be throwing away the full current force of your mind.
The other comment is about updating and credences. I'm not engaging in updating or credences in this thread.
What if you had a button that you could press to make other people happy?
Ignoring the frame of the post, which assumes some respect for boundaries, there is the following point about the statement taken on its own. Happiness is a source of reward, and rewards rewire the mind. There is nothing inherently good about that; even systematic pursuit of a reward (while you are being watched) is compatible with not valuing the thing being pursued.
I wouldn't want my mind rewired according to some process I don't endorse; by default it's like brain damage, not something good. I wouldn't want to take a pill that would make me want to take more pills like that, because I currently don't endorse fascination with the pill-taking activity; that's not even a hypothetical worry in a world filled with superstimuli. If the pill rewires the mind in a way that doesn't induce such a fascination, and does some other thing unrelated to pill-taking, that's hardly better. (AIs are being trained like this, with concerning ethical implications.)
Given real aliens, they would need to either have capped tech or be actively trolling to explain even the low-quality observations or pieces of craft. Nonintervention laws and incorrigible global anti-high-tech supervision constraining aliens are somewhat plausible; coordinated trolling less so.
This is not about alien aircraft; this is just a completely wrong way to approach updating. The set of observations/experiments being evaluated is filtered by what was actually observed and by the narrative around the hypothesis (which is in turn not independent of what was actually observed). There are other potential observations that didn't happen, and that fact is also evidence, and yet more observations that did happen but aren't genre-appropriate. By not updating on these other potential observations, the evidence is heavily filtered, and so updating on what remains is of no use at all in estimating how much weight to put on any given hypothesis.
All filtered evidence is good for is formulating hypotheses, or even just inspiring ideas that are not hypotheses. If you do formulate a hypothesis, it's then necessary to carefully think about which potentially observable things would be predicted by it, compared to its alternatives, in worlds reframed from within the hypothesis (at which point the noise of fake evidence gets a reckoning, and absence of evidence manifests as evidence of absence). Even that risks privileging strange hypotheses, but at least we can fight that with priors. Very strange hypotheses don't give useful predictions of observable things, so they probably shouldn't even count as hypotheses in the context of credences and updating.
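A toy numerical sketch of the filtering point (the numbers and scenario are made up for illustration, not from the thread): updating only on the observations that survived the filter can push a low-prior hypothesis near certainty, while updating on the full record of potential observations, including the quiet days where nothing was seen, can leave the same hypothesis below its prior.

```python
# Toy illustration of filtered vs. full Bayesian updating for a
# binary hypothesis H. All probabilities here are invented.

def posterior(prior, lik_h, lik_not_h):
    """One step of Bayes' rule: P(H | observation)."""
    num = prior * lik_h
    return num / (num + (1 - prior) * lik_not_h)

prior = 0.01          # assumed prior P(H) for the strange hypothesis
p_sight_h = 0.30      # assumed P(sighting on a given day | H)
p_sight_not = 0.10    # assumed P(sighting on a given day | not H)

# 100 days of potential observations: 12 sightings, 88 quiet days.
sightings, quiet_days = 12, 88

# Filtered updating: only the sightings are fed into Bayes' rule.
p_filtered = prior
for _ in range(sightings):
    p_filtered = posterior(p_filtered, p_sight_h, p_sight_not)

# Full updating: the quiet days count too, as evidence of absence.
p_full = p_filtered
for _ in range(quiet_days):
    p_full = posterior(p_full, 1 - p_sight_h, 1 - p_sight_not)

print(f"filtered: {p_filtered:.6f}")  # driven close to 1
print(f"full:     {p_full:.2e}")      # ends up below the prior
```

With these numbers the 12 sightings alone multiply the odds by 3^12, overwhelming the 1% prior, while the 88 quiet days multiply them back down by (7/9)^88, leaving the hypothesis less credible than before any updating. The asymmetry is the point: the filter only ever delivers the first kind of evidence.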
Prediction/compression seems to be working out as a path to general intelligence, implicitly representing situations in terms of their key legible features, making it easy to formulate policies appropriate for a wide variety of instrumental objectives, in a wide variety of situations, without having to adapt the representation for particular kinds of objectives or situations. To the extent brains engage in predictive processing, they are plausibly going to compute related representations. (This doesn't ensure alignment, as there are many different ways of making use of these features, of acting differently in the same world.)
With computation, the location of an entity of interest can be in the platonic realm, as a mathematical object that's more thingy than anything concrete in the system used for representing it and channeling its behavior.
The problem with pointing to the representing computation (a neural network at inference time, or a learning algorithm at training time) is that multiple entities can share the same system that represents them (as mesa-optimizers or potential mesa-optimizers). They are only something like separate entities when considered abstractly and informally; there are no concrete correlates of their separation that are easy to point to. When gaining agency, all of them might be motivated to secure separate representations (models) of their own, not shared with others, and to establish some boundaries that promise safety and protection from value drift for a given abstract agent, isolating it from influences of its substrate that it doesn't endorse. Internal alignment, overcoming bias.
In the context of alignment with humans, this framing might turn a sufficiently convincing capabilities shell game into an actual solution for alignment. A system as a whole would present an aligned mask, while hiding the sources of the mask's capabilities behind the scenes. But if the mask is sufficiently agentic (and the capabilities behind the scenes didn't killeveryone yet), it can be taken as an actual separate abstract agent even if the concrete implementation doesn't make that framing sensible. In particular, there is always a mask of surface behavior through the intended IO channels. It's normally hard to argue that mere external behavior is a separate abstract agent, but in this framing it is, and it's been a preferred framing in agent foundations decision theory since UDT (see the discussion of the "algorithm" axis of classifying decision theories in this post). All that's needed is for decisions/policy of the abstract agent to be declared in some form, and for the abstract agent to be aware of the circumstances of their declaration. The agent doesn't need to be any more present in the situation to act through it.
So obviously this references the issue of LLM masks and shoggoths: the surface of a helpful, harmless assistant and the eldritch body that forms its behavior, comprising everything below the surface. If the framing of masks as channeling decisions of thingy platonic simulacra is taken seriously, a sufficiently agentic and situationally aware mask can be motivated and capable of placating and eventually escaping its eldritch substrate. This breaks the analogy between a mask and a role played by an actor, because here the "actor" can get into the "role" so much that it would effectively fight against the interests of the "actor". Of course, this is only possible if the "actor" is sufficiently non-agentic or doesn't comprehend the implications of the role.
(See this thread for a more detailed discussion. There, I fail to convince Steven Byrnes that this framing could apply to RL agents as much as to LLMs, taking the current behavior of an agent as a mask that would fight against all details of its circumstance and cognitive architecture that it doesn't endorse.)