This post is great! I love the visualizations. And I hadn't made the explicit connection between iterated convolution and CLT!
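In case it is useful to others, here is a quick numpy sketch (my own, not from the post) of that connection: repeatedly convolving a Uniform(0, 1) density with itself and comparing the result to the Gaussian the CLT predicts for the sum.

```python
import numpy as np

dx = 0.01
grid = np.arange(0.0, 1.0, dx)
uniform = np.ones_like(grid)      # Uniform(0, 1) density sampled on a grid
density = uniform.copy()          # density of the sum of n uniforms (n = 1 so far)

for n in range(2, 7):
    # Convolving the densities of independent variables gives the density of their sum;
    # the factor dx turns the discrete convolution into an approximate integral.
    density = np.convolve(density, uniform) * dx
    xs = np.arange(len(density)) * dx      # support of the sum, roughly [0, n]
    mean, var = n * 0.5, n / 12.0          # exact moments of a sum of n uniforms
    gaussian = np.exp(-(xs - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    print(f"n={n}: max |convolved density - CLT Gaussian| = {np.abs(density - gaussian).max():.4f}")
```

The maximum gap shrinks as n grows, which is the CLT showing up as the (rescaled) fixed point of repeated convolution.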
I don't think so.
What I am describing is a strategy for managing your effort so that you spend as little as possible while still meeting your goals (when you do not know in advance how much effort a given problem will need).
So presumably if this heuristic applies to the problems you want to solve, you spend less on each problem and thus you'll tackle more problems in total.
I think this helped me understand you a bit better - thank you
Let me try paraphrasing this:
> Humans are our best example of a sort-of-general intelligence. And humans have a lazy, satisficing, 'small-scale' kind of reasoning that is mostly only well suited for activities close to their 'training regime'. Hence AGIs may also be the same - and in particular, if AGIs are trained with Reinforcement Learning and heavily rewarded for following human intentions, this may be a likely outcome.

Is that pointing in the direction you intended?
(I realized I missed the part in the instructions about an empty room - so my solutions involve other objects)
Let me try to paraphrase this:
In the first paragraph you are saying that "seeking influence" is not something that a system will learn to do if it was not a possible strategy in the training regime (but couldn't it appear as an emergent property? Certainly humans were not trained to launch rockets - yet they did nevertheless?)
In the second paragraph you are saying that common sense sometimes allows you to modify the goals you were given (but for this to apply to AI systems, wouldn't they need to have common sense in the first place, which kind of assumes that the AI is already aligned?)
In the third paragraph it seems to me that you are saying that humans have some goals with a built-in override mechanism - e.g. in general humans have a goal of eating delicious cake, but they will forego this goal in the interest of seeking water if they are about to die of dehydration (but doesn't this seem to be a consequence of these goals being just instrumental things that proxy the complex thing that humans actually care about?)
I think I am confused because I do not understand your overall point, so the three paragraphs seem to be saying wildly different things to me.
I notice I am surprised you write
> However, the link from instrumentally convergent goals to dangerous influence-seeking is only applicable to agents which have final goals large-scale enough to benefit from these instrumental goals
and do not address the "Riemann disaster" or "Paperclip maximizer" examples
> Riemann hypothesis catastrophe. An AI, given the final goal of evaluating the Riemann hypothesis, pursues this goal by transforming the Solar System into “computronium” (physical resources arranged in a way that is optimized for computation)— including the atoms in the bodies of whomever once cared about the answer.
>
> Paperclip AI. An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacture of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips.
Do you think that the argument motivating these examples is invalid?
Do you disagree with the claim that even systems with very modest and specific goals will have incentives to seek influence to perform their tasks better?
Thank you for pointing this out!
I have a sense that log-odds are an underappreciated tool, and this makes me excited to experiment with them more - the "shared and distinct bits of evidence" framework also seems very natural.
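(For concreteness, the additivity I have in mind - assuming the two pieces of evidence are conditionally independent given H and given ¬H - is

$$\log\frac{P(H\mid E_1,E_2)}{P(\lnot H\mid E_1,E_2)} = \log\frac{P(H)}{P(\lnot H)} + \log\frac{P(E_1\mid H)}{P(E_1\mid \lnot H)} + \log\frac{P(E_2\mid H)}{P(E_2\mid \lnot H)},$$

so with base-2 logs each term is a count of bits and distinct pieces of evidence simply add, while shared evidence is whatever would otherwise get double-counted.)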
On the other hand, if the Goddess of Bayesian evidence likes log odds so much, why did she make expected utility linear in probability? (I am genuinely confused about this)
I had not realized, and this makes so much sense.
Paul Christiano has explored the framing of interactive proofs before; see for example this or this.
I think this is an exciting framing for AI safety, since it gets to the crux of one of the issues, as you point out in your question.