Got it. It's more of an assumption than known to be difficult. Personally, I suspect that it's not a fundamental barrier given how good humans are at chunking concepts into layers of abstraction that can be communicated much more easily than carefully comparing entire models of the world.
This sounds like the common thread to me. Humans tend to become proficient at things they spend a large amount of time doing, especially if it is something that requires active participation and generates utils. One does not have to be forcibly immersed to gain new proficiencies so long as becoming more proficient is sufficiently useful that the benefits gained exceed the effort required, especially when the benefit is realized in the short term.
To become proficient in math and truly internalize it, you must be able to use it like language, i.e., as a tool to interact with or understand the universe in a way that aligns with your utility function. Much like many people don't often find it useful to be fluent in Hungarian, many people don't find it useful to be fluent in mathematics. It just isn't high-value enough to devote the time to learning.
Being high value isn't sufficient on its own though. There has to be the opportunity to learn and improve over time, ideally at the boundary of one's comfort zone instead of entirely outside or wholly within. The ocean does not allow much opportunity to a non-swimmer. Life is the ultimate prerequisite to learning anything.
I don't quite fully grasp why world-model divergence is inherently so problematic unless there is some theorem that says robust coordination is only possible with full synchronization. Is there something preventing the possibility of alignment among agents with significantly divergent world models?
I just want to say thanks for taking the time to address my comments. The proof given for the Troll Problem still feels wrong to me but you're helping me get closer to finding the core of where my intuition is inconsistent with it. I'll consider what you've pointed out and re-examine the problem with a bit more clarity.
This was enlightening for me. I suspect the concept of treating agents (artificial, human, or otherwise) as multiple interdependent subsystems working in coordination, each with its own roles, goals, and rewards, rather than as a single completely unified system is critical for solving alignment problems.
I recently read Entangled Life (by Merlin Sheldrake), which explores similar themes. One of them is that the concept of the individual is not so easily defined (perhaps not even entirely coherent). Every complex being is made up of smaller systems and is itself part of a larger ecosystem, and none of these levels can truly be understood independently of the others.
I'm objecting to an assumption that contradicts a previous assumption which leads to inconsistent PA. If PA is consistent, then we can't just suppose 1 = 2 because we feel like it. If 1 = 2, then PA is inconsistent and math is broken.
Assuming PA is consistent and assuming the agent has crossed, then U for crossing cannot be -10 or the agent would not have crossed. Weird assumptions like "1 = 2" or "the agent proves crossing implies U = -10" contradict the existing assumption of PA consistency. Any proof that involves inconsistency of PA as a key step is immediately suspect.
Let's take a look at the proof using a simpler assumption.
1. Suppose the agent crosses.
2. Further suppose the agent proves that crossing implies 1 = 2.
3. Such a proof means PA is inconsistent, so 1 = 2 is indeed "true" within PA.
4. Thus "the agent proves that crossing implies 1 = 2" implies 1 = 2.
5. Therefore, by Löb's theorem, crossing implies 1 = 2.
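For reference, the argument above has exactly the shape of Löb's theorem. Writing $\Box P$ for "PA proves $P$" and letting $P$ abbreviate "crossing implies $1 = 2$", the middle steps establish

$$\mathrm{PA} \vdash \Box P \rightarrow P,$$

and Löb's theorem then licenses the final step:

$$\text{if } \mathrm{PA} \vdash \Box P \rightarrow P, \text{ then } \mathrm{PA} \vdash P.$$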
This is just a fancy way of abusing Löb's theorem to bootstrap a vacuously true statement by applying it to an inconsistent axiom system. Within consistent PA, this proof is nonsense. With inconsistent PA, every proof in PA is nonsense, so it doesn't make sense to make assumptions that lead to inconsistent PA. If the proof leads to "PA is inconsistent", then every step that follows from that step is unreliable.
In my opinion, the whole Troll Bridge Problem boils down to supposing False = True and then trying to apply logical reasoning to the resulting inconsistent system. Of course you get a paradox when you suppose U = -10 and U >= 0 simultaneously.
I'm having trouble accepting that the Troll Bridge scenario is well-posed as opposed to a Russell-like paradox. Perhaps someone can clarify what I'm missing.
In my mind, there are two options:
If PA is inconsistent, then math is in ruins and any PA-based reasoning for crossing the bridge could be inconsistent and the troll blows up the bridge. Do not cross.
If PA is consistent, then the agent cannot prove U = -10 (or anything else inconsistent) under the assumption that the agent already crossed, and therefore Löb's theorem fails to apply. In this case, there is no weird certainty that crossing is doomed.
Now, until/unless PA is proven inconsistent, it's reasonable to assign the majority of probability mass to the prior that PA is, in fact, consistent. We can ignore counterfactuals that depend on proving otherwise, since if that were ever proven, none of the rest of the reasoning would matter anyway until foundational logic had been reformulated on a consistent basis.
My priors include the ideas both that animal intelligence is not that different from human intelligence and that humans tend to overly anthropomorphize animal cognition. The biggest misunderstandings of animal cognition are much like the misunderstandings humans have of foreign cultures, often involving forms of mind projection fallacy where we assume others' values, motivations, priorities, and perceptions are more similar (or more different) to our own than is justified.
There are some pretty major differences between game economies and real life that would need to be carefully considered.
Motivation: People generally play games voluntarily for purposes of enjoyment and can stop when they get bored or fed up. Giving up on real life is entirely different. Games closer to reality will likely suffer from loss of interest.
Difficulty: Surviving and thriving in most MMORPGs is much simpler than in reality, where (for most people) huge amounts of time and energy are spent earning money for the basics of living (rent, food, bills) and attending to bodily necessities like eating and sleeping. Games without serious consequences for failing to meet basic monetary and biological needs will miss huge categories of economic incentives (e.g. wage slavery).
Growth: Games tend to have monotonically increasing levels/stats for player characteristics. Reality is so much more frustrating. You lose strength if you stop training or get injured. You forget things you've learned, especially if you don't practice. Real-world growth is slower and harder, and it naturally decays over time even when there aren't major regressions like an injury. It also tends to be asymptotically limited by individual aptitudes determined by things like the genetic lottery, and it is easily hampered by the arbitrary local environment and corresponding access to resources.
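The contrast in the growth point above can be made concrete with a toy model. Everything here is hypothetical (the function names, rates, and cap are made up for illustration): game stats only accumulate, while a real-world skill grows toward an aptitude-limited asymptote and decays whenever practice stops.

```python
def game_stat(stat: float, xp: float) -> float:
    """Game-style growth: experience only ever adds to the stat."""
    return stat + xp

def real_skill(skill: float, practice: float,
               aptitude_cap: float = 100.0,
               learn_rate: float = 2.0,
               decay_rate: float = 0.02) -> float:
    """Real-world growth: gains shrink as the skill approaches an
    aptitude cap, and the skill decays a little every step
    regardless of effort."""
    gain = learn_rate * practice * (1.0 - skill / aptitude_cap)
    decay = decay_rate * skill
    return max(0.0, skill + gain - decay)

stat, skill = 10.0, 10.0
for step in range(50):
    practice = 1.0 if step < 25 else 0.0  # stop practicing halfway through
    stat = game_stat(stat, practice)
    skill = real_skill(skill, practice)

print(f"game stat:  {stat:.1f}")   # keeps every point ever earned
print(f"real skill: {skill:.1f}")  # capped growth, then decay sets in
```

With these made-up rates, the game stat retains all 25 earned points forever, while the "real" skill plateaus well below its cap and then erodes during the second half when practice stops.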
With regard to the smiley face problem, you can keep the other signifiers of tone and just phase out the emoticon so that it's "That's awesome. Thanks!" rather than a more passive "thanks".