See section 2 of this Agent Foundations research program and citations for discussion of the problems of logical uncertainty, logical counterfactuals, and the Löbian obstacle. Or you can read this friendly overview. Gödel-Löb provability logic has been used here.
I don't know of any application of set theory (large cardinals, forcing, and the like) to agent foundations research.
Ah, 90% of the people discussed in this post are now working for Anthropic, along with a few other ex-OpenAI safety people.
Here's a fun and pointless way one could rescue the homunculus model: There's an infinite regress of homunculi, each of which sees a reconstructed image. As you pass up the chain of homunculi, the shadow gets increasingly attenuated, approaching but never reaching complete invisibility. Then we identify "you" with a suitable limit of the homunculi, and what you see is the entire sequence of images under some equivalence relation which "forgets" how similar A and B were early in the sequence, but "remembers" the presence of the shadow.
The homunculus model says that all visual perception factors through an image constructed in the brain. One should be able to reconstruct this image by asking a subject to compare the brightness of pairs of checkerboard squares. A simplistic story about the optical illusion is that the brain detects the shadow and then adjusts the brightness of the squares in the constructed image to exactly compensate for the shadow, so the image depicts the checkerboard's inferred intrinsic optical properties. Such an image would have no shadow, and since that's all the homunculus sees, the homunculus wouldn't perceive a shadow.
That story is not quite right, though. Looking at the picture, the black squares in the shadow do seem darker than the dark squares outside the shadow, and similarly for the white squares. I think if you reconstructed the virtual image using the above procedure you'd get an image with an attenuated shadow. Maybe with some more work you could prove that the subject sees a strong shadow, not an attenuated one, and thereby rescue Abram's argument.
Edit: Sorry, misread your comment. I think the homunculus theory is that in the real image, the shadow is "plainly visible", but the reconstructed image in the brain adjusts the squares so that the shadow is no longer present, or is weaker. Of course, this raises the question of what it means to say the shadow is "plainly visible"...
This is the sort of problem Dennett's Consciousness Explained addresses. I wish I could summarize it here, but I don't remember it well enough.
It uses the heterophenomenological method, which means you take a dataset of earnest utterances like "the shadow appears darker than the rest of the image" and "B appears brighter than A", and come up with a model of perception/cognition to explain the utterances. In practice, as you point out, homunculus models won't explain the data. Instead, the model will say that different cognitive faculties have access to different pieces of information at different times.
Very interesting. I would guess that to learn in the presence of spoilers, you'd need not only a good model of how you think, but also a way of updating the way you think according to the model's recommendations. And I'd guess this is easiest in domains where your object-level thinking is deliberate rather than intuitive, which would explain why the flashcard task would be hardest for you.
When I read about a new math concept, I eventually get the sense that my understanding of it is "fake", and I get "real" understanding by playing with the concept and getting surprised by its behavior. I assumed the surprise was essential for real understanding, but maybe it's sufficient to track which thoughts are "real" vs. "fake" and replace the latter with the former.
Have you had any success learning the skill of unseeing?
See also this comment from 2013, which gives the computable version of NicerBot.
This algorithm is now published in "Robust program equilibrium" by Caspar Oesterheld, Theory and Decision (2019) 86:143–159, https://doi.org/10.1007/s11238-018-9679-3, which calls it ϵGroundedFairBot.
The paper cites this comment by Jessica Taylor, which has the version that uses reflective oracles (NicerBot). Note also the post by Stuart Armstrong it's responding to, and the reply by Vanessa Kosoy. The paper also cites a private conversation with Abram Demski. But as far as I know, the parent to this comment is older than all of these.
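For readers who haven't seen the construction, here is a minimal Python sketch of the ϵGroundedFairBot idea: with some small probability ε the bot cooperates outright (the "grounding" step that makes the mutual recursion terminate almost surely), and otherwise it simulates its opponent playing against it and copies the opponent's move. The function and constant names here are illustrative, not from the paper, and representing bots as Python functions that receive the opponent's program is a simplifying assumption (the paper works with programs that get each other's source code).

```python
import random

EPSILON = 0.1  # grounding probability; any value in (0, 1] guarantees termination

def epsilon_grounded_fair_bot(opponent):
    """With probability EPSILON, cooperate outright. Otherwise, simulate the
    opponent playing against this very bot and copy whatever move it makes."""
    if random.random() < EPSILON:
        return "C"
    return opponent(epsilon_grounded_fair_bot)

# Two copies playing each other cooperate with probability 1: every level of
# the recursion grounds out at "C" with probability EPSILON, so the recursion
# terminates almost surely, and every terminating path returns "C".
print(epsilon_grounded_fair_bot(epsilon_grounded_fair_bot))  # prints "C"

# Against an unconditional defector it defects with probability 1 - EPSILON:
defect_bot = lambda opponent: "D"
print(epsilon_grounded_fair_bot(defect_bot))  # "C" w.p. EPSILON, else "D"
```

Note how the ε-grounding plays the role that the reflective oracle plays in NicerBot: it cuts off the otherwise ill-founded "simulate my opponent simulating me..." regress, at the cost of a small probability of an unconditional move.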
Or maybe it means we train the professional in the principles and heuristics that the bot knows. The question is whether we can compress the bot's knowledge into, say, a 1-year training program for professionals.
There are reasons to be optimistic: We can discard information that isn't knowledge (lossy compression). And we can teach the professional using human concepts (lossless compression).