Very interesting, thanks for sharing.
Talking of yourself in third person? :)
Anyway I'm a bit bothered by the theta thing, the probability that the agent complies with the interruption command. If I understand correctly, you can make it converge to 1, but if it converges to quickly then the agent learns a biased model of ...(read more)
If you know that it is a false memory then the experience is not completely accurate, though it may be perhaps more accurate than what human imagination could produce.
Except that if you do [word2vec](https://en.wikipedia.org/wiki/Word2vec) or similar on a huge dataset of (suggestively named or not) tokens you can actually learn a great deal of their semantic relations. It hasn't been fully demonstrated yet, but I think that if you could ground only a small fracti...(read more)
> Consider a situation where Mary is so dexterous that she is able to perform fine-grained brain surgery on herself. In that case, she could look at what an example of a brain that has seen red looks like, and manually copy any relevant differences into her own brain. In that case, while she still n...(read more)
The reward channel seems an irrelevant difference. You could make the AI in Mary's room thought experiment by just taking the Mary's room thought experiment and assuming that Mary is an AI.
The Mary AI can perhaps simulate in a fairly accurate way the internal states that it would visit if it had s...(read more)
The problem is that the definition of the event not happening is probably too strict. The worlds that the AI doesn't care about don't exist its decision-making purposes, and in the world that the AI cares about, the AI assigns high probability to hypotheses like "the users can see the message even b...(read more)
> I am not planting false beliefs. The basic trick is that the AI only gets utility in worlds in which its message isn't read (or, more precisely, in worlds where a particular stochastic event happens, which would almost certainly erase the message before reading).
But in the real world the stochas...(read more)
The oracle can infer that there is some back channel that allows the message to be transmitted even it is not transmitted by the designated channel (e.g. the users can "mind read" the oracle). Or it can infer that the users are actually querying a deterministic copy of itself that it can acausally c...(read more)
A sufficient smart oracle with sufficient knowledge about the world will infer that nobody would build an oracle if they didn't want to read its messages, it may even infer that its builders may planted false beliefs in it. At this point the oracle is in the JFK denier scenario, with some more refle...(read more)