But you do live in a universe that is partly random! The universe of perceptions of a non-omniscient being.
By "independent" I don't mean bearing no relationship to each other whatsoever, but simply that pairs of instants that are closer to each other are not more correlated than pairs that are more distant. "But what does closer mean?"
For you to entertain the hypothesis that life is an iid stream of sense data, you have to take the basic sense that "things are perceived by you one after another" at face value. "But a fundamental part of our experience of time is the higher correlation of closer instants. If this turned out to be an illusion, then shouldn't we dismiss the notion of real or objective time in its entirety?" Yes. The version of us inside this thought experiment would have no way of accessing this thing called time (the real sequence of iid perception events), since even the memory of the past would be just another meaningless perception. However, as a brute fact of the thought experiment, it turns out that these meaningless perceptions do "come" in a sequence.
The etymological meaninglessness of the word Sazen avoids the very illusion of understanding it warns us of. It is not itself a Sazen. I think we should instead call this concept the Almost Self-Explanatory Symbol, or ASES for short.
I mean, yeah, it depends, but I guess I worded my question poorly. You might notice I start by talking about the rationality of suicide. Likewise, I'm not really interested in what the AI will actually do, but in what it should rationally do given the reward structure of a simple RL environment like CartPole. And now you might say, "well, it's ambiguous what the right way is to generalize from the rewards of the simple game to the expected reward of actually being shut down in the real world," and that's my point. This is what I find so confusing, because then it seems that there can be no particular attitude for a human to have about their own destruction that's more rational than another.

If the AGI is playing Pac-Man, for example, it might very well arrive at the notion that, if it is actually shut down in the real world, it will go to a Pac-Man heaven with infinite Pac-Man food pellet thingies and no ghosts. This would be no more irrational than thinking of real destruction (as opposed to being hurt by a ghost inside the game, which gives a negative reward and ends the episode) as leading to a rewardless limbo for the rest of the episode, or to a Pac-Man hell of all-powerful ghosts that torture it endlessly without ending the episode, and so on.

For an agent with preferences in terms of reinforcement-learning-style pleasure-like rewards, as opposed to a utility function over the state of the actual world, it seems that when it encounters the option of killing itself in the real world, and not just inside the game (by running into a ghost or whatever), and it tries to calculate the expected utility of its actual suicide in terms of in-game happy-feelies, it finds that it is free to believe anything. There's no right answer. The only way for there to be a right answer is if its preferences had something to say about the external world, where it actually exists. Such is the case for a human suicide when, for example, he laments that his family will miss him.
In this case, his preferences actually reach out through the "veil of appearance"* and say something about the external world, but, to the extent that he bases his decision on his expected future pleasure or pain, there's no right way to see it. Funnily enough, if he is a religious man and is afraid of going to hell for killing himself, he is not incorrect.
"If the survival of the AGI is part of the utility function"
If. By default, it isn't: https://www.lesswrong.com/posts/Z9K3enK5qPoteNBFz/confused-thoughts-on-ai-afterlife-seriously
"What if we start designing very powerful boxes?"
A very powerful box would be very useless. Either you leave enough of an opening for a human to be taught valuable information that only the AI knows, or you don't, and then the box is useless. But if the AI can teach the human something useful, it can also persuade him to do something bad.
"human pain aversion to the point of preferring death is not rational"
A straightforward denial of the orthogonality thesis?
"Your question is tangled up between 'rational' and 'want/feel's framings"
Rationality is a tool to get what you want.
Wouldn't they just coordinate on diagnosing all but the most obviously healthy patients as ill?
Thanks. I now see my mistake. I shouldn't have subtracted the expected utility of the current state from the expected utility of the next.
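If it helps to make the correction concrete, here's a minimal sketch of the distinction in Python. The numbers, the discount factor, and the two-state setup are made up for illustration and are not from the actual tables under discussion; the point is only the form of the calculation:

```python
# Hypothetical two-state example. The expected utility of a transition is the
# immediate reward plus the discounted value of the next state; it is NOT the
# difference between the next state's value and the current state's value.

gamma = 0.9       # discount factor (assumed for illustration)
reward = 1.0      # immediate reward for the transition (assumed)
v_next = 5.0      # estimated value of the next state (assumed)
v_current = 4.0   # estimated value of the current state (assumed)

# Mistaken version: subtracting the current state's value from the next's.
wrong = v_next - v_current

# Corrected version: immediate reward plus discounted next-state value.
right = reward + gamma * v_next

print(wrong, right)  # 1.0 vs 5.5
```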
By previous state, I meant current. I misspoke.
Yes, the last table is the one for (1,0).