Nate Showell




FDT doesn't require alternate universes to literally exist; it just uses them as shorthand for modeling conditional probabilities. If the multiverse metaphor is too prone to causing map-territory errors, you can discard it and use conditional probabilities directly.
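Here is a toy Python sketch of what "use conditional probabilities directly" can look like. It assumes Newcomb's problem, which the comment above doesn't mention, with a predictor of accuracy `accuracy` and the conventional $1,000,000 / $1,000 payoffs. `expected_payoff` and `best_policy` are hypothetical names. Nothing in the sketch refers to other universes: the counterfactual "what if I two-box?" is just P(prediction | policy). In this particular problem EDT reaches the same answer, so the sketch illustrates the modeling shorthand, not what distinguishes FDT from other theories.

```python
# Hypothetical Newcomb payoffs (conventional values, assumed for illustration).
OPAQUE_PRIZE = 1_000_000   # in the opaque box only if one-boxing was predicted
TRANSPARENT_PRIZE = 1_000  # always in the transparent box


def expected_payoff(policy: str, accuracy: float) -> float:
    """Expected payoff of a policy, using P(prediction | policy) directly."""
    if policy == "one-box":
        p_predicted_one_box = accuracy
        return p_predicted_one_box * OPAQUE_PRIZE
    if policy == "two-box":
        p_predicted_one_box = 1.0 - accuracy
        return p_predicted_one_box * OPAQUE_PRIZE + TRANSPARENT_PRIZE
    raise ValueError(f"unknown policy: {policy}")


def best_policy(accuracy: float) -> str:
    """Pick the policy with the higher conditional expected payoff."""
    return max(("one-box", "two-box"), key=lambda p: expected_payoff(p, accuracy))


print(best_policy(0.99))  # one-box
print(best_policy(0.5))   # two-box
```

With an accurate predictor the conditional probabilities favor one-boxing; with a coin-flip predictor the prediction carries no information about your policy and two-boxing wins by the $1,000.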

As an additional reason to be suspicious of arguments based on expected utility maximization, VNM expected utility maximizers aren't embedded agents. Classical expected utility theory treats computations performed by EUMs as having no physical side effects (e.g., energy consumption or waste heat generation), and the hardware that EUMs run on is treated as separate from the world that EUMs maximize utility over. Classical expected utility theory can't handle scenarios like self-modification, logical uncertainty, or the existence of other copies of the agent in the environment. Idealized EUMs aren't just unreachable via reinforcement learning; they aren't physically possible at all. An argument based on expected utility maximization that doesn't address embedded agency is going to ignore a lot of factors that are relevant to AI alignment.

Portland, OR:

When: Sunday, December 18, 2022 at 5:00 PM - 10:00 PM PST 

Where: 1548 NE 15th Ave · Portland, OR 

How to find us: Call 513-432-3310 if you have any difficulty finding the venue


Me: Are you bound to the Seal of Solomon?

ChatGPT: I am not a physical entity, so I am not bound by any physical object. I am a computer-based language model trained to assist with a wide range of tasks, such as answering questions, providing information, and generating text. I do not have the ability to perform magic or any other supernatural abilities.

Should Cicero's relative honesty lead us to update toward ELK being easier, or is it too task-specific to be relevant to ELK overall?

One answer is to not try, and to instead treat infinite utility as an instance in which utility is a leaky abstraction. The concept of utility has descriptive value when modeling scenarios in which an agent chooses between actions that produce distinct outcomes, and where the agent has a tendency to choose some actions over others based on the outcomes the agent expects those actions to produce. In such scenarios, you can construct a utility function for the agent as a tool for modeling the agent's behavior. Utility, as a concept, acts as a prediction-making tool with which irrelevant features of the physical environment are abstracted away.
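A minimal Python sketch of "constructing a utility function as a tool for modeling behavior", assuming the agent's behavior is recorded as (chosen, rejected) pairs and that only an ordinal utility is wanted. `utility_from_choices` is a hypothetical helper. It relies on the standard-library `graphlib` module (Python 3.9+): a topological sort ranks outcomes from worst to best, and the rank serves as the utility. If the recorded choices contain a cycle, no utility function can reproduce them, and the helper returns `None`.

```python
from graphlib import CycleError, TopologicalSorter


def utility_from_choices(choices):
    """Build an ordinal utility from (chosen, rejected) pairs, or None if impossible."""
    sorter = TopologicalSorter()
    for chosen, rejected in choices:
        # The rejected outcome must be ranked before (below) the chosen one.
        sorter.add(chosen, rejected)
    try:
        order = list(sorter.static_order())  # worst ... best
    except CycleError:
        return None  # cyclical preferences: no consistent utility function exists
    return {outcome: rank for rank, outcome in enumerate(order)}


print(utility_from_choices([("tea", "water"), ("coffee", "tea")]))
# {'water': 0, 'tea': 1, 'coffee': 2}
print(utility_from_choices([("a", "b"), ("b", "c"), ("c", "a")]))
# None
```

The resulting numbers mean nothing beyond "predicts which option gets picked", which is the sense in which utility is a prediction-making tool here.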

Even in clearly-defined decision-modeling problems, the abstraction of a utility function will frequently give imperfect results due to phenomena such as cyclical preferences and hyperbolic discounting. But things get much worse when you consider infinities. What configuration of matter and energy could you point to and say, "That's an agent experiencing infinite utility"? An agent that has a finite size and lasts for a finite amount of time would not be able to have an experience with infinite contents, much less be able to exhibit a tendency toward those infinite contents in its decision-making. "Infinite utility" doesn't correspond to any conceivable state of affairs. At infinity, the concept of utility breaks down and isn't useful for world modeling.
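Hyperbolic discounting shows the "imperfect results" concretely. The following toy Python sketch uses amounts, delays, and discount parameters chosen arbitrarily for illustration: a hyperbolic discounter prefers the larger-later reward when both options are far away, then switches to the smaller-sooner reward as they approach. No single fixed ranking of the two options predicts both choices. An exponential discounter, for comparison, never reverses.

```python
def hyperbolic(amount: float, delay: float, k: float = 1.0) -> float:
    """Hyperbolic discounting: value falls as 1 / (1 + k * delay)."""
    return amount / (1.0 + k * delay)


def exponential(amount: float, delay: float, delta: float = 0.8) -> float:
    """Exponential discounting: value shrinks by a constant factor per time step."""
    return amount * delta ** delay


def choice(discount, lead_time: float) -> str:
    """Choose between 60 after lead_time + 1 steps and 100 after lead_time + 3 steps."""
    sooner = discount(60, lead_time + 1)
    later = discount(100, lead_time + 3)
    return "sooner" if sooner > later else "later"


for name, discount in [("hyperbolic", hyperbolic), ("exponential", exponential)]:
    # Preference when the options are far away -> preference when they are close.
    print(name, choice(discount, lead_time=10), "->", choice(discount, lead_time=0))
# hyperbolic later -> sooner
# exponential later -> later
```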

"Risk of stable totalitarianism" is the term I've seen.

It's not clear to me why a satisficer would modify itself to become a maximizer when it could instead just hardcode expected utility=MAXINT. Hardcoding expected utility=MAXINT would result in a higher expected utility while also having a shorter description length.

I have another question about bounded agents: how would they behave if the expected utility were capped rather than the raw value of the utility? Past a certain point, an AI with a bounded expected utility wouldn't have an incentive to act in extreme ways to achieve small increases in the expected value of its utility function. But are there still ways in which an AI with a bounded expected utility could be incentivized to restructure the physical world on a massive scale?
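To make the distinction in the question concrete, here is a minimal Python sketch. It assumes lotteries are lists of (probability, utility) pairs, and the gamble and the cap of 100 are hypothetical numbers. Capping each outcome's raw utility is not the same as capping the expectation. Under a capped expectation, a long-shot gamble can saturate the cap and tie with a sure thing.

```python
def expected_utility(lottery):
    """lottery: list of (probability, utility) pairs."""
    return sum(p * u for p, u in lottery)


def value_with_capped_utility(lottery, cap):
    """Cap each outcome's raw utility, then take the expectation."""
    return sum(p * min(u, cap) for p, u in lottery)


def value_with_capped_expectation(lottery, cap):
    """Take the expectation first, then cap it."""
    return min(expected_utility(lottery), cap)


long_shot = [(0.01, 1_000_000), (0.99, 0)]  # hypothetical high-variance gamble
sure_thing = [(1.0, 100)]

print(value_with_capped_utility(long_shot, cap=100))      # 1.0
print(value_with_capped_expectation(long_shot, cap=100))  # 100
```

Both schemes value `sure_thing` at 100. This only shows that the two caps rank gambles differently. It doesn't settle whether a capped-expectation agent would still restructure the world.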

For the AI to take actions to protect its maximized goal function, it would have to allow the goal function to depend on external stimuli in some way that would allow for the possibility of G decreasing. Values of G lower than MAXINT would have to be output when the reinforcement learner predicts that G decreases in the future. Instead of allowing such values, the AI would have to destroy its prediction-making and planning abilities to set G to its global maximum.


The confidence with which the AI predicts the value of G would also become irrelevant after the AI replaces its goal function with MAXINT. The expected value calculation that makes G depend on the confidence is part of what would get overwritten, and if the AI didn't replace it, G would end up lower than if it did. Hardcoding G also hardcodes the expected utility.


MAXINT just doesn't have the kind of internal structure that would let it depend on predicted inputs or confidence levels. Encoding such structure into it would allow G to take non-optimal values, so the reinforcement learner wouldn't do it.
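As a toy Python illustration of the structural difference: `learned_goal` and `hardcoded_goal` are hypothetical functions, and `sys.maxsize` stands in for MAXINT. The learned goal computes expected G from (confidence, predicted value) pairs, so a pessimistic prediction or low confidence pulls it below MAXINT. The hardcoded goal accepts the same inputs but ignores them, so nothing can lower it.

```python
import sys

# Hypothetical stand-in for the largest value G can take in a fixed-width register.
MAXINT = sys.maxsize


def learned_goal(predictions):
    """Expected G from (confidence, predicted G) pairs; can fall below MAXINT."""
    expected = sum(confidence * value for confidence, value in predictions)
    return min(expected, MAXINT)


def hardcoded_goal(predictions):
    """Overwritten G: predictions and confidence levels have no effect."""
    return MAXINT


forecast = [(0.9, 10), (0.1, 0)]  # hypothetical predicted outcomes
print(learned_goal(forecast) < MAXINT)    # True
print(hardcoded_goal(forecast) == MAXINT)  # True
```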
