All of Nate Showell's Comments + Replies

Relatedly, humans are very extensively optimized to predictively model their visual environment. But have you ever, even once in your life, thought anything remotely like "I really like being able to predict the near-future content of my visual field. I should just sit in a dark room to maximize my visual cortex's predictive accuracy."?

n=1, but I've actually thought this before.

Quintin Pope:
I added the following to the relevant section: On reflection, the above discussion overclaims a bit in regards to humans. One complication is that the brain uses internal functions of its own activity as inputs to some of its reward functions, and some of those functions may correspond or correlate with something like "visual environment predictability". Additionally, humans run an online reinforcement learning process, and human credit assignment isn't perfect. If periods of low visual predictability correlate with negative reward in the near-future, the human may begin to intrinsically dislike being in unpredictable visual environments. However, I still think that it's rare for people's values to assign much weight to their long-run visual predictive accuracy, and I think this is evidence against the hypothesis that a system trained to make lots of correct predictions will thereby intrinsically value making lots of correct predictions.

Simulacrum level 4 is more honest than level 3. Someone who speaks at level 4 explicitly asks himself "what statement will win me social approval?" Someone who speaks at level 3 asks herself the same question, but hides from herself the fact that she asked it.

Simulacra levels aren't a particularly good model for some interactions/topics, because they blend together in idiosyncratic ways. The model leaves unspecified whether a speaker's use of a level is intentional, whether the speaker is hiding anything from themselves, or whether they cynically understand the levels and use whichever one they think is most effective at the time.
the gears to ascension:
I don't think so. Simulacrum 4 has trouble making coherent reference to physical causal trajectories. Simulacra 3 and 1 are in fact compatible in some circumstances; not so with 4.
I felt I was saying "Simulacrum Level 1: Attempt to describe the world accurately." The AI was rushed, misaligned, and not a good launch for its users. More people need to know that. It's literally already how the NYT and others are describing it (accurately); I'm just suggesting signal-boosting that content.

I agree about embedded agency. The way in which agents are traditionally defined in expected utility theory requires assumptions (e.g. logical omniscience and lack of physical side effects) that break down in embedded settings, and if you drop those assumptions you're left with something that's very different from classical agents and can't be accurately modeled as one. Control theory is a much more natural framework for modeling reinforcement learner (or similar AI) behavior than expected utility theory.

ChatGPT doesn't always exhibit anomalous behavior for the token DragonMagazine:

Although it still sometimes does:

Quotation marks appear to affect whether it handles the string correctly:

This is because of tokenization. A tutorial on BPE (which OpenAI uses) is here. Specifically, in this case:
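As a toy illustration of the BPE merge procedure described in that tutorial (this is a sketch, not OpenAI's tokenizer, and the merge table below is made up for the example):

```python
def bpe_merge(tokens, merges):
    """Greedy BPE: repeatedly merge the adjacent pair with the
    lowest merge rank until no mergeable pair remains."""
    tokens = list(tokens)
    while True:
        best = None  # (rank, position) of the best mergeable pair
        for i in range(len(tokens) - 1):
            rank = merges.get((tokens[i], tokens[i + 1]))
            if rank is not None and (best is None or rank < best[0]):
                best = (rank, i)
        if best is None:
            return tokens
        i = best[1]
        # Replace the pair at position i with its concatenation.
        tokens = tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]

# Hypothetical merge table: lower rank means the merge was learned earlier.
merges = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}

bpe_merge(list("lower"), merges)  # ['low', 'er']
bpe_merge(list("low"), merges)    # ['low']
```

A string like "DragonMagazine" may or may not collapse to a single rare token depending on the learned merge table and on surrounding characters (such as quotation marks), which is why small changes to the prompt change how the model handles it.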

FDT doesn't require alternate universes to literally exist, it just uses them as a shorthand for modeling conditional probabilities. If the multiverse metaphor is too prone to causing map-territory errors, you can discard it and use conditional probabilities directly.
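To make the "conditional probabilities directly" point concrete, here is a minimal sketch using Newcomb's problem, with made-up numbers (the predictor accuracy and payoffs below are assumptions for illustration only):

```python
# No alternate universes needed: just condition the box contents
# on the predictor's (assumed) accuracy p.
p = 0.99                      # P(predictor guessed your action correctly)
BIG, SMALL = 1_000_000, 1_000  # opaque-box and transparent-box payoffs

# If you one-box: with probability p the predictor foresaw it
# and filled the opaque box.
ev_one_box = p * BIG + (1 - p) * 0            # ~990,000

# If you two-box: with probability p the predictor foresaw it
# and left the opaque box empty.
ev_two_box = p * SMALL + (1 - p) * (BIG + SMALL)  # ~11,000
```

The same conditional-probability bookkeeping carries whatever information the multiverse metaphor was carrying, without inviting map-territory confusion about whether the other branches "exist."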

I would argue that to actually get benefit out of some of these formal dilemmas as they're actually framed, you have to break the rules of the formal scenario and say the agent that benefits is the global agent, who then confers the benefit back down onto the specific agent at a given point in logical time. However, because we are already at a downstream point in logical time where the FDT-unlikely/impossible scenario occurs, the only way for the local agent to access that counterfactual benefit is via literal time travel. From the POV of the global agent, asking the specific agent in the scenario to let themselves be killed for the good of the whole makes sense, but if you clamp agent to the place in logical time where the scenario begins and ends, there is no benefit to be had for the local agent within the runtime of the scenario.

As an additional reason to be suspicious of arguments based on expected utility maximization, VNM expected utility maximizers aren't embedded agents. Classical expected utility theory treats computations performed by EUMs as having no physical side effects (e.g., energy consumption or waste heat generation), and the hardware that EUMs run on is treated as separate from the world that EUMs maximize utility over. Classical expected utility theory can't handle scenarios like self-modification, logical uncertainty, or the existence of other copies of the agent…

Portland, OR:

When: Sunday, December 18, 2022 at 5:00 PM - 10:00 PM PST 

Where: 1548 NE 15th Ave · Portland, OR 

How to find us: Call 513-432-3310 if you have any difficulty finding the venue


Me: Are you bound to the Seal of Solomon?

ChatGPT: I am not a physical entity, so I am not bound by any physical object. I am a computer-based language model trained to assist with a wide range of tasks, such as answering questions, providing information, and generating text. I do not have the ability to perform magic or any other supernatural abilities.

Should Cicero's relative honesty lead us to update toward ELK being easier, or is it too task-specific to be relevant to ELK overall?

I think it's irrelevant to ELK. Cicero's honesty is hardcoded in a fixed architecture; it's not latent at all.

One answer is to not try, and to instead treat infinite utility as an instance in which utility is a leaky abstraction. The concept of utility has descriptive value when modeling scenarios in which an agent chooses between actions that produce different distinct outcomes, and where the agent has a tendency to choose some actions over others based on the outcomes the agent expects those actions to produce. In such scenarios, you can construct a utility function for the agent as a tool for modeling the agent's behavior. Utility, as a concept, acts as a predi…

"Risk of stable totalitarianism" is the term I've seen.

It's not clear to me why a satisficer would modify itself to become a maximizer when it could instead just hardcode expected utility=MAXINT. Hardcoding expected utility=MAXINT would result in a higher expected utility while also having a shorter description length.

That's true hehe, but that also seems bad.

I have another question about bounded agents: how would they behave if the expected utility were capped rather than the raw value of the utility? Past a certain point, an AI with a bounded expected utility wouldn't have an incentive to act in extreme ways to achieve small increases in the expected value of its utility function. But are there still ways in which an AI with a bounded expected utility could be incentivized to restructure the physical world on a massive scale?

This is a satisficer and Rob Miles talks about it in the video.
Yeah, I had a similar thought with capping both the utility and the percent chance, but maybe capping expected utility is better. Then again, maybe we've just reproduced quantization.
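The distinction between capping the utility and capping the expected utility can be made concrete with a sketch (the cap, payoffs, and probabilities below are hypothetical):

```python
CAP = 10.0

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

def bounded_utility_eu(outcomes):
    # Cap each outcome's raw utility, then take the expectation.
    return sum(p * min(u, CAP) for p, u in outcomes)

def bounded_expected_utility(outcomes):
    # Take the expectation first, then cap the result.
    return min(expected_utility(outcomes), CAP)

# A long-shot gamble with an extreme payoff:
gamble = [(0.001, 1_000_000.0), (0.999, 0.0)]

expected_utility(gamble)          # ~1000: unbounded agent loves this
bounded_utility_eu(gamble)        # ~0.01: bounded utility kills long shots
bounded_expected_utility(gamble)  # 10: capped, but the gamble still ranks highest
```

Under a capped expected utility, the agent stops distinguishing between plans once they all saturate the cap, which is why it loses the incentive to take extreme actions for marginal gains; whether that also removes incentives for large-scale restructuring depends on whether mundane plans already reach the cap.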

For the AI to take actions to protect its maximized goal function, it would have to allow the goal function to depend on external stimuli in some way that would allow for the possibility of G decreasing. Values of G lower than MAXINT would have to be output when the reinforcement learner predicts that G decreases in the future. Instead of allowing such values, the AI would have to destroy its prediction-making and planning abilities to set G to its global maximum.


The confidence with which the AI predicts the value of G would also become irrelevant af…