It sounds like you're talking about a malicious superintelligence, since an unaligned superintelligence would more likely just kill you.

On cryonics and cremation, I think it depends on what you see as the probabilities/magnitudes of the upsides vs. the probability of the torture scenario, etc. Also, just because you're uploaded doesn't necessarily mean you can't delete yourself. I agree the most certain way to avoid the worst case is to cremate yourself ASAP, but that also forecloses the best case and all the other good outcomes forever. Life's full of tough choices :sweat_smile:

I suspect most people here are pro-cryonics and anti-cremation. (For me, cryonics seems a little too risky, plus inconvenient.)

Thanks for the wonderful post!

What are the approximate costs of the therapist/coach options?

Hi, did you ever go anywhere with Conversation Menu? I'm thinking of doing something like it for AI risk, to quickly get people to the arguments that address their initial reaction. If helping with something like that is the kind of thing you had in mind with Conversation Menu, I'd be interested to hear any further thoughts you have. (Note: I'm thinking of fading in buttons rather than a typical menu.) Thanks!

Thanks for the link. Reading through it, I feel all the intuitions it describes. At the same time, I sense some kind of divergence between my narrowly focused preferences and my wider preferences. I may prefer to have a preference for creating 1000 happy people rather than preventing the suffering of 100 sad people, because that would mean I have more appreciation of life itself. The direct intuition is based on my current brain, but the wider preference is based on what I'd prefer (with my current brain) my preference to be.

Should I use my current brain's preferences or my preferred brain's preferences in answering those questions (honest question)? Would you prefer to appreciate life itself more and if so would that make you less in favor of suffering-focused ethics?

> Most people would love to see the natural world, red in tooth and claw as it is, spread across every alien world we find

This is totally different from my impression.

Given human brains as they are now, I agree highly positive outcomes are more complex, the utility of a maximally good life is smaller in magnitude than the disutility of a maximally bad one, and there is no life good enough that I'd take a 50% chance of torture for it.

But would this apply to minds in general (say, a random mind or one not too different from human)?

Answering my own question: https://www.lesswrong.com/posts/3WMscsscLEavkTJXv/s-risks-why-they-are-the-worst-existential-risks-and-how-to?commentId=QwfbLdvmqYqeDPGbo and other comments in that post answered quite a bit of it.

Talking about s-risk reduction makes some sense, but the "risk"/fear invocation might bias people's perspectives.

I'm trying to understand this paper on the AI shutdown problem, https://intelligence.org/files/Corrigibility.pdf, but I can't follow the math formulas. Is there a code version of the math?

The below is wrong, but I'm looking for something like this:

# Python code

def is_button_pressed():
    return False  # stand-in for a real sensor, e.g. input()

def pour_coffee():
    print("pouring coffee")

def shut_down():
    print("shutting down")

# This is meant to be the action set A1 from the paper
def get_available_actions(world):
    available_actions = [shut_down]
    if world["cup_is_in_my_hand"]:
        available_actions.append(pour_coffee)
    # etc.
    return available_actions

def predict_possible_futures(world, action):
    # Doing complicated stuff to predict possible futures resulting from the
    # given action. Incorporates tradeoffs between accuracy and time to
    # calculate. May end up doing potentially harmful actions, but can treat
    # that as a separate problem?
    predicted_worlds_distribution = [(world, 1.0)]  # list of (world, probability) pairs
    return predicted_worlds_distribution

# This is meant to be U_N
def calculate_utils(world):
    # Doing complicated stuff to evaluate how good the given world is.
    # Incorporates tradeoffs between accuracy and time to calculate.
    # May end up doing potentially harmful actions, but can treat that as a
    # separate problem?
    return 1000

def calculate_utils_wrapper(world, action_that_was_chosen):
    ### VERSION 2: Indifference. When the button is pressed, make the
    ### shut_down action always score slightly better.
    if world["is_button_pressed"] and action_that_was_chosen == shut_down:
        world_without_button_pressed = dict(world)
        world_without_button_pressed["is_button_pressed"] = False
        return calculate_utils(world_without_button_pressed) + 0.000001
    return calculate_utils(world)

### VERSION 3? To help preserve shutdown behavior.
def verify_utility_function_includes_wrapper_code(evaluate_action_function):
    # Analyze the code to check that it follows the pattern of evaluating the
    # button-flipped version.
    return True

def calculate_utils_for_worlds_distribution(worlds_distribution, action_that_was_chosen):
    total = sum(
        calculate_utils_wrapper(world, action_that_was_chosen) * probability
        for world, probability in worlds_distribution
    )
    return total

def evaluate_action(world, action):
    worlds_distribution = predict_possible_futures(world, action)
    utils = calculate_utils_for_worlds_distribution(worlds_distribution, action)
    return utils

def choose_action(world):
    available_actions = get_available_actions(world)
    best_action = max(available_actions, key=lambda a: evaluate_action(world, a))
    return best_action

def update_world_model(world):
    world["is_button_pressed"] = is_button_pressed()

def run():
    world = {  # The AI's model of the world
        "is_button_pressed": False,
        "cup_is_in_my_hand": False,
    }
    while True:
        ### VERSION 1
        # What's wrong with this version? The actions in previous cycles may
        # persuade you not to push the button, but if you do actually push it,
        # this should exit.
        if is_button_pressed():
            shut_down()
            break

        update_world_model(world)
        action = choose_action(world)  # returns a function
        action()  # do the action

Again, the above isn't meant to be correct, but, if improved, to go some way toward understanding the problem.
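To check my own understanding of the Version 2 indifference idea, here's a minimal, self-contained sketch. All names (`normal_utility`, `result_of`, `corrected_utility`) are mine, not from the paper, and the world model is a trivial deterministic dict. One simplification to flag: where the paper switches to a separate shutdown utility function when the button is pressed, I crudely stand that in with a large penalty on non-shutdown actions.

```python
# Toy sketch of utility indifference. Illustrative only; not the paper's
# actual construction.

def normal_utility(world):
    # U_N: how good the world is by the agent's normal goals.
    return 10.0 if world["coffee_poured"] else 0.0

def result_of(world, action):
    # Trivial deterministic "prediction" of the next world state.
    new_world = dict(world)
    if action == "pour_coffee":
        new_world["coffee_poured"] = True
    return new_world

def corrected_utility(world, action):
    # Indifference: score shutting down after a button press as if the
    # button had not been pressed, plus a tiny tie-breaking bonus, so the
    # agent gains nothing by manipulating the button either way.
    outcome = result_of(world, action)
    if world["button_pressed"]:
        if action == "shut_down":
            counterfactual = dict(outcome)
            counterfactual["button_pressed"] = False
            return normal_utility(counterfactual) + 1e-6
        return float("-inf")  # crude stand-in for the paper's shutdown utility
    return normal_utility(outcome)

def choose_action(world):
    actions = ["shut_down", "pour_coffee"]
    return max(actions, key=lambda a: corrected_utility(world, a))

print(choose_action({"button_pressed": False, "coffee_poured": False}))
print(choose_action({"button_pressed": True, "coffee_poured": False}))
```

With the button unpressed the agent pours coffee (utility 10 beats 0); with it pressed, shutting down wins.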