Chris_Leong

Sequences

Linguistic Freedom: Map and Territory Revisited
INVESTIGATIONS INTO INFINITY

Wiki Contributions


Comments

What are the key philosophical problems you believe we need to solve for alignment?

I'd encourage you to write up a blog post on common mistakes if you can find the time.

Yeah... I suppose you could go through Evan Hubinger's arguments in "How likely is deceptive alignment?", but I suspect you'd probably have some further pushback that would be hard to answer.

Why do you think that the number of people who could make a convincing case to you is so low? Where do they normally mess up?


My position on Newcomb's Problem in a sentence: Newcomb's paradox results from attempting to model an agent as having access to multiple possible choices, whilst insisting it has a single pre-decision brain state.

Minor correction:

But then, in the small fraction of worlds where we survive, we simulate lots and lots of copies of that AI where it instead gets reward 0 when it attempts to betray us!

The reward should be negative rather than 0.

Ah sorry, I somehow forgot you could put your money in one box and bet on another box.

Regarding the second problem, there's a Nash equilibrium where two agents bomb box 1 while betting on box 2 and two other agents bomb box 2 while betting on box 1. No agent can unilaterally change its strategy to score more.

[This comment is no longer endorsed by its author]
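A minimal sketch of the unilateral-deviation check behind that equilibrium claim. The scoring rule here is a made-up stand-in (an agent scores 1 only if the box it bets on goes un-bombed), not the original problem's payoffs; it just illustrates verifying that no single agent gains by switching strategies.

```python
from itertools import product

BOXES = [1, 2]
# A strategy is a (bomb, bet) pair: which box to bomb and which box to bet on.
STRATEGIES = list(product(BOXES, BOXES))

def payoff(profile, i):
    """Hypothetical payoff: agent i scores 1 iff the box it bets on is un-bombed."""
    bombed = {bomb for bomb, _ in profile}
    _, bet = profile[i]
    return 1 if bet not in bombed else 0

def is_nash(profile):
    """True if no agent can score more by unilaterally switching strategies."""
    for i in range(len(profile)):
        current = payoff(profile, i)
        for alt in STRATEGIES:
            deviated = list(profile)
            deviated[i] = alt
            if payoff(tuple(deviated), i) > current:
                return False
    return True

# The profile from the comment: two agents bomb box 1 and bet on box 2,
# two agents bomb box 2 and bet on box 1.
profile = ((1, 2), (1, 2), (2, 1), (2, 1))
print(is_nash(profile))  # True under these toy payoffs
```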

Oh, silly me. Of course, EDT doesn't need to pre-commit because EDT just does whatever gives the highest expected value, without caring whether there's a causal impact or not. So when given the decision of whether to weigh the boxes in pairs vs. weighing them all and picking the heaviest, it's happy to weigh in pairs because that increases how much money it expects to win.
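A minimal sketch of that decision rule, using made-up probabilities and payoffs rather than the actual box problem: EDT simply ranks actions by expected value conditional on taking them, with no causal modelling at all.

```python
def edt_choose(conditional_outcomes):
    """EDT's rule: pick the action whose conditional expected value is highest,
    ignoring whether the action has any causal influence on the outcome."""
    def expected_value(action):
        outcomes = conditional_outcomes[action]  # list of (probability, payoff)
        return sum(p * v for p, v in outcomes) / sum(p for p, _ in outcomes)
    return max(conditional_outcomes, key=expected_value)

# Hypothetical numbers only: weighing in pairs is assumed to correlate with
# finding the money more often, so EDT picks it even though the weighing
# procedure doesn't causally change what's in the boxes.
conditional_outcomes = {
    "weigh_in_pairs": [(0.5, 100), (0.5, 0)],
    "weigh_all_and_pick_heaviest": [(0.25, 100), (0.75, 0)],
}
print(edt_choose(conditional_outcomes))  # weigh_in_pairs
```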

Wasn't EDT pre-committing to the strategy of weighing the left two boxes and the right two boxes and then deciding to randomly pick one of the heavier pair? Or are you saying that a blinded EDT automatically adopts this strategy without precommitment?

This comment was edited to fix a mix-up of "lighter" and "heavier".

[This comment is no longer endorsed by its author]