

For the final bet (or the induction base for a finite sequence), one cannot pick an amount without knowing the zero-point on the utility curve.

Sorry, I'm a little confused about what you mean.

What's wrong with this example?

It's time for the final bet. I have $100 and my utility is the log of my wealth, U(w) = log(w).

I have the opportunity to bet on a coin which lands heads with probability , at  odds.

If I bet  on heads, then my expected utility is , which is maximized when .

So I decide to bet 50 dollars.

What am I missing here?
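To make the example concrete, here is a minimal sketch of the calculation. The probability and odds from the example above did not survive, so the values below (heads with probability 0.75, even odds) are assumptions, chosen only because they reproduce the $50 answer under log utility. The function names are hypothetical too.

```python
import math

def expected_log_utility(bet, wealth, p_heads, net_odds):
    """Expected log-wealth after betting `bet` on heads at `net_odds`-to-1."""
    win = wealth + net_odds * bet   # wealth if the coin lands heads
    lose = wealth - bet             # wealth if the coin lands tails
    return p_heads * math.log(win) + (1 - p_heads) * math.log(lose)

def kelly_bet(wealth, p_heads, net_odds):
    """Closed-form maximiser of expected log-wealth (the Kelly criterion)."""
    fraction = p_heads - (1 - p_heads) / net_odds
    return max(0.0, fraction) * wealth

# Assumed values, not taken from the original comment.
wealth, p_heads, net_odds = 100.0, 0.75, 1.0

best_closed_form = kelly_bet(wealth, p_heads, net_odds)

# Brute-force check over whole-dollar bets (stop at 99 so log(0) never occurs).
best_grid = max(
    range(0, 100),
    key=lambda b: expected_log_utility(b, wealth, p_heads, net_odds),
)

print(best_closed_form, best_grid)
```

Note that nothing here needs a zero-point for the utility curve: adding a constant to U shifts the expected utility of every bet by the same amount, so the maximising bet doesn't move.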

As far as I can tell, the fact that you only ever control a very small proportion of the total wealth in the universe isn't something we need to consider here.

No matter what your wealth is, someone with log utility will treat the prospect of doubling their money as exactly as good as having their wealth cut in half would be bad, right?
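Spelling that out, with w standing for current wealth (my notation, not from the original post):

```latex
\log(2w) - \log(w) \;=\; \log 2 \;=\; \log(w) - \log\!\left(\tfrac{w}{2}\right)
```

The gain from doubling and the loss from halving are both log 2, and w drops out entirely.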

Thanks heaps for the post, man, I really enjoyed it! While I was reading, it felt like you were taking a bunch of half-baked vague ideas out of my own head, cleaning them up, and giving some much clearer, more developed versions of those ideas back to me :)

Thanks for the response!

Input/output: I agree that the unnatural input/output channel is just as much a problem for the 'intended' model as for the models harbouring consequentialists, but I understood your original argument as relying on there being a strong asymmetry where the models containing consequentialists aren't substantially penalised by the unnaturalness of their input/output channels. An asymmetry like this seems necessary because specifying the input channel accounts for pretty much all of the complexity in the intended model.

Computational constraints: I'm not convinced that the necessary calculations the consequentialists would have to make aren't very expensive (from their point of view). They don't merely need to predict the continuation of our bit sequence: they have to run simulations of all kinds of possible universes to work out which ones they care about and where in the multiverse Solomonoff inductors are being used to make momentous decisions, and then they perhaps need to simulate their own universe to work out which plausible input/output channels they want to target. If they do all this, then all they get in return is a pretty measly influence over our beliefs (since they're competing with many other daemons in approximately equally similar universes who have opposing values). I think there's a good chance these consequentialists might instead elect to devote their computational resources to realising other things they desire (like simulating happy copies of themselves or something).

Thanks for your comment! I think I'm a little confused about what it would mean to actually satisfy this assumption.

It seems to me that many current algorithms, for example a Rainbow DQN agent, would satisfy assumption 3? But like I said, I'm super confused about anything resembling questions about self-awareness/naturalisation.

Sorry for the late response! I didn't realise I had comments :)

In this proposal we go with (2): The AI does whatever it thinks the handlers will reward it for.

I agree this isn't as good as giving the agents an actually safe reward function, but if our assumptions are satisfied then this approval-maximising behaviour might still result in the human designers getting what they actually want.

What I think you're saying (please correct me if I misunderstood) is that an agent aiming to do whatever its designers reward it for will be incentivised to do undesirable things to us (like wiring up our brains to machines which make us want to press the reward button all the time).

It's true that the agents will try to take these kinds of nefarious actions if they think they can get away with it. But in this setup the agent knows that it can't get away with tricking the humans like this, since its ancestors already warned the humans that a future agent might try this, and the humans prepared appropriately.

My entry:

So much of your writing sounds like an eloquent clarification of my own underdeveloped thoughts. I'd bet good money your lesswrong contributions have delivered me far more help than harm :) Thanks <3
