I kinda reject the energy of the hypothetical? But I can speak to some things I wish I saw OpenAI doing:
I think OP is correct about cultural learning being the most important factor in explaining the large difference in intelligence between homo sapiens and other animals.
In early chapters of Secrets of Our Success, the book examines studies comparing performance of young humans and young chimps on various congnitive tasks. The book argues that across a broad array of cognitive tests, 4 year old humans do not perform singificantly better than 4 year old chimps on average, except in cases where the task can be solved by immitating others (human children crushe...
If we expect there will be lots of intermediate steps - does this really change the analysis much?
How will we know once we've reached the point where there aren't many intermediate steps left before crossing a crticial threshold? How do you expect everyone's behaviour to change once we do get close?
Doesn't make sense to use the particular consumer's preferencces to estimate the cruelty cost. If that's how we define the cruelty cost it then the buyer should already be taking it into account when making their purchasing decision, so it's not an exernality.
The externality comes from the animals themselves having interests which the consumers aren't considering
(This is addressed both to concerned_dad and to "John" who I hope will read this comment).
Hey John, I'm Xavier - hope you don't mind me giving this unsolicited advice but I'd love to share my take on your situation, and some personal anecdotes about myself and my friends which I think you might find useful (who I suspect had a lot in common with you when we were your age). I bet you're often frustrated about adults thinking they know better than you, especially when most of the time they're clearly not as sharp as you are and don't seem to be thinking as d...
For the final bet (or the induction base for a finite sequence), one cannot pick an amount without knowing the zero-point on the utility curve.
I'm a little confused about what you mean sorry -
What's wrong with this example?:
It's time for the final bet, I have $100 and my utility is
I have the opportunity to bet on a coin which lands heads with probability , at odds.
If I bet on heads, then my expected utility is , which is maximized when .
So I decide to bet 5...
As far as I can tell, the fact that you only ever control a very small proportion of the total wealth in the universe isn't something we need to consider here.
No matter what your wealth is, someone with log utility will treat a prospect of doubling their money to be exactly as good as it would be bad to have their wealth cut in half, right?
Thanks heaps for the post man, I really enjoyed it! While I was reading it felt like you were taking a bunch of half-baked vague ideas out of my own head, cleaning them up, and giving some much clearer more-developed versions of those ideas back to me :)
Thanks for response!
Input/output: I agree that the unnatural input/output channel is just as much a problem for the 'intended' model as for the models harbouring consequentialists, but I understood your original argument as relying on there being a strong asymmetry where the models containing consequentialists aren't substantially penalised by the unnaturalness of their input/output channels. An asymmetry like this seems necessary because specifying the input channel accounts for pretty much all of the complexity in the intended model.
Compu...
they have to run simulations of all kinds of possible universes to work out which ones they care about and where in the multiverse Solomonoff inductors are being used to make momentous decisions
I think that they have an easy enough job but I agree the question is a little bit complicated and not argued for in the post. (In my short response I was imagining the realtime component of the simulation, but that was the wrong thing to be imagining.)
I think the hardest part is not from the search over possible universes but from cases where exact historical simul...
Thanks for your comment, I think I'm a little confused about what it would mean to actually satisfy this assumption.
It seems to me that many current algorithms, for example, a rainbowDQN agent, would satisfy assumption 3? But like I said I'm super confused about anything resembling questions about self-awareness/naturalisation.
Sorry for the late response! I didn't realise I had comments :)
In this proposal we go with (2): The AI does whatever it thinks the handlers will reward it for.
I agree this isn't as good as giving the agents an actually safe reward function, but if our assumptions are satisfied then this approval-maximising behaviour might still result in the human designers getting what they actually want.
What I think you're saying (please correct me if I misunderstood) is that an agent aiming to do whatever its designers reward it for will be incentivised ...
So much of your writing sounds like an eloquent clarification of my own underdeveloped thoughts. I'd bet good money your lesswrong contributions have delivered me far more help than harm :) Thanks <3
Heartbreaking :'( still, that "taken time off from their cryptographic shenanigans" line made me laugh so hard I woke my girlfriend up
Maybe an analogy which seems closer to the "real world" situation - let's say you and someone like Sam Altman both tried to start new companies. How much more time and starting capital do you think you'd need to have a better shot of success than him?