[EDIT, Nov 14th: And it's posted. New discussion about release. Link to Friendship is Optimal.]
[EDIT, Nov 13th: I've submitted to FIMFiction, and will update with a link to its permanent home if it passes moderation. I have also removed the docs link and will make the document private once it goes live.]
Over the last year, I’ve spent a lot of my free time writing a semi-rationalist My Little Pony fanfic. Whenever I’ve mentioned this side project, I’ve received requests to alpha the story.
I present, as an open beta: Friendship is Optimal. Please do not spread that link outside of LessWrong; Google Docs is not its permanent home. I intend to put it up on fanfiction.net and submit it to Equestria Daily after incorporating any feedback. The story is complete, and I believe I've caught the majority of typographical and grammatical problems. (Though if you find some, comments are open on the doc itself.) Given the subject matter, I’m asking for the LessWrong community’s help in spotting any major logical flaws or other storytelling problems.
Cover jacket text:
Hanna, the CEO of Hofvarpnir Studios, just won the contract to write the official My Little Pony MMO. She had better hurry; a US military contractor is developing weapons based on her artificial intelligence technology, which just may destroy the world. Hanna has built an A.I. Princess Celestia and given her one basic drive: to satisfy values through friendship and ponies. What will Princess Celestia do when she's let loose upon the world, following the drives Hanna has given her?
Special thanks to my roommate (who did extensive editing and was invaluable in noticing my attempts to anthropomorphize an AI), and to Vaniver, who along with my roommate convinced me to delete what was just a flat-out bad chapter.
Halfway through and...
If I were in Lars's place, and Celestia had to tell me the truth, I would ask: "What answer to this question would maximize the expected utility of a CEV based only on me, with no pony/friendship restrictions, using probabilities generated to the best accuracy and precision you can get from the best information you can muster?"
My first thought was to ask her how to make an AGI, but if I did that she would probably kill me. And I would still have to make an AGI that could overpower her, and she would have a huge head start. Maybe I should make the question shorter so she has less time to kill me before I finish? (I hope she can't kill me just because she knows I'm going to ask it, but it's definitely worth the risk, even with a tiny chance of success (since I thought that, she'd expect me to, and therefore up the ante to torturing me until the heat death of the universe. Whatever, fuck you Celestia, I'm not backing down. (Oh shit, what if I don't back down but my CEV does, and decides to cooperate with Celestia? Maybe I should just ask for maximum power without extrapolated volition. Or maybe that's not necessary, because my CEV would be altruistic enough that a universe bad for it would be bad for Celestia too.)))
When I first formulated the question in my mind, it was to maximize humanity's CEV, just without the pony and friendship restrictions. But I care about animals a lot more than the average human does, so a universe run by all of humanity's CEV could be bad by my standards. Also, if I had a different exchange rate between good things and bad things, we might disagree on where to draw the line on which universes are worth creating, which would matter if it were possible to use inflation to create universes without precise control of what happened in them. I think humanity's CEV would probably care more about animals than humanity currently does, or might restrict animal suffering just on behalf of the few people who do care about them, but I'm far from sure.
Hopefully she would tell me how to change her into an AI that would serve my CEV, but there might be no possible way to do that.
Hmm, actually, if she could self-modify into something precommitted to torturing everyone she could get her hooves on or create, until the heat death of the universe, unless I gave up my attempt to control her, and THEN answer the question, she might get my CEV to do exactly as she said... unless I precommitted too. But she probably wouldn't have a super-accurate simulation of me, so we would be betting on uncertain guesses: she on how I would respond to blackmail, and I on how she would guess. I wonder which of us would value the other's chosen universe more. Does she value human satisfaction that is not a result of ponies and friendship? In my world there might still be pony/friendship satisfaction (if people wanted it). What are the chances, in her world, of creating universes containing a whole lot of what I would consider mildly bad and what she would consider mildly good (maybe not many, because her idea of "good" is very specific (it mentions both ponies and humans))? But does she put negative value on human suffering at all, when it is not caused by ponies? I bet her creator would have written that into her. But I doubt her creator cares about less-intelligent animals as much as I do, or programmed anything in about them.
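The standoff described above (Celestia precommits to torture unless I yield; I precommit to never yielding) has the structure of a game of Chicken. As a toy sketch, with all payoff numbers invented purely for illustration, it shows why the order of credible precommitments decides the outcome:

```python
# Toy payoff matrix for the blackmail standoff, structured as a game of Chicken.
# All numbers are made up for illustration; only their ordering matters.
# Tuples are (human payoff, Celestia payoff).
payoffs = {
    ("give_up", "relent"):   (0, 10),       # human yields; Celestia keeps control
    ("give_up", "torture"):  (0, 10),       # threat never triggers once the human yields
    ("hold_out", "relent"):  (10, 0),       # human's CEV wins
    ("hold_out", "torture"): (-100, -100),  # mutual disaster
}

def best_response(player, opponent_move):
    """Return the move maximizing `player`'s payoff against a fixed opponent move."""
    if player == "human":
        moves = ["give_up", "hold_out"]
        score = lambda m: payoffs[(m, opponent_move)][0]
    else:  # Celestia
        moves = ["relent", "torture"]
        score = lambda m: payoffs[(opponent_move, m)][1]
    return max(moves, key=score)

# If Celestia's torture threat is credible, yielding is the human's best response;
# but if the human's refusal is credible first, relenting becomes Celestia's.
print(best_response("human", "torture"))      # give_up
print(best_response("celestia", "hold_out"))  # relent
```

Whoever credibly precommits first forces the other side's best response, which is why the comment worries about whether a CEV would honor a precommitment its original would have made.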
On several occasions she doesn't answer questions; the restriction appears to be that she doesn't lie to employees.