"by the time the AI is smart enough to do that, it will be smart enough not to"
I still don't quite grasp why this isn't an adequate answer. If an FAI shares our CEV, it won't want to simulate zillions of conscious people in order to put them through great torture, and it will figure out how to avoid it. Is it simply that it may take the simulated torture of zillions for the FAI to figure this out? I don't see any reason to think that we will find this problem very much easier to solve than a massively powerful AI.
I'm also not wholly convinced that the only ethical way to treat simulacra is never to create them, but I need to think about that one further.
Posting much too late, but about the Emperor's New Clothes: I had always interpreted the story to mean not that people stopped believing in the clothes when the little boy spoke out, but that until that moment it hadn't quite crossed people's minds that, whether or not the King was clothed, he appeared naked to that boy, and that wasn't how things should be. Everyone laughs because they think: I, too, can see the nakedness of the King. Only then do they realise that their neighbour can also see it, and only after that do they see that there are no clothes to see.
Very recently experienced exactly this phenomenon: someone discussing atheists who think "all religion/religious stuff is bad", including, for example, the music of Bach, or drinking and celebrating at Christmas. They seemed convinced that such atheists exist; I doubt it, or at least I have never met or heard of one, and I know for a fact that, for example, all four horsemen of atheism have made explicit statements to the contrary.
Your disclaimer is an annoying one to have to make, and of course this problem comes up whenever this move is made in discussion. Your counterpart says "well, but some singularitarians believe that, don't they?", and you can't actually prove there are none; you have the sneaking fear that, given the vastness of the Internet, a judicious Google search might just turn up someone crazy enough to vindicate them. But of course a handful of anonymous loons on the Internet, sought out specifically for their particular brand of madness, does not a position worthy of discussion make.
There is room to do vastly better than what is usually used for community content finding, and it's a great mystery to me how little explored this area is. If things have moved forward significantly since Raph Levien's work on attack resistant trust metrics, I haven't heard about it.
Good software to support rational discussion would be a huge contribution to thought.
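To make the idea concrete, here is a minimal sketch of the flavour of attack-resistant trust metric Levien's work explores: trust flows out from a seed account through endorsement edges, personalized-PageRank style, so an attacker who manufactures any number of sock-puppet accounts gains little, because the trust reaching their whole cluster is bounded by the endorsements honest users give it. All names, weights, and the graph below are illustrative assumptions, not anyone's actual system.

```python
# Illustrative personalized-PageRank trust metric (a sketch, not Levien's
# actual algorithm). Trust originates at a seed account and flows along
# endorsement edges; sybil accounts that only endorse each other receive
# essentially no trust because no honest account endorses them.

def trust_scores(edges, seed, damping=0.85, iterations=50):
    """edges: dict mapping each account to the list of accounts it endorses."""
    nodes = set(edges) | {n for outs in edges.values() for n in outs}
    scores = {n: 0.0 for n in nodes}
    scores[seed] = 1.0
    for _ in range(iterations):
        new = {n: (1 - damping) if n == seed else 0.0 for n in nodes}
        for node, outs in edges.items():
            if outs:
                share = damping * scores[node] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # Accounts endorsing nobody return their trust to the seed.
                new[seed] += damping * scores[node]
        scores = new
    return scores

# Hypothetical endorsement graph: three honest users, one attacker
# ("mallory") with two sock puppets that endorse only each other.
endorsements = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "mallory": ["sock1", "sock2"],
    "sock1": ["mallory"],
    "sock2": ["mallory"],
}
scores = trust_scores(endorsements, seed="alice")
```

With "alice" as seed, the honest cluster accumulates nearly all the trust, while the sybil cluster gets none at all, since no trust-carrying edge crosses into it. The attack-resistance property falls out of the structure: creating more sock puppets just subdivides the same (tiny) inflow.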
There's a whole world of atheist blogging and writing out there that might also be worth tapping into for advice from others who've been there. See this collection of deconversion stories for example.
That sounds like a really tough spot. I hope you find advice that can help.
Possession of a single Eye is said to make the bearer equivalent to royalty.
Crossman: there's a third argument, which is that even if the consequences of keeping the secret are overall worse than those of betraying the confidence, even after the effect you discuss, turning yourself into someone who will never betray such secrets no matter what the consequences, and advertising yourself as such in an impossible-to-fake way, may have good consequences overall. In other words, you might turn away from consequentialism on consequentialist grounds.
Another example where unfakeably advertising irrationality can (at least in theory) serve you is threats. Suppose my only way of stopping you from taking over the world is the power to destroy the world, and you with it. If you take over the world, there's no possible advantage to destroying it, so I won't, and you can take the world over unopposed. But if I put a lunatic in charge of the button who will believably carry out the threat, you will be deterred; the same applies if I can become that lunatic.
However, overall I think that the arguments against turning yourself into a lunatic are pretty strong, and in fact I suspect that consequentialism has the best consequences.
But we want them to be sentient. These things are going to be our cultural successors. We want to be able to enjoy their company. We don't want to pass the torch on to something that isn't sentient. If we were to build a nonsentient one, assuming such a thing is even possible, one of the first things it would do would be to start working on its sentient successor.
In any case, it seems weird to try to imagine such a thing. We are sentient entirely as a result of being powerful optimisers. We would not want to build an AI we couldn't talk to, and if it can talk to us as we talk to each other, it's hard to see what aspect of sentience it could be lacking. At first blush it reads as if you plan to build an AI that's just like us, except without a Cartesian Theatre.