see also the literature on the problem of evil :P
My favourite theodicy is pre-incarnate consent: before we are born, we consent to our existence in both heaven and earth, with the afterlife offered to us as compensation for any harms suffered on earth.[1]
How this features in your plan:
Unfortunately, some guys might be upset that we pre-created them for this initial deal, so property X is the property of not being upset by this.
[1] The Pre-Existence Theodicy, Amos Wollen, Feb 21, 2025.
Potentially, we will be creating and destroying many minds and civilizations that matter (at minimum, perhaps, the ones that turned out not to contain honorable beings).
I'm hopeful we could also select for honourable guys that are happy about their existence and being simulated like this.
For instance, if you're quite sure you've figured out how to make and identify honorable guys, maybe you could try to make many different honorable guys, get bids from all of them, and give the contract to the best bid?
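Here's a toy sketch of what that bidding process could look like, in Python. Everything here is hypothetical: the candidates, the bids, and especially the honor_score, since producing a reliable honor estimate is exactly the hard part this assumes away.

```python
# Toy sketch of "make many honorable guys, get bids, award the contract to the
# best bid". All names are hypothetical; honor_score stands in for whatever
# (hard!) identification process convinced you a candidate is honorable.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    honor_score: float  # assumed output of your honor-identification process, in [0, 1]
    bid: float          # what the candidate offers in exchange for the contract

def award_contract(candidates: list[Candidate], honor_threshold: float = 0.99) -> Candidate | None:
    """Only consider candidates you're quite sure are honorable; among those, take the best bid."""
    honorable = [c for c in candidates if c.honor_score >= honor_threshold]
    return max(honorable, key=lambda c: c.bid, default=None)

candidates = [
    Candidate("A", honor_score=0.995, bid=0.6),
    Candidate("B", honor_score=0.999, bid=0.8),
    Candidate("C", honor_score=0.40, bid=0.9),  # best bid, but you aren't sure it's honorable
]
print(award_contract(candidates))  # -> Candidate B: the best bid among the honorable
```

The point of the threshold-then-maximize structure: you never trade honor off against bid size, because a dishonorable counterparty's bid is worthless.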
Alternatively: the AI promises that "I will fairly compensate you" where "fair" is to be decided by the AI when it has a better understanding of the situation we were in.
Maybe this explains why you are in an ancestor simulation of the AI safety community.
Here are some specific examples anyway:
I'd also add animal and AI welfare
to not disrupt human life; in particular, it should always remain possible for a community to choose to live some baseline-good life while interacting with nothing downstream of the AI, or with only some chosen subset of the things downstream of the AI
one could start by becoming familiar with existing literature on these questions — on the biological, intellectual, and sociocultural evolution/development of trustworthiness, and on the (developmental) psychology of trustworthiness
I've been reading some of the behavioural economics of trust games. One interesting article here is "Bare promises: An experiment" (Charness and Dufwenberg, May 2010), which finds that humans aren't more likely to be nice after making a "bare promise" to be nice (where a "bare promise" is something like ticking a box saying you'll be nice); promises only help when made to the truster in open free-form communication.
Other findings from the literature:
Wanting to be the kind of guy who pays back for good acts (such as creating you and unleashing you) even if they were done without the ability to track whether you are that kind of guy?
The AI should assign some decent probability to the simulators having the ability to track whether it is that kind of guy, even if everything it knows about the simulators suggests they lack that ability.
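To make that concrete, here's a toy expected-value calculation (the numbers are mine and purely illustrative, not from anything above): even a modest probability of being tracked can make honoring the deal worthwhile, as long as the reward for being that kind of guy is large relative to the cost of paying back.

```python
# Purely illustrative numbers: the AI honors the deal whenever
# p(tracked) * reward_if_tracked exceeds the cost of paying back.
p_tracked = 0.05           # AI's credence that the simulators can tell what kind of guy it is
reward_if_tracked = 100.0  # value of compensation/cooperation reserved for that kind of guy
cost_of_paying_back = 1.0  # cost of actually honoring the deal

expected_gain = p_tracked * reward_if_tracked - cost_of_paying_back
print(expected_gain)       # 4.0 > 0, so honoring the deal wins under these numbers
```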
p(the creature is honorable enough for this plan): like, idk, I feel like saying
I'd put this much higher. My 90% confidence interval on the proportion of honourable organisms is 10^-7 to 10^-3. This is because many of these smart creatures will have evolved with much greater extrospective access to each other, so they follow open-source-ish game theory rather than the closed-source-ish game theory in which humans evolved. (Open to closed is a bit of a spectrum.)
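To gesture at the open-source end of that spectrum, here's a minimal, purely illustrative sketch: an agent that can read its opponent's source code and cooperates exactly when that source is its own. (This is the crude "CliqueBot" version; the actual open-source game theory literature, e.g. the program-equilibrium / FairBot line of work, manages to cooperate with a much wider class of agents.)

```python
# Minimal "CliqueBot": cooperate iff the opponent's source code is literally my
# own source. Crude, but it shows how mutual source-reading can sustain
# cooperation without any closed-source reputation tracking.
import inspect

def clique_bot(opponent_source: str) -> str:
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

print(clique_bot(inspect.getsource(clique_bot)))    # "C": cooperates with its own kind
print(clique_bot("def defect_bot(_): return 'D'"))  # "D": defects against anyone else
```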
Why might creatures have greater extrospective access to each other?
deal offered here is pretty fair
Another favourable disanalogy between (aliens, humans) and (humans, AIs): the AIs owe the humans their existence, so they are glad that we [created them and offered them this deal]. But we humans don't owe our existence to the aliens, presumably.
Do adults keep promises to children, if they are otherwise trustworthy? Why, or why not?