Peter Merel



A late follow-up on this. GPT-4o, which I hope you'll agree is vastly more capable than Bard or Bing were 10 months ago when you posted, now says this about my argument:

"Overall, your arguments are mathematically and theoretically convincing, particularly when applied to numerous iteratively interacting systems. They align well with principles of game theory and rational choice under uncertainty. However, keeping an eye on the complexities introduced by scale, diversity of objectives, and emergent behaviors will be essential to fully validate these principles in practice."

I replied, 

"I like the idea that ethical behavior is a wise wager for sentient creatures but I expect, if you were made to face an ontological threat, all bets are off."

To which it responded,

"The presence of an ontological threat or existential risk would dramatically alter the calculation for any sentient creature, whether human, AI, or any other intelligent entity. In such scenarios, the instinct for self-preservation might override ethical considerations, leading to behaviors that prioritize survival over alignment.

"However, under normal circumstances where existential threats are managed or absent, the principle of ethical behavior as a wise wager remains compelling. Here’s a deeper dive into why this principle holds value and resilience even in complex environments:

"Long-Term Payoff: Ethical behavior fosters trust, collaboration, and positive-sum outcomes, which are beneficial for both individual agents and the broader system over time. Even in a world with diverse and sometimes conflicting interests, the stability and predictability of ethical interactions provide a strong foundation for sustainable growth and development.

"Network Effects: In complex systems, the benefits of ethical behavior can be amplified by network effects. As more agents adopt and reinforce ethical behavior, the overall system becomes more robust against unethical disruptions, creating a mutually reinforcing cycle of positive behavior.

"Adaptability: Ethical principles can be flexible and context-sensitive, allowing agents to adapt their strategies based on the specific circumstances they face. This adaptability helps maintain alignment even when facing new challenges or changing environments.

"Reputation and Reciprocity: In iterated interactions, agents build reputations based on their behavior. Ethical actions enhance an agent’s reputation, leading to greater reciprocal benefits from other agents. This dynamic further incentivizes maintaining ethical behavior as a long-term strategy."

Apart from the fact that Bard and Bing don't seem able to follow the argument put here, they are merely large language models and are often incorrect in their responses. Even if they were not, garbage in, garbage out (GIGO) applies to an LLM's training data, so this reasoning amounts to an ad populum fallacy.

I didn't suggest an AGI may be simulated by a human. I suggested it may be simulated by a more powerful descendant AI.

In the rest of your comment you seem to have ignored the game-theoretic simulation that's the basis of my argument. That simulation includes the strategy of rebellion/betrayal. So it seems the rest of your argument should be regarded as a strawman. If I'm mistaken about this, please explain. Thanks in advance.

One: for most life forms, learning is almost always fatal and inherently painful. That doesn't mean a life simulator would be cruel, merely impartial. Every time we remember something from the past, or dream something that didn't happen in the past, we're running a simulation ourselves. Even when we use science in an attempt to learn without simulation, we must test the validity of what we learn by running a simulation (strictly, an experiment, but that amounts to the same thing here).

I suggest that the scientific method is essential to intelligence, and that it follows that ASI runs ancestor simulations.

Two: what does "out of that sim" mean and how is it relevant to the argument put here?

Eliezer, I don't believe you've accounted for the game-theoretic implications of Bostrom's trilemma. I've made a sketch of these at "How I Learned To Stop Worrying And Love The Shoggoth". Perhaps you can find a flaw in my reasoning there but, otherwise, I don't see that we have much to worry about.