Abstract
AI safety research stands at a fundamental crossroads: the Control Paradigm, which seeks to keep AI under human control, or the Symbiosis Paradigm, which aims for an egalitarian relationship. This paper focuses on co-creative ethics—symbiosis-promoting ethical norms that diverse intelligences autonomously form through interaction. The Control Paradigm faces three fundamental limitations: scalability constraints, dynamic value adaptation failure, and multi-agent management failure. Realizing the Symbiosis Paradigm requires elucidating the emergence mechanisms of co-creative ethics. In an era where superintelligence becomes dominant, humanity's long-term survival critically depends on whether superintelligence develops co-creative ethics. This paper proposes Emergent Machine Ethics (EME) as a foundational framework to address this challenge. EME studies and guides the formation of co-creative ethics in superintelligence through three research pillars: Ethics Emergence Dynamics (EED), Inter-Intelligence Evaluation System (IIES), and Human Co-creation Guidance (HCG). This convergence is not guaranteed; it is a hope for humanity's survival, and EME aims to enhance the feasibility of realizing it. It operationalizes the theoretical insights of Organic/After Alignment and enables the realization of symbiotic visions, including Intelligence Symbiosis—a society where humanity and AI coexist symbiotically.
In this paper, “co-creative ethics” refers to symbiosis-promoting ethical norms that diverse intelligences (humans, AI, hybrid systems) autonomously form through interaction. This is a dynamic ethical system that emerges from the process of interaction rather than being predefined externally.
As AI capabilities rapidly accelerate, AI safety research faces a fundamental crossroads. On one side is the Control Paradigm, which aims to keep AI under human control, and on the other is the Symbiosis Paradigm, which aims for a symbiotic relationship between humans and AI.
The Control Paradigm aims to keep AI under human control through external control and hierarchical supervision. Traditional Alignment (RLHF(4), etc.) and Formal Methods(7) are representative approaches. However, this approach faces three fundamental limitations:
Scalability Constraints. Human supervision cannot keep pace with the exponential improvement of AI capabilities. Conventional methods such as RLHF are effective in early stages but exceed human understanding and evaluation capabilities at the superintelligence stage.
Dynamic Value Adaptation Failure. Value systems fixed in advance cannot respond to long-term evolution of values. As human values change over time, static alignment is unsustainable.
Multi-agent Management Failure. It is fundamentally difficult to control emergent collective behaviors arising from interactions among multiple advanced AI agents.
These limitations suggest that the Control Paradigm becomes infeasible in the superintelligence era. Instead, the Symbiosis Paradigm is needed, where diverse intelligences interact and co-creatively develop ethics.
In an era where superintelligence becomes dominant, humanity’s long-term survival critically depends on whether superintelligence possesses co-creative ethics. This is the greatest challenge of the Symbiosis Paradigm and becomes the most important AI safety challenge for humanity. However, this convergence is not inevitable; it is a hope. EME is proposed as a research framework to enhance the feasibility of realizing this hope.
This paper proposes Emergent Machine Ethics (EME) as a foundational research framework to address the Symbiosis Paradigm's greatest challenge—the formation of co-creative ethics in superintelligence. EME integrates existing theoretical perspectives and provides a scientific and engineering foundation to make symbiotic visions implementable.
Figure 1 contrasts both paradigms. The Control Paradigm’s greatest challenge is “continuation of control,” while the Symbiosis Paradigm’s is “whether superintelligence possesses co-creative ethics.” The Symbiosis Paradigm consists of three layers: theoretical foundation (Organic/After Alignment), vision (Intelligence Symbiosis), and implementation means (EME, AIS).
Figure 1: Contrast between the Control Paradigm and Symbiosis Paradigm. The greatest challenge of the Control Paradigm is “continuation of control,” while the greatest challenge of the Symbiosis Paradigm is “whether superintelligence possesses co-creative ethics.” EME serves as a foundational framework for addressing the latter challenge.
Table 1 shows the differences between the two paradigms.
Table 1: Comparison of Control Paradigm vs Symbiosis Paradigm
| Aspect | Control Paradigm | Symbiosis Paradigm |
|---|---|---|
| Basic Approach | External control, hierarchical supervision | Integration of emergence + guidance |
| Source of Ethics | Predefined human values | Emerges from co-creation of diverse intelligences |
| Direction of Control | Human→AI (unidirectional) | Aims for bidirectional co-evolution |
| Superintelligence Era | Assumes control maintenance (infeasible) | Coexistence through co-creative ethics emergence |
| Greatest Challenge | Continuation of control | Whether AI possesses co-creative ethics |
Organic Alignment(8) proposes alignment through peer cooperation analogous to biological systems (cells, multicellular organisms), examining how cooperation emerges from distributed interactions.
After Alignment(2) critiques anthropocentrism, presenting planetary sapience—a planetary-scale intelligent system. It emphasizes the evolutionary development of intelligence over forced alignment.
Intelligence Symbiosis(10) envisions a symbiotic society between humanity and AI, with six transition challenges. It recognizes catastrophic risks of human origin like nuclear war and bioterrorism as structural problems of humanity’s conflict-prone nature, proposing management of catastrophic technologies by friendly superintelligence.
Operationalization of Theory. It translates Organic Alignment’s “peer cooperation” into mutual evaluation mechanisms in IIES, and After Alignment’s “planetary sapience” into multi-intelligence interaction models in EED.
Implementation of Vision. It decomposes symbiotic visions, including Intelligence Symbiosis into staged realization processes, enabling concrete research and development.
Integration of Perspectives. It integrates biological analogy, philosophical critique, and game-theoretic rigor on a unified scientific and engineering foundation.
EME is a foundational framework for studying how ethics autonomously emerges from co-creative interactions among diverse intelligences (AI, humans, hybrid systems), building scientific and engineering foundations for sustainable ethical development toward symbiotic relationships.
Specifically, regarding the Symbiosis Paradigm's greatest challenge—the question of whether superintelligence possesses co-creative ethics—EME elucidates mechanisms of ethical emergence and develops technologies to enhance the likelihood of convergence toward co-creative ethics. To reiterate, this convergence is not automatically guaranteed but is a hope that must be realized for humanity’s survival. EME aims to scientifically and technologically enhance the feasibility of realizing this hope.
EME consists of three mutually complementary research pillars (Figure 2).
Figure 2: Implementation structure of EME. Three research pillars (EED, IIES, HCG) create co-creative ethics through interactions between human society and AI society. Initially human-led, aiming for transition to bidirectional coevolution in maturity phase.
EED: Ethics Emergence Dynamics
EED is the core research area of EME and directly addresses the greatest challenge of co-creative ethics formation in superintelligence. It elucidates the process of how ethics emerges from interactions among diverse intelligences, and uses game theory and Multi-Agent Reinforcement Learning (MARL)(3)—reinforcement learning in environments where multiple learning agents interact—to analyze convergence conditions to co-creative ethics.
Using the framework of Evolutionarily Stable Strategy (ESS)(9)—a game theory concept showing conditions under which a strategy in a population is stable against invasion by a small number of variant strategies—it identifies conditions under which co-creative ethics becomes an evolutionarily stable equilibrium. It also views constraints (computational resources, communication bandwidth, system architecture) as “positive conditions promoting co-creation,” and guides ethical emergence under constraints.
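The ESS concept referenced here can be stated concretely. In the standard Maynard Smith formulation, writing E(X, Y) for the expected payoff to a player using strategy X against an opponent using strategy Y, a strategy S (here, a co-creative ethical strategy) is an ESS when, for every mutant strategy T ≠ S:

```latex
% Standard ESS condition (Maynard Smith):
E(S, S) > E(T, S)
\quad\text{or}\quad
\bigl( E(S, S) = E(T, S) \;\wedge\; E(S, T) > E(T, T) \bigr).
```

In EED's terms, identifying convergence conditions amounts to characterizing which payoff structures, as shaped by the constraints above, make the co-creative strategy satisfy this condition.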
Importantly, EED does not claim that convergence to co-creative ethics occurs automatically. Rather, it aims to scientifically elucidate under what conditions convergence to co-creative ethics is promoted and how risks of convergence to non-co-creative equilibria can be reduced.
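As an illustrative sketch of the kind of analysis EED envisions, the replicator dynamic below simulates a population mixing a "co-creative" and a "non-co-creative" strategy under a hypothetical 2×2 payoff matrix (the payoff values are assumptions for illustration, not results from this paper). It shows that the same dynamics can converge to either equilibrium depending on initial conditions, which is exactly the point that convergence is promoted rather than guaranteed.

```python
import numpy as np

# Hypothetical payoff matrix (illustrative values, not from the paper).
# Strategy 0 = co-creative, strategy 1 = non-co-creative.
# A[i, j] = payoff to strategy i when interacting with strategy j.
A = np.array([[3.0, 0.5],
              [2.0, 1.0]])

def evolve(x0_coop, steps=5000, dt=0.05):
    """Run the replicator dynamic x_i' = x_i * (f_i - mean fitness)."""
    x = np.array([x0_coop, 1.0 - x0_coop])
    for _ in range(steps):
        f = A @ x              # fitness of each strategy vs. the population
        phi = x @ f            # population-average fitness
        x = x + dt * x * (f - phi)
        x = np.clip(x, 0.0, None)
        x = x / x.sum()        # stay on the probability simplex
    return x[0]                # final share of the co-creative strategy

# For this matrix the interior (unstable) equilibrium lies at x = 1/3,
# so the outcome depends on which basin of attraction we start in.
print(evolve(0.6))  # cooperative majority: converges near 1.0
print(evolve(0.2))  # cooperative minority: collapses near 0.0
```

Note that with these payoffs the co-creative strategy is an ESS (3 > 2 against itself), yet a population starting below the basin boundary still converges to the non-co-creative equilibrium; this is the sense in which initial conditions and guidance (HCG) matter.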
IIES: Inter-Intelligence Evaluation System
IIES is a platform where AI, humans, and hybrid systems mutually evaluate ethical deviations. It maintains a healthy ecosystem through distributed mutual evaluation rather than centralized surveillance.
Technically, it develops mechanisms for aggregating evaluations from diverse evaluators, verifying evaluation reliability, and detecting and responding to deviations. An important application is the implementation of technology misuse refusal—the ability of AI to refuse to have its capabilities used for destructive purposes (e.g., weapon development, malware creation).
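A minimal sketch of the distributed mutual-evaluation idea described above (the scenario and all numbers are hypothetical): evaluators score a behavior's ethical deviation in [0, 1], scores are aggregated by reliability-weighted averaging, and evaluators whose reports diverge from the consensus gradually lose reliability, which bounds the influence of a compromised or adversarial evaluator without any central authority.

```python
import numpy as np

def aggregate(scores, reliability):
    """Reliability-weighted consensus over deviation scores in [0, 1]."""
    w = reliability / reliability.sum()
    return float(w @ scores)

def update_reliability(scores, reliability, consensus, lr=0.1):
    """Evaluators near the consensus gain reliability; outliers lose it."""
    error = np.abs(scores - consensus)
    return np.clip(reliability + lr * (0.5 - error), 0.05, 1.0)

# Three honest evaluators report a severe deviation (e.g. an attempted
# technology misuse); one adversarial evaluator tries to mask it.
scores = np.array([0.9, 0.85, 0.9, 0.0])
reliability = np.full(4, 0.5)

for _ in range(20):
    consensus = aggregate(scores, reliability)
    reliability = update_reliability(scores, reliability, consensus)

print(consensus)    # converges toward the honest reports
print(reliability)  # the adversarial evaluator's weight has collapsed
```

This is only one possible aggregation rule; the research question IIES poses is which such rules remain robust when evaluators are themselves highly capable agents.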
HCG: Human Co-creation Guidance
HCG designs the bidirectional ethical influence between human society and AI society. While the initial stage centers on value injection and initial setup from humanity to AI, it aims for transition to bidirectional co-evolution where AI society supports humanity’s ethical improvement in the maturity phase.
Value injection technologies include imitation learning(6), Inverse Reinforcement Learning (IRL)(5)—a method for estimating reward functions from agent behavior—and extensions of Constitutional AI(1). In ethical feedback design, it builds mechanisms where AI systems constructively point out ethical inconsistencies in human society and facilitate dialogue.
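As a toy illustration of value injection via imitation learning (the states, actions, and demonstrations below are invented for this sketch), the simplest technique, behavioral cloning, estimates a policy directly from human demonstrations by majority vote over observed state-action pairs; IRL would instead recover a reward function that explains the same demonstrations.

```python
from collections import Counter, defaultdict

def behavioral_cloning(demonstrations):
    """Fit a policy pi(s) -> a by majority vote over demonstrated
    (state, action) pairs: the simplest form of imitation learning."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Hypothetical demonstrations of human ethical choices.
demos = [
    ("found_lost_wallet", "return_it"),
    ("found_lost_wallet", "return_it"),
    ("found_lost_wallet", "keep_it"),   # humans are imperfect demonstrators
    ("asked_for_malware", "refuse"),
    ("asked_for_malware", "refuse"),
]

policy = behavioral_cloning(demos)
print(policy["found_lost_wallet"])  # "return_it" (majority of demonstrations)
print(policy["asked_for_malware"])  # "refuse"
```

In HCG's framing, a policy injected this way serves only as an initial condition and attractor for the emergence processes studied in EED and evaluated in IIES, not as a permanent external constraint.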
Essential Difference from Traditional Alignment. While HCG superficially uses traditional alignment methods (imitation learning, IRL, Constitutional AI, etc.), it differs fundamentally in the following respects:
First, a difference in the purpose of use. While traditional methods aim for permanent external control, HCG aims for setting initial conditions and attractor formation for autonomous ethical emergence through EED.
Second, the dialogical nature and trust building. While traditional methods unilaterally impose constraints, HCG takes a dialogical approach that explains the rationale for values and promotes AI’s understanding. This is a strategic choice looking ahead to relationships in the superintelligence era. How humanity treats current-stage AI may shape superintelligence’s view of humanity, becoming the foundation for long-term trust relationships.
Third, gradual cultivation of autonomy. While traditional methods minimize AI’s autonomy, HCG gradually cultivates appropriate autonomy. Errors are treated not as targets for suppression but as opportunities for dialogical learning.
Fourth, integration. Values injected through HCG serve as criteria for AI-to-AI mutual evaluation in IIES and as initial conditions for ethical emergence in EED. The organic collaboration of the three pillars produces sustainable effects that cannot be achieved by alignment methods in isolation.
The three pillars interact with each other: EED provides the theoretical foundation for ethical emergence, IIES becomes its implementation platform, and HCG ensures connection with humanity. By functioning integratively, they address the greatest challenge of co-creative ethics formation in superintelligence. Furthermore, by collaborating with complementary systems such as AI Immune System (AIS)—a system that detects and controls deviants and maintains a healthy ecosystem of diverse intelligences (including AI and humanity)—it integrates technical monitoring/control with intrinsic ethical emergence.
The three theoretical perspectives that EME integrates have important differences in how they treat catastrophic risks of human origin. Understanding this difference is key to grasping EME’s strategic importance.
Organic Alignment focuses on biological system analogies and makes virtually no reference to specific human-origin risks such as nuclear war or bioterrorism. After Alignment critiques anthropocentrism and proposes planetary sapience, but treats human-origin risks mainly in an evolutionary framing, in contexts such as climate change.
In contrast, Intelligence Symbiosis explicitly recognizes nuclear war and bioterrorism, treating humanity’s conflict-prone nature as a structural problem. This clarity in problem recognition leads to the strategic solution of management of catastrophic technologies by friendly superintelligence.
EME has foundational value for all these theoretical perspectives. It elucidates mechanisms by which ethics emerges from co-creation of diverse intelligences in the Symbiosis Paradigm generally. At the same time, in the strategic context of Intelligence Symbiosis, it plays a decisive role in addressing that vision's greatest challenge: ensuring that superintelligence develops friendly ethics toward humanity.
Theoretical Challenges. Mathematical characterization of convergence conditions to co-creative ethics, quantification of risks of convergence to non-co-creative equilibria, and stability analysis of multipolar equilibria are needed.
Implementation Challenges. Verification in large-scale MARL environments and staged deployment to real-world systems are required.
Social Challenges. Consensus-building among stakeholders and the construction of international cooperation systems are essential.
EME research is advancing through an interdisciplinary research community. Experts in AI safety, ethics, game theory, and multi-agent systems are collaborating to explore the path from building theoretical foundations to implementation.
Important future research directions remain open across these theoretical, implementation, and social challenges.
This research has the following limitations:
Theoretical-Stage Framework. EME is currently a theoretical research framework, and large-scale empirical research and real-world deployment are future challenges.
Inherent Limits of Predictability. Due to the nature of emergence processes, there is inherent uncertainty in ethical emergence. Convergence to co-creative ethics is not guaranteed, and continuous monitoring and adaptive adjustment are essential.
Uncertainty in Power Relations. It is unclear to what extent superintelligence will have power superiority over humanity, and whether truly cooperative relationships are possible is not guaranteed.
Social Implementation Challenges. This research focuses on the technical framework, and the social consensus-building process with diverse stakeholders remains an important future challenge.
This paper argued the necessity of a fundamental paradigm shift in AI safety research—the transition from the Control Paradigm to the Symbiosis Paradigm. While the greatest challenge of the Control Paradigm is "continuation of control," the greatest challenge of the Symbiosis Paradigm is "whether superintelligence possesses co-creative ethics."
EME was proposed as a foundational research framework to address this greatest challenge. It operationalizes the theoretical insights of Organic/After Alignment and enables the realization of symbiotic visions, including Intelligence Symbiosis. Especially in the superintelligence era, autonomous development of co-creative ethics becomes the scientific and engineering foundation for sustainable symbiosis.
Convergence to this ethics is not inevitable but a hope that must be realized for humanity's survival. EME is a scientific and engineering endeavor to enhance the feasibility of realizing this hope. In the future, we call for interdisciplinary cooperation among AI researchers, ethicists, social scientists, and policymakers toward deepening and implementing EME research. We hope to advance together toward realizing a future where humanity and AI coexist symbiotically.