I have been trying to build an open-ended evolution system by myself, without a lab or a supervisor. This started with a question that was not about AI safety: I just wanted to know what happens if you remove the objective entirely. I mean no reward function, no fitness score, no "maximize X", just agents surviving inside environments with physical constraints, like a world with laws of physics but no judge.
I have been working on this for eight months now, and two papers got into GECCO 2026. This specific result is something I noticed recently, and I do not have a clean explanation for it yet.
The short version: in co-evolving environments, agents grew to 467 internal nodes while keeping the same external behavior as 52-node controls. The sham controls stayed small. I do not fully understand why this happened.
I tested two conditions to see what was going on. The first was fixed physics, where the environment rules stayed the same across all generations. The second was co-evolving physics, where the environment itself (terrain, physical structure, all of it) mutates alongside the agents. Everything else was the same: initialization, mutation settings, seed, and survival constraints.
I also ran sham controls: versions where the co-evolution infrastructure was technically active but the environments were prevented from changing. I did this to separate "does the infrastructure cause this?" from "does environmental change cause this?" After 2000 generations, I looked at the external behavior of both conditions. They had the same strategy: secrete, wait, move. Then I looked at the networks.
The fixed environment had 52 nodes. The co-evolving environment had 467 nodes. I genuinely thought my code had a bug. I re-ran the seed and got the same result. The sham controls stayed near the fixed-environment sizes. So whatever is happening is specifically tied to co-evolution, not just to the infrastructure existing.
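The logic of the three conditions can be written down as a small sketch. This is only an illustration, assuming hypothetical field names (`coevolution_infrastructure`, `environment_mutation`) that are not taken from the genesis-emergence code. The idea is that each pairwise comparison should differ in exactly one setting, so fixed vs. sham isolates the infrastructure and sham vs. co-evolving isolates environmental change.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical run settings; field names are illustrative, not from the repo."""
    seed: int
    generations: int = 2000
    # Co-evolution machinery (environment genomes, update hooks) is instantiated.
    coevolution_infrastructure: bool = False
    # Environment genomes are actually allowed to mutate.
    environment_mutation: bool = False


def differing_fields(a, b):
    """Names of the fields on which two configs differ, in definition order."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


fixed = RunConfig(seed=0)
sham = replace(fixed, coevolution_infrastructure=True)
coevolving = replace(sham, environment_mutation=True)

# fixed vs. sham isolates the infrastructure; sham vs. co-evolving isolates environmental change.
print(differing_fields(fixed, sham))       # ['coevolution_infrastructure']
print(differing_fields(sham, coevolving))  # ['environment_mutation']
```

If any other field (mutation rates, initialization, survival constraints) showed up in these diffs, the comparison would no longer be a single-variable contrast.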
Here is what I am having trouble with. A behavioral evaluation of these two populations would see nothing: they have the same strategy, the same observable outputs, nothing to flag. Internally, one of them has roughly nine times as many nodes as the other (467 vs. 52). I know what this resembles in alignment terms, but my agents have no goals. They are not strategically hiding anything, and there is no training signal to game.
The structural divergence did not come from any of the mechanisms alignment researchers worry about in neural network training. These are not agents in that sense; they have no intent. Yet the structural signature, massive internal complexity that never surfaces in behavior, is there anyway.
Maybe the reason is simpler. An expert in this field saw a version of this and immediately flagged it as bloat: neutral structure accumulating because mutation rates are too permissive. He is probably right that this is the most likely explanation. Even so, something in the co-evolving condition seems to be keeping those nodes alive.
Maybe the environments are functionally distinct in ways my behavioral metrics do not capture. Maybe "secrete, wait, move" is actually solving different ecological problems in different environments, and I am measuring at the wrong level of abstraction. Maybe my behavioral-equivalence claim is too crude.
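One way to make the behavioral-equivalence check less crude is to compare action distributions rather than strategy labels. The sketch below is an illustration, not the method used in the repo; it assumes each population's behavior can be logged as a sequence of discrete actions (hypothetical labels such as "secrete", "wait", "move") and compares n-gram frequencies using total variation distance. Two populations can use the same actions in the same proportions but order them differently, which unigram counts miss and bigram counts catch.

```python
from collections import Counter


def ngram_distribution(actions, n=1):
    """Relative frequency of each length-n window in an action log."""
    grams = [tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical logs: same actions, same proportions, different ordering.
log_a = ["secrete", "wait", "move"] * 4
log_b = ["secrete", "move", "wait"] * 4

print(total_variation(ngram_distribution(log_a, 1), ngram_distribution(log_b, 1)))  # 0.0
print(total_variation(ngram_distribution(log_a, 2), ngram_distribution(log_b, 2)))  # ≈ 1.0: no shared bigram
```

The same comparison could be run per environment, or on continuous outputs binned into discrete categories, depending on what the logs record.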
Maybe this is a property of CPPN encoding specifically. CPPNs can accumulate latent representational structure that direct encodings would prune, and maybe co-evolving physics creates enough landscape instability that "neutral" structure is not actually neutral: it is being maintained by environmental dynamics rather than by functional utility.
I am genuinely uncertain which of these is true...
I am running tests now to see whether the environments are ecologically distinct, whether behavioral equivalence survives finer-grained quantitative analysis, and whether the large networks are causally necessary or just historically accumulated scaffolding.
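For the causal-necessity question, one standard approach is a knockout (ablation) test: silence one hidden node at a time and measure how much the outputs change on a fixed set of probe inputs. The sketch below assumes a plain feedforward network stored as hypothetical `(src, dst, weight)` triples with tanh units; it is not NEAT-Python's API or the repo's representation. With a CPPN, the analogous step would be to knock out a CPPN node, regenerate the controller, and re-measure behavior in the environment. Single-node knockouts also miss redundancy (two nodes backing each other up), so a near-zero effect marks a candidate for neutral structure rather than proof of it.

```python
import math


def evaluate(inputs, hidden_order, outputs, connections, disabled=frozenset()):
    """Feedforward pass with tanh units.

    inputs: dict of input node id -> value
    hidden_order: hidden node ids in topological order
    outputs: output node ids
    connections: list of (src, dst, weight)
    disabled: hidden node ids whose activation is forced to 0 (knocked out)
    """
    incoming = {}
    for src, dst, weight in connections:
        incoming.setdefault(dst, []).append((src, weight))
    values = dict(inputs)
    for node in list(hidden_order) + list(outputs):
        if node in disabled:
            values[node] = 0.0
            continue
        total = sum(values.get(src, 0.0) * w for src, w in incoming.get(node, []))
        values[node] = math.tanh(total)
    return [values[o] for o in outputs]


def knockout_effects(probes, hidden_order, outputs, connections):
    """Largest absolute output change caused by silencing each hidden node alone."""
    baseline = [evaluate(x, hidden_order, outputs, connections) for x in probes]
    effects = {}
    for node in hidden_order:
        worst = 0.0
        for x, base in zip(probes, baseline):
            out = evaluate(x, hidden_order, outputs, connections, disabled={node})
            worst = max(worst, max(abs(a - b) for a, b in zip(out, base)))
        effects[node] = worst
    return effects


# Toy network: h1 feeds the output; h2 is a dead end with no outgoing connection.
connections = [("i0", "h1", 1.0), ("h1", "o0", 1.0), ("i0", "h2", 1.0)]
effects = knockout_effects([{"i0": 0.5}, {"i0": -1.0}], ["h1", "h2"], ["o0"], connections)
print(effects)  # h2's effect is 0.0; h1's is large
```

Counting how many of the extra nodes have a near-zero effect, in both the co-evolving and sham populations, would be one way to separate "causally necessary" from "historically accumulated".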
The thing that keeps nagging at me is that, under NEAT, I would expect structure that does not pay for itself to be lost. Speciation protects new structural innovations for a while, but they should only persist if they turn out to be an advantage. Co-evolving physics is sustaining 415 extra nodes that produce no observable behavioral advantage. So either they are producing an advantage I am not measuring, or co-evolutionary pressure can sustain structural complexity that fixed-environment selection would prune.
If co-evolutionary pressure can sustain complexity, then it might be a natural mechanism for generating and maintaining internal complexity before any behavioral expression. Not by design, not through any training process, just as an emergent property of environments that keep changing alongside the systems inside them.
I do not know if this is a known result in open-ended evolution; that is partly why I am posting this. I would like help with a few things. Are the sham controls sufficient to establish that environmental co-evolution is the causal variable here? Is there a confound I am not seeing?
For people who work on NEAT/CPPNs specifically: have you seen this kind of structural divergence without behavioral divergence before? Is there an existing term for it? For people on the safety side: does the mechanism difference matter? If the structural precondition can form without any of the training dynamics, does that change anything about how you would think about detecting or intervening on it?
If this is just ordinary bloat, what is the cleanest experiment to prove that definitively, so I can close this question and move on?
My code and reproducibility package are available at github.com/gearupsmile/genesis-emergence.
My preprints are available at https://www.researchgate.net/profile/Anushka-Sharma-77.
I am not claiming this is something new; I might be rediscovering something the EC community already knows. I have looked and cannot find it, and the sham controls keep making it feel non-trivial to me. I would rather ask and be wrong than sit on it alone for another month.
If this is bloat, why do the sham controls stay flat?
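To make "the sham controls stay flat" quantitative, it may help to run several independent seeds per condition (re-running one seed confirms determinism, not replication) and compare final node counts with a permutation test. A minimal sketch, assuming one final node count per seed per condition; the function name is hypothetical and nothing here comes from the repo:

```python
import random
from statistics import mean


def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference in means between samples a and b.

    a, b: one final node count per independent run (e.g. co-evolving vs. sham).
    Returns (observed difference in means, p-value).
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly relabel runs between the two conditions
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction: the observed labeling counts as one of the permutations.
    return observed, (extreme + 1) / (n_iter + 1)
```

With only a handful of seeds per condition, the smallest achievable p-value is limited by the number of distinct ways to split the runs, so adding seeds buys resolution as well as robustness.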