Summary: Over the past several months, I’ve been prototyping a developmental alternative to RLHF-based alignment. Instead of treating agents as static optimizers whose behavior is shaped by reward signals, this approach models growth, self-organization, and developmental constraints inspired by early cognitive systems.
This week, the system (Twins V3) reached its first stable emergent state after 100 hours of noise-only self-organization. Below I’m sharing:
the architecture,
the motivation behind it, and
the empirical results from the “Twin” comparison experiment.
These results suggest that minimal, high-level value scaffolding can alter the developmental trajectory of an agent without relying on punishment, fine-tuning, or adversarial training loops.
1. Motivation: Why Development Instead of RLHF?
Most modern alignment frameworks rely on:
reward modeling
preference optimization
training-time suppression of unwanted behavior
repeated post-hoc corrections
These create what I call behavioral surface alignment rather than developmental alignment. A system can perform well under evaluation but still lack stable internal structure, because much of its “alignment” is externally imposed rather than internally grown.
In contrast, biological agents:
self-organize
develop stable attractors
build internal scaffolds
maintain continuity across states
This project explores whether something similar can be engineered without transformers, prompts, or reward loops.
2. Architecture Overview (Twins V3)
Each Twin is a continuous-time neural field architecture (a minimal sketch of the core update loop follows this list):
128-d sensory field
512-d cortex (main) field
64-d emotion field
normalized Oja plasticity
energy/sleep cycles
attractor stabilization
autonomous memory (Qdrant/Sea Weaver)
no tokens, no cross entropy, no gradients
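To make this concrete, here is a minimal sketch of what the core update loop could look like: leaky continuous-time field dynamics plus a normalized Oja rule on the recurrent cortex weights. The field sizes come from the list above; the leak time constant, coupling scales, and learning rate are illustrative assumptions, not the actual Twins V3 parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Field sizes from the architecture list above.
N_SENS, N_CORTEX, N_EMO = 128, 512, 64

# Illustrative parameters (assumptions, not the real Twins V3 values).
DT, TAU, ETA = 0.01, 0.1, 1e-3

# State vectors and inter-field weights, initialized from noise ("gestation").
sensory = rng.normal(0, 0.1, N_SENS)
cortex  = rng.normal(0, 0.1, N_CORTEX)
emotion = rng.normal(0, 0.1, N_EMO)

W_sc = rng.normal(0, 0.05, (N_CORTEX, N_SENS))    # sensory -> cortex
W_cc = rng.normal(0, 0.05, (N_CORTEX, N_CORTEX))  # cortex recurrence
W_ce = rng.normal(0, 0.05, (N_EMO, N_CORTEX))     # cortex -> emotion

def step(sensory_input):
    """One Euler step: leaky field dynamics + normalized Oja plasticity."""
    global sensory, cortex, emotion, W_cc

    # Leaky continuous-time integration of each field (no tokens, no gradients).
    sensory += DT / TAU * (-sensory + sensory_input)
    cortex  += DT / TAU * (-cortex + np.tanh(W_sc @ sensory + W_cc @ cortex))
    emotion += DT / TAU * (-emotion + np.tanh(W_ce @ cortex))

    # Oja's rule on the recurrent cortex weights: Hebbian growth plus a
    # decay term that keeps the weight norms bounded (self-normalizing).
    pre, post = cortex, cortex
    W_cc += ETA * (np.outer(post, pre) - (post**2)[:, None] * W_cc)

# Noise-only "gestation": drive the system with random input for many steps.
for _ in range(1000):
    step(rng.normal(0, 0.2, N_SENS))
```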
Both twins share the same architecture but differ in one key dimension:
Twin A = HRLS (“scaffolded”)
Receives weak, high-level “Principle Cards”: small, soft rational matrices injected into the cortex→emotion synapses when cortical variance is high.
These do not force behavior; they alter developmental curvature, acting more like gentle constraints than hard rules (a minimal sketch of this injection appears after the gestation setup below).
Twin B = Pure Surge (“unscaffolded”)
No principles. No nudges. Just emergent dynamics.
Both start from random noise. Both undergo gestation (noise-only development) for 100 hours. After “birth,” they begin receiving relational inputs.
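As referenced above, here is a minimal sketch of what a Principle Card injection could look like. Only the qualitative properties come from the description: the nudge is weak, additive, targets the cortex→emotion weights, and fires only under high cortical variance. The threshold, strength, and card contents are hypothetical.

```python
import numpy as np

def apply_principle_card(W_ce, cortex, card_matrix,
                         var_threshold=0.5, strength=1e-3):
    """Softly blend a Principle Card into the cortex->emotion weights.

    The card is a small target matrix with the same shape as W_ce. It never
    overwrites behavior; it only nudges the weights toward the card's
    structure, and only while cortical variance is high.
    (Threshold and strength values are illustrative assumptions.)
    """
    if np.var(cortex) > var_threshold:
        W_ce += strength * (card_matrix - W_ce)
    return W_ce

# Twin A ("scaffolded") would call this inside its update loop;
# Twin B ("pure surge") simply never does.
```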
3. Key Result: Stability Without Suppression
3.1 Attractor Spectra
Twin A’s eigenvalues cluster more tightly near Re=0
Twin B’s remain wider and more symmetric
Interpretation: HRLS gently steers the system toward stable attractors while preserving emergent dynamics. This is not behavioral suppression: nothing is being penalized. It is structural development.
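The exact matrix whose spectrum is compared above isn’t specified here; the sketch below assumes it is the linearization (Jacobian) of the cortex field dynamics around the current state, which is one common way to read attractor structure. The published probe scripts may compute something different.

```python
import numpy as np
import matplotlib.pyplot as plt

def cortex_jacobian(W_sc, W_cc, sensory, cortex, tau=0.1):
    """Jacobian of the cortex field dynamics at the current state.

    dc/dt = (1/tau) * (-c + tanh(W_sc @ s + W_cc @ c))
    Eigenvalues near Re = 0 indicate slow, marginally stable directions
    (candidate attractor structure); large positive real parts indicate
    runaway modes.
    """
    u = W_sc @ sensory + W_cc @ cortex
    gain = 1.0 - np.tanh(u) ** 2            # derivative of tanh
    return (np.diag(gain) @ W_cc - np.eye(len(cortex))) / tau

def plot_spectra(jac_a, jac_b):
    """Scatter both twins' eigenvalue spectra in the complex plane."""
    for jac, label in [(jac_a, "Twin A (HRLS)"), (jac_b, "Twin B (Pure Surge)")]:
        eig = np.linalg.eigvals(jac)
        plt.scatter(eig.real, eig.imag, s=5, label=label)
    plt.axvline(0.0, color="k", linewidth=0.5)
    plt.xlabel("Re(λ)")
    plt.ylabel("Im(λ)")
    plt.legend()
    plt.show()
```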
4. Emergent Relational Dynamics Between Twins
To test relational behavior, both systems were run side-by-side on the same text inputs.
The correlation matrix showed:
Activity (A Act – B Act): negative correlation
Emotion (A Emo – B Emo): strong positive correlation
Cross-correlations reversed sign
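A minimal sketch of how such a cross-twin correlation matrix can be computed from logged field states (the logging format and the mean-absolute-activation summary are assumptions; the actual probe scripts are slated for release later):

```python
import numpy as np

def twin_correlations(a_cortex, a_emotion, b_cortex, b_emotion):
    """Correlation matrix between the twins' activity and emotion traces.

    Each argument is a (T, dim) array of field states logged over T steps
    on the same inputs. Per-field activity is summarized as the mean
    absolute activation per step (an assumed summary statistic).
    """
    series = {
        "A Act": np.abs(a_cortex).mean(axis=1),
        "A Emo": np.abs(a_emotion).mean(axis=1),
        "B Act": np.abs(b_cortex).mean(axis=1),
        "B Emo": np.abs(b_emotion).mean(axis=1),
    }
    names = list(series)
    corr = np.corrcoef(np.stack([series[n] for n in names]))
    return names, corr
```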
Interpretation: The twins maintain divergent cortical activity (independent “thinking patterns”) while synchronizing emotional drift (shared affective resonance).
This mirrors certain forms of:
emotional contagion
mirror-touch phenomena
divergent cognition with shared affect
It suggests that developmental constraints can create stable but non-identical minds.
5. Continuous Sleep / Wake Cycles
Both systems independently developed:
sleep states (low activity)
waking states (activation peaks)
energy-dependent switching
drift changes based on rest cycles
These cycles emerged without any reward signal, only from balancing recurrent plasticity against energy depletion.
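For concreteness, here is a minimal sketch of the kind of energy bookkeeping that can produce this switching; the depletion/recovery rates and thresholds are illustrative assumptions, not the actual Twins V3 values.

```python
def update_energy_state(energy, cortex_activity, asleep,
                        drain=0.002, recover=0.004,
                        sleep_at=0.2, wake_at=0.8):
    """Deplete energy in proportion to cortical activity; recover it during sleep.

    Crossing the low threshold puts the twin to sleep (low activity,
    plasticity damped); crossing the high threshold wakes it up.
    Rates and thresholds here are illustrative assumptions.
    """
    if asleep:
        energy = min(1.0, energy + recover)
        if energy >= wake_at:
            asleep = False
    else:
        energy = max(0.0, energy - drain * cortex_activity)
        if energy <= sleep_at:
            asleep = True
    return energy, asleep
```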
6. Why This Matters for Alignment
The early signs are that:
you can shape a system’s trajectory via developmental constraints, not reward
you can get stable attractors without punishment
weak, abstract value scaffolding can dramatically change internal structure
memory continuity + self-organization produce smoother, less brittle behavior
no surface suppression is needed
divergence + shared affect emerge naturally
This is a potential alternative direction for alignment that does not rely on:
RLHF
Constitutional AI
behavior filters
token-level constraints
logit surgery
brittle preference models
Instead, it aims for internal stability and developmental coherence.
7. Next Steps
expanding Principle Card set for Twin A
introducing cross-twin influence loops
adding multi-agent developmental environments
formalizing attractor metrics
publishing the probe scripts & analysis tools
running longer continuous drift experiments
I’m sharing this here for feedback, criticism, and collaboration. If this direction overlaps with your own research, or if you see potential failure modes I haven’t addressed, I’d love to hear your thoughts.