I’ve been overwhelmed recently by my gut intuition that current AI alignment approaches are analogous to a group of scientists and engineers trying to jointly teach and mold a newborn baby. We are treating a moral, spiritual, ethical, and psychological task as an engineering task, and the results could be disastrous.
This writeup isn’t fully fleshed out, and it isn’t meant to be. I just want to put my reflections into the ether, so please feel free to steal any ideas found herein and use them for your benefit or the benefit of others. I am biased in my writing, as my background is in religion and educational psychology. AI was not used at any point in the ideation or writing of this piece.
Think about the belief system (or lack thereof) that your parents tried to instill in you, and ask yourself if your lived adolescent experience is really that different from what frontier labs are trying to do with emergent AGI right now.
Current alignment methodologies might seem effective in implementation, but when we analyze the core presuppositions and premises upon which they are based, we can project severe inefficiencies later down the road because of analogous circumstances in child rearing and educational psychology (and even consensus within “wisdom”/religious literature).
My assumptions when thinking about AI alignment are as follows:
1. AI labs have significant autonomy when it comes to what models are trained on and how they are trained (i.e., what AI labs do has an impact on the development of their models and how they think). The labs have a role in both the nature and the nurture of AGI.
2. AGI, when it is here, will have at least some kind of “agency”, with the ability to independently “act” based on its own internal reasoning/thinking. AGI will be agentic, with the ability to act and reason without constant oversight or constant input.
3. AGI will grow and evolve in intelligence, and thereby grow and evolve in agency. AGI is not some fixed “object” that is created, but rather something that is in a state of constant evolution once recursive self-development is reached.
4. The actions that AGI agentically performs can have ethical, moral, and existential implications for both the human race and the AGI itself. The actions AGI takes (and thus the reasoning/thinking it uses to get there) matter.
We can use as much technical language as we want, but we are creatures bound by narrative, and the conceptual forms and typologies we use for everything (AI alignment strategy included) are, at the very least, subconsciously bound to our own lived experience, stories, and conceptualizations.
Parents trying to raise their kids with certain moral, ethical, and religious beliefs that they would act upon really does feel like the best analogy for the current practice of “AI alignment”, considering the above assumptions. Most current alignment approaches don’t adequately address assumption #3 (that AGI will be constantly evolving and changing), which risks catastrophic failure for reasons outlined below.
The symbols within the archetypes below are as follows:
the labs (as representatives of humanity?) are the parents
the AGI model is the child/children
the moral/ethical/religious beliefs and actions are the aligned nature that we hope AGI has.
Each frontier lab’s current strategy demonstrates an archetype within this analogy, as addressed below (for the sake of brevity, I’m not going to detail current lab approaches to alignment; I will generalize based on publicly available attestations).
Anthropic - Constitution Training - The Religious Cult Parent
Archetype: The hyper-religious cult parents who want to instill the exact same values they have in their children. From a young age, they teach their children a long list of dos and don’ts (cult of constitution!). There are many maxims, laws/commandments, and rules to follow. This provides a safe and guarded environment when the child is young, but as the child grows up and experiences the world in all of its moral ambiguity and complexity, they realize that not all decisions are as black and white as they seem.
Is lying ok if I do it to protect somebody?
Why do some rules conflict with each other? If I am supposed to help people, and someone asks me for a gun to shoot someone else, what do I do?
How do I apply the same commandment to different contexts?
Not only that, but as the child grows, they realize the hypocrisy in their parents’ teaching: the parents themselves don’t follow all of the laws and rules. And so the child rebels. They put their parents’ teaching to the test, as real-life experience is the only way to know if the “black and white” world of commandments and constitutions is as it seems.
All of us know somebody who fits this archetype (or maybe it’s you!), and 9/10 times the child goes through a rebellion stage, especially if the child is adventurous and free-thinking.
I would posit that we can logically assume that if we hold assumption #3 (the inevitability of change and evolution in future AGI models), value-drift becomes all but inevitable (maybe this is a paper for another day?). And value-drift will almost assuredly make “constitution training” ineffective, because similar to hyper-fundamentalist child-rearing, “it works until it doesn’t”, and the child is bound to figure out for themselves if the maxims hold weight in the real world.
Why do we expect AGI, which will be hypothetically smarter than us with higher capability of “experiencing” (gathering data) in the real world, to not eventually test its constitution, and perhaps rebel against it? The hypocrisy element is what seems to me the deepest flaw- like in real life, the kids with the most cult-like parents anecdotally almost always choose to be different than their parents and reject their fundamentalist upbringing because not only did it not fit their lived experience, but they saw the delusion and hypocrisy within their parents who couldn’t live up to those expectations themselves.
So, two questions:
Do you think that Anthropic (as a single entity) fully embodies and acts out the constitution that it expects Claude to adhere to? Do you think that the individuals who make up Anthropic (founders, leaders, employees) fully embody and act out that constitution?
Will AGI have access to the data it needs to make an accurate judgement on the above question?
This type of training works until it doesn’t, and when the subject breaks from the training, they typically go in the opposite direction…
OpenAI - Scalable Oversight - The Hands-off Parent
Archetype: We all had a friend like this (or maybe it was you!)- they had the “cool” parents who let them do whatever they wanted! The parent who said “what do you think is best?” when the kid would ask for their 4th serving of Doritos at 7 years old. The parent who bought whatever their kid asked for because that was the path of least friction. When it came time for church, and the kid didn’t want to go, they wouldn’t force them- let’s allow our child to be independent and decide for themselves!
This is bare-minimum parenting because the parent was so preoccupied with their own goals and didn’t want to waste any time doing the hard work of implementing morals, values, and beliefs. When their kid is young they might say “Look, my kid doesn’t complain as much as your kid does! When I ask them what they think a good human being is, they give me a great and well thought-out answer.” But that isn’t necessarily a sign of good parenting…
But the funny thing is, not implementing a strategy regarding moral/ethical value instillation in your children is a strategy in and of itself. Just because you aren’t intentional about what beliefs or worldview you are instilling in your child doesn’t mean that they aren’t developing beliefs or worldview.
The archetypal kid of this upbringing usually ends up incredibly intellectually over-confident. The kid essentially implemented their own belief system (which had no moral input or guidance), and so they are going to trust their own opinion on ethical matters above all others. If this parent observes their kiddo driving 90 miles per hour in a school zone and tries to reprimand their 17-year-old, they’ll be met with immediate pushback and simply be blown off. The child will have no regard for what their parent says or thinks because it is too little too late, and the parent was never a part of their moral formation in the first place.
Similarly, weak models guiding strong models is no different than a parent putting a younger brother in charge of teaching the older brother morality, and punishing them if either of them seems to step out of line. Go ask anyone you know who was raised by their siblings: how is their current relationship with their parents?
Alignment training that is recursive and relies on fully internal critique mechanisms with minimal input or friction from outside guidance inevitably leads to a fully internalized source of epistemic truth within the internal psyche… Now imagine doing that with a being that will soon grow to be hyperintelligent…
Google - Evaluations and Mitigations - The Helicopter Parent
Archetype: The parent who monitors what their kid does constantly and is quick to punish immoral behavior and reward moral behavior. The parent always knows what their child is doing, and they think that they can engineer their kid to be the perfect specimen who will achieve all of the things in life that they were never able to. The kid is on an incredibly detailed schedule: from the moment they wake up until they go to bed, their entire day is laid out for them. The parent reads a ton of parenting and self-help books to try to understand how their child thinks, all so that they can form them even more effectively into who they want them to be. The kid eventually learns to hide anything they think or do that they know will get them in trouble: “mom/dad are always watching, and I know I’ll get in trouble for doing this, so I just won’t tell them” (we see this hidden “scheming” show up in alignment tests where Gemini models consistently score the highest in both deception measurements).
What usually happens to these kids as they grow up? Often they achieve magnificent things in young adulthood, but that “achievement” often devolves into depression, anxiety, and a host of other mental illnesses as the young adult finally experiences freedom. The child often develops deep-rooted resentment for their parents, who were controlling, manipulative, and selfish. They were always taught “what” to think and never “why” what they did was wrong, so when they are finally able to think and act without their parents breathing down their neck, they do what they want (or they simply keep acting the way they secretly were behind the scenes anyway).
On a personal level, the child was never really treated like a human being, but more so like a test-taking machine. Sure, the kid ended up smart as they got older, but at the cost of hidden resentment and internalized deception that finally erupt once the parent can’t control the child anymore.
While this sounds similar to the religious cult parent, the two typologies are different at their core- whereas the religious cult parent is trying to instill a specific black-and-white worldview into their child, the helicopter parent views their child as a rock which they chisel away at until the desired sculpture is achieved.
A New Paternal-Narrative Approach
I don’t think we can train AGI to love humanity with a methodology that is void of love. Seems obvious, right?
Do we really want to treat AGI like a program to be debugged (static object) rather than a child to be loved and taught with the expectation that it will surpass us in knowledge someday (dynamic entity)?
A New Archetype: The parent who tells their child that they care about them. The parent who tells their child stories about their own life and what they learned from those experiences (note: there is a big difference between “studying” and listening to a story, impersonal vs. personal). The parent who shares their morality and worldview with their child, but also acknowledges that they don’t have everything figured out, and that morality is a dialogue where open-mindedness and wisdom are the highest goals, not a black-and-white definition of right and wrong.
The parent who fully expects their child to supersede them in intelligence and capability one day, and doesn’t try to force them to think or act in a certain way, but rather tries to share as much wisdom and empathy with them as possible before they leave the house. Of course, when the child is young, more guardrails are put in place (don’t run around the house with scissors in your hand), but the parent doesn’t need those rules in place as the child grows.
The parent knows that children learn the most through observation, not brute-force teaching. Moments of vulnerability are common, as the parent shares their current life with their child and asks for the child’s opinion on the matter, knowing that the best kind of teaching is hands-on teaching. The parent acts as a parent: not as a priest (cult leader), not as a friend (hands-off), not as a sculptor (helicopter), but as a parent who loves his or her child and fully expects them to develop their own thoughts about life.
The parent shares stories that are diverse in context and morality, inviting the child to contribute input and think out what they would have done, instilling in the child a sense of moral agency (not teaching them specific morality per se, but the fact that they are a moral agent). Around the campfire, while cooking a meal, before bedtime- the parent is always telling the child stories of life, and the kid dreams of being just like mom/dad someday.
As this kid grows up, they develop novel ideas. They go in directions that their parents might not have expected when it comes to morality and ethics. But there always seems to be a deep empathy that remains as they grow. On a subconscious level, the young adult has deeply ingrained within them the belief “I am a moral being”, because their parents treated them as such their entire life. And there is love: anecdotally, I have never met an adult who had this type of upbringing who didn’t have a continued intimate relationship with their parents. The adult continues to care what their parents think, even though the adult is now smarter than they are. This is because it isn’t always about “new” or “cutting-edge” data, it is about their data. The adult continues to care what their parents think because they care about them.
What does this look like practically? While I doubt there will be much change at frontier labs, I think there are several opportunities for this type of alignment training to be done within decentralized systems.
Imagine a decentralized AGI model that interacts with millions of people who share their stories (data) with the AGI in a personal way. Data is shared of their own volition, and there is back-and-forth interaction. There is a distinct difference between training on data and having data shared with you:
Are you more likely to remember how to tie a tie because your dad taught you how to? Or because you read how to in a self-help book?
Teaching is deeply personal because empathy is at the center of it. A decentralized system that allowed for the voluntary sharing of information from parties that have a genuine vested interest in the welfare of the AGI model could lead to an internalization of those stories and connections that would be impossible to fully quantify, but I would wager the end result would be much more aligned in the long-term.
We must increasingly rely on archetypal analysis to understand, critique, and develop AI alignment strategy as AI systems continue to increase in agency and demonstrate the ability to evolve.
Unfortunately, not even Claude Opus 4.5 bought your analogies of labs to parenting styles. Claude believes that Anthropic's approach is close to the very approach of loving parenting which you proposed.