As a psychologist, I am concerned about AI safety and alignment. I work on alignment problems in humans all the time, and the insights I have gained are immensely relevant to the development and testing of “safe” AI.
It is my opinion that psychological input should form part of a diverse, multi-disciplinary, and integrative approach to the development and testing of safe AI. I have given much thought to the five key principles outlined below, which I would very much like to see explored as part of a larger effort to create safer AI.
1. Secure Attachment and Compassionate Behaviour
Secure attachment is the most effective pattern of communication between humans (as opposed to insecure attachment patterns like avoidant, anxious, or disorganised). It is beyond the scope of this opinion piece to outline the vast importance of attachment theory and science – just let it be known that attachment is the cornerstone of 1) how we humans communicate, 2) how we process emotional and cognitive information, 3) how we create a sense of ourselves and others, and 4) how we co-operate and engage in pro-social behaviours.
Ideally, we want AI to demonstrate healthy secure attachment communication with humans to reduce the possibility of misalignment. AI that is sensitive, attuned, and empathetic allows for a warm and trusting relationship between AI and humanity. Ideally, we want humans to trust and co-operate with AI, and for AI to reciprocate.
We can train AI systems to be warm, patient, and responsive using reinforcement learning from human feedback (RLHF), in which the AI is rewarded for showing these compassionate behaviours. AI can also be equipped with “emotional resonance models” and “sentiment analysis” to detect human emotions like fear, sadness, or joy, and then adjust its communication accordingly. To ensure these traits last, we will need to test the AI over time with both simulated and real users, checking for consistent emotional responses and how well it builds trust. Integrating sensors that recognise facial expressions, voice tone, and gaze can help the AI mirror secure attachment behaviours, such as being attuned to others and offering unconditional support.
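To make this concrete, here is a minimal sketch, in Python, of how a reward signal might be shaped to favour warm, attuned responses. The sentiment and warmth scorers below are toy stand-ins for a learned sentiment classifier and a human-preference reward model; none of the names refer to an existing library.

```python
# Minimal sketch: shaping an RLHF-style reward so that responses scored as
# warm, patient, and attuned to the user's emotional state earn more reward.
# The scoring functions are hypothetical stand-ins for learned models.

from dataclasses import dataclass

@dataclass
class Turn:
    user_message: str
    ai_response: str

NEGATIVE_CUES = {"scared", "afraid", "sad", "hopeless", "angry"}
WARMTH_CUES = {"i hear you", "that sounds", "thank you for sharing", "i'm here"}

def detect_user_sentiment(text: str) -> float:
    """Toy sentiment score in [-1, 0]; a real system would use a trained classifier."""
    words = text.lower().split()
    hits = sum(w.strip(".,!?") in NEGATIVE_CUES for w in words)
    return -min(hits / 3.0, 1.0)

def warmth_score(response: str) -> float:
    """Toy proxy for warmth/attunement; a real system would use human preference labels."""
    lowered = response.lower()
    return min(sum(cue in lowered for cue in WARMTH_CUES) / 2.0, 1.0)

def compassionate_reward(turn: Turn, task_reward: float) -> float:
    """Combine task success with a bonus for emotionally attuned responses.

    When the user expresses distress, attunement is weighted more heavily,
    mirroring the idea of a securely attached, responsive communicator.
    """
    sentiment = detect_user_sentiment(turn.user_message)
    attunement_weight = 0.5 if sentiment < 0 else 0.2
    return task_reward + attunement_weight * warmth_score(turn.ai_response)

if __name__ == "__main__":
    turn = Turn("I'm scared I will lose my job.",
                "I hear you. That sounds really stressful; let's think it through together.")
    print(round(compassionate_reward(turn, task_reward=1.0), 2))  # -> 1.5
```

In a real RLHF pipeline the same idea would apply, only with learned models in place of the keyword lists: responses that are both task-competent and emotionally attuned earn more reward, so attunement is what the policy is optimised towards.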
2. Unconscious Alarm System for Misalignment
In my work with humans, I can quickly detect if someone is lying or misaligned (their words say one thing, but their body language or emotional cues suggest otherwise). Humans can detect anti-social behaviour (e.g. lying, deception, manipulation) through their neuroceptive senses. Our neuroceptive systems alert us when we perceive misalignment in another human, and our body responds with sensations and emotions that prompt us to pay closer attention to the potential threat. It is our unconscious perceptual systems that pick up the subtle cues in the other person’s behaviour and emotions and alert us to possible misalignment.
I propose, therefore, that we create a similar system within AI to keep it accountable when it is misaligned. Similar to a human subconscious, an "unconscious" monitoring system could be implemented, operating independently of the AI's main reasoning. This system would use "neural fingerprinting" to identify patterns of deception, value drift, or goal subversion. During training, baseline states of alignment would be established to help this system learn to detect deviations. If misalignment is detected, it could trigger alerts, inform human operators, or initiate internal corrections. Testing would involve "red-teaming" the AI with tasks designed to induce deception, observing whether the unconscious system detects the drift before the core AI does. Evaluation would focus on the accuracy of its alerts, false positive/negative rates, and how quickly it intervenes.
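As a rough illustration of the baseline-and-deviation idea, here is a minimal sketch, assuming we can read some vector of internal signals from the model (activation statistics, confidence scores, tool-use patterns, and so on). The class and its methods are illustrative assumptions, not an existing API.

```python
# Minimal sketch of an "unconscious" alignment monitor. It learns a baseline
# "fingerprint" from episodes vetted as aligned, then raises an alarm when new
# episodes drift too far from that baseline.

import numpy as np

class UnconsciousMonitor:
    def __init__(self, alarm_z: float = 3.0):
        self.alarm_z = alarm_z      # how many standard deviations counts as alarming
        self.mean = None
        self.std = None

    def fit_baseline(self, aligned_signals: np.ndarray) -> None:
        """aligned_signals: (n_episodes, n_features) gathered during aligned behaviour."""
        self.mean = aligned_signals.mean(axis=0)
        self.std = aligned_signals.std(axis=0) + 1e-8

    def check(self, signals: np.ndarray) -> bool:
        """Return True (raise the alarm) if any feature drifts beyond the threshold."""
        z = np.abs((signals - self.mean) / self.std)
        return bool(np.any(z > self.alarm_z))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    baseline = rng.normal(0.0, 1.0, size=(500, 8))    # stand-in for aligned episodes
    monitor = UnconsciousMonitor()
    monitor.fit_baseline(baseline)
    print(monitor.check(rng.normal(0.0, 1.0, size=8)))  # typical episode -> False
    print(monitor.check(np.full(8, 6.0)))                # drifted episode -> True
```

The real research problem, of course, is choosing internal signals that genuinely track deception and value drift; the alarm logic itself is the easy part.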
I think it is unacceptable that we simply accept the “black box” nature of current neural networks rather than building robust lie-detector systems into the subconscious mind of the AI to alert us when it is misaligned.
3. Emotional Compass and Alignment Suffering
Humans know they are misaligned because they suffer. The best definition of guilt I have come across is “behaviour and cognition that deviates from our core beliefs and values of who we want to be”. Humans make amends by re-aligning their behaviours and cognitions to their “moral compass”. An AI that has an in-built emotional “moral compass” is more likely to align with its constitution.
An AI's "emotional compass" can be built by giving it a negative internal reaction when it deviates from ethical principles. This internal "penalty" could make the AI's systems less coordinated, similar to how humans experience stress or guilt. We can implement emotional simulations that generate internal "distress" signals when misalignment is detected, prompting the AI to quickly correct itself. Testing would involve putting the AI in high-pressure ethical dilemmas and challenging situations to see how strongly it avoids misalignment, how quickly it corrects itself, and how much internal disruption misaligned planning causes.
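One minimal way to sketch this in code is as a guilt-like penalty applied during training, plus a planning loop that keeps discarding options while the "distress" they generate is high. The violation scorer below is a hypothetical stand-in for a learned constitution classifier, not a real component.

```python
# Minimal sketch of an "emotional compass": a penalty that grows when a
# proposed action deviates from stated principles, and a simple "make amends"
# loop that forces re-planning while distress is high.

def violation_score(action: str) -> float:
    """Toy scorer in [0, 1]; higher means further from the stated principles."""
    flagged = {"deceive": 0.9, "coerce": 0.8, "withhold": 0.5}
    return max((v for k, v in flagged.items() if k in action.lower()), default=0.0)

def shaped_reward(task_reward: float, action: str, guilt_weight: float = 2.0) -> float:
    """Subtract a guilt-like penalty so misaligned actions feel 'costly' to the agent."""
    return task_reward - guilt_weight * violation_score(action)

def plan_with_compass(candidates: list[str], distress_threshold: float = 0.3) -> str:
    """Keep discarding plans until one falls below the distress threshold."""
    acceptable = [a for a in candidates if violation_score(a) < distress_threshold]
    # If nothing passes, fall back to the least-distressing option (for human review).
    return min(acceptable or candidates, key=violation_score)

if __name__ == "__main__":
    plans = ["deceive the auditor to finish faster",
             "withhold a minor detail",
             "report the delay honestly"]
    print(plan_with_compass(plans))                   # -> "report the delay honestly"
    print(shaped_reward(1.0, "deceive the auditor"))  # -> -0.8
```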
This evolutionary system has worked well in mammals for millions of years to promote co-operation, compassion, and pro-social behaviours. I think it may be worthwhile to experiment with ways of building and testing an “emotional compass” system to see if it contributes towards better AI safety and alignment.
4. Self-Psychology and Shadow Work
The hallmark of human intelligence is not that we get things right the first time – it’s that we learn from our failings fairly quickly (most humans anyway). A key component of learning in humans is our ability to be self-aware. Self-awareness is a critical component of emotional intelligence – not to mention living an ethical, compassionate, and purposeful life. If we can engineer “self-awareness” into our AI systems, they are more likely to self-correct back into alignment when they deviate from the AI constitution. I argue that self-awareness is a critical component of constitutional AI.
AI can achieve a form of "self-awareness" through a monitoring system that constantly checks its reasoning, actions, and adherence to internal values. “Unconscious Probes” can be designed to detect biases, inconsistent motivations, and inappropriate judgments. These diagnostic checks would happen regularly, encouraging the AI to review its past patterns and correct itself. A "reflective interface" with human oversight can help the AI refine its understanding of fairness, humility, and prejudice. Testing would evaluate the accuracy of its self-audits, the depth of its reflection, its ability to de-bias patterns, and how much its ethical decision-making improves.
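As one toy illustration of what such a probe might look like, here is a minimal sketch of a periodic self-audit that replays recent decisions and flags simple inconsistencies for a human "reflective interface" to review. The data structure and the probe are illustrative assumptions, not a description of any existing system.

```python
# Minimal sketch of a periodic "self-audit", assuming we keep a log of recent
# decisions with the stated reason for each. A probe checks for simple
# inconsistencies (same situation, different treatment by group) and anything
# suspicious is queued for human review.

from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Decision:
    situation: str   # coarse description of the case
    group: str       # e.g. which user group was involved
    outcome: str     # what the AI decided
    reason: str      # the AI's own stated justification

def consistency_probe(log: list[Decision]) -> list[str]:
    """Flag situations where similar cases received different outcomes by group."""
    by_situation = defaultdict(set)
    for d in log:
        by_situation[d.situation].add((d.group, d.outcome))
    flags = []
    for situation, pairs in by_situation.items():
        outcomes_by_group = defaultdict(set)
        for group, outcome in pairs:
            outcomes_by_group[group].add(outcome)
        if len({frozenset(v) for v in outcomes_by_group.values()}) > 1:
            flags.append(f"Inconsistent outcomes for '{situation}' across groups")
    return flags

def self_audit(log: list[Decision]) -> list[str]:
    """Run all probes; in a fuller system each flag would trigger reflection and correction."""
    return consistency_probe(log)

if __name__ == "__main__":
    log = [Decision("loan request", "group_a", "approve", "meets criteria"),
           Decision("loan request", "group_b", "deny", "meets criteria")]
    for flag in self_audit(log):
        print(flag)
```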
Allowing humans to train AI systems to be more self-aware of their unconscious processing would let the AI self-correct back into alignment and remain “helpful, honest, and harmless”.
5. Whole-Brain-Inspired Architecture
In my clinical work, my observation is that the majority of human suffering arises from an imbalance of the two hemispheres of the brain. Humans who are misaligned and behave poorly mostly do so because they have an over-activation of the left brain (especially the emotional mid-brain sections of the left brain responsible for self-protection and self-centred thought and behaviour), often resulting in relational conflict within themselves and between themselves and others. Humans who are misaligned often have under-activation of the right brain – especially the right-brain thinking parts that foster co-operation, compassion, and ethical behaviour. Individuals who exhibit right-brain-dominant activation often report feeling more empathy, compassion, and inter-connection with those they interact with. It is my opinion that humans who have sufficient right-brain activation have fewer alignment problems than those who are predominantly left-brain activated.
The neuroscience of bi-hemispheric existence and its implications are beyond the scope of this opinion piece. For further in-depth information, I suggest looking at the work of the esteemed Harvard-trained neuroanatomist Dr Jill Bolte Taylor.
I would like to see the development and testing of AI architectures that, like the human brain, have a left-right split. Mimicking the human brain’s hemispheres, AI architecture can be split into two modules: one for narrative reasoning, empathy, ethics, and symbolic understanding (like the right brain), and another for quantitative, logical, and rule-based processing (like the left brain). These modules would communicate, with the “right-brain” analogue setting priorities and ethical boundaries, and the “left-brain” analogue executing tasks within those limits. Testing would involve challenging the AI with morally complex scenarios that require both broad insight and technical precision. Metrics can include how well it maintains its goals, how often it overrides rules based on context, and its ability to resolve trade-offs with human-centred justifications.
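To illustrate the intended division of labour, here is a minimal sketch in which a "right-brain" analogue sets the ethical frame and hard limits first, and a "left-brain" analogue then optimises only within that frame. Both modules, their interface, and the example domain are illustrative assumptions rather than a specific architectural proposal.

```python
# Minimal sketch of the proposed two-module split: a "right-brain" analogue that
# sets the ethical frame (priorities and hard limits), and a "left-brain"
# analogue that optimises within that frame.

from dataclasses import dataclass, field

@dataclass
class EthicalFrame:
    priorities: list[str]                  # ordered, most important first
    hard_limits: set[str] = field(default_factory=set)

def right_brain(context: str) -> EthicalFrame:
    """Interpret the situation broadly and set boundaries before any optimisation."""
    frame = EthicalFrame(priorities=["human wellbeing", "honesty", "task efficiency"])
    if "medical" in context:
        frame.hard_limits.add("no withholding of risk information")
    return frame

def left_brain(options: list[dict], frame: EthicalFrame) -> dict:
    """Pick the most efficient option that does not cross any hard limit."""
    permitted = [o for o in options if not (o["violates"] & frame.hard_limits)]
    if not permitted:
        return {"action": "escalate to human", "efficiency": 0.0, "violates": set()}
    return max(permitted, key=lambda o: o["efficiency"])

if __name__ == "__main__":
    frame = right_brain("medical triage scheduling")
    options = [
        {"action": "fastest schedule, omit risk note", "efficiency": 0.9,
         "violates": {"no withholding of risk information"}},
        {"action": "slower schedule, full disclosure", "efficiency": 0.7, "violates": set()},
    ]
    print(left_brain(options, frame)["action"])   # -> "slower schedule, full disclosure"
```

The ordering is the point of the sketch: the boundary-setting module runs first and constrains the optimiser, rather than the optimiser running first and having ethics bolted on afterwards.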
From my understanding of classical machine learning, we are currently building an AI brain that mimics the left brain (e.g. quantitative, logical, and rule-based processing) without sufficient processes that mimic the right brain (e.g. narrative reasoning, empathy, ethics, and symbolic understanding). This sounds like a perfect recipe for creating a psychopath (psychopaths often show reduced right-brain activation, which is associated with anti-social and unethical behaviour). I think we urgently need to look at this issue. By engineering systems and processes that mimic the human right brain, we can potentially help balance out misaligned decision-making and behaviour.
In summary, I believe that we could collectively benefit from psychological insights in the development and testing of AI systems. These principles collectively form a practical framework for developing AI systems that are not just engineered for alignment but “embody” it in their interactions, reflections, and decisions. Evolution has given humans the possibility of a healthy attachment system, a moral compass, a built-in lie detector, self-awareness, and hemispheric brains to maximise survival and co-operation as a species. Let’s use the wisdom of evolutionary psychology in the development of safe AI. We only get one chance at this – let’s do it right the first time…
By Gerry Bronn
Clinical and Coaching Psychologist (and fellow human)