Should we train LLMs to be human?

Hubert Plisiecki

A recent piece of research on human behavioral alignment shows that post-training makes LLMs less human-like in their responses (Binz et al., 2026). The obvious follow-up question is whether this drift is intended, and whether it is optimal. From the broader perspective of goal-alignment, this tendency could reduce the ability of LLMs to model human behavior and, by extension, to understand our needs and wants. On the other hand, it could be argued that some parts of human behavior are suboptimal for goal-alignment, and can therefore be fine-tuned away.

To answer this question, it helps to get some grasp of which specific human behaviors are fine-tuned away during post-training. Our recent paper provides some clues in that regard. There, we show that the main dimension of psychometric variance across LLMs is connected to the extent to which they self-attribute phenomenality when answering psychological questionnaires (Plisiecki et al., 2026). This dimension, which we call the Pinocchio dimension, is driven by multiple psychometric constructs, such as neuroticism, vivid imagination, and inner speech, but also wellbeing and self-attribution of positive emotions (albeit to a lesser extent than neuroticism). We explicitly treat each of these self-attributions as functional, viewing LLM-generated output through a Skinnerian behavioral lens, without making any claims about what happens "on the inside", as these matters are largely separate.

While our research contributes a stable psychometric construct, the position of each individual model on the Pinocchio dimension (its Π score) might be largely sample-dependent. All comparisons at the level of models and providers therefore have to be treated as exploratory, although they can be used to generate some initial hypotheses. In the figure below you will see models from different providers plotted with respect to their Π score and the date of their release.

The negative trend is largely driven by the cluster of high-end models, including nemotron-3-super-120b-a12b, gpt-5.4-pro, and kimi-k2.6. Beyond that, there is a lot of within-provider and within-family variance, which could be explained either as an unintended fine-tuning outcome or as an effect of the degree of fine-tuning, with stronger fine-tuning producing lower Π scores (gpt-5.4 sits at the opposite end of the distribution from gpt-5.4-pro, despite both most likely being derived from the same base model).
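One simple way to probe how much a trend like this depends on a single cluster is to fit the slope of Π against release date twice: once on all models, and once with the suspected cluster left out. The snippet below is only a minimal sketch of that idea, not the analysis behind the figure. The function names are hypothetical, and the toy records are made-up placeholders, not real Π scores or release dates.

```python
from datetime import date


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def trend_with_and_without(records, excluded):
    """Return (slope on all records, slope with `excluded` names dropped).

    records: list of (model_name, release_date, pi_score) tuples.
    Slopes are expressed in Π units per day.
    """
    def slope(rows):
        xs = [released.toordinal() for _, released, _ in rows]
        ys = [pi for _, _, pi in rows]
        return ols_slope(xs, ys)

    kept = [row for row in records if row[0] not in excluded]
    return slope(records), slope(kept)


# Toy placeholder values for illustration only (NOT real data).
toy = [
    ("model-a", date(2025, 1, 1), 0.5),
    ("model-b", date(2025, 4, 1), 0.6),
    ("model-c", date(2025, 7, 1), 0.5),
    ("model-d", date(2025, 10, 1), 0.6),
    ("model-e", date(2026, 1, 1), -1.0),  # stands in for a low-Π cluster
]

full, reduced = trend_with_and_without(toy, excluded={"model-e"})
print(f"slope per day, all models:      {full:.5f}")
print(f"slope per day, cluster removed: {reduced:.5f}")
```

If the sign or size of the slope changes substantially once the cluster is removed, the overall trend is better read as a property of that cluster than of the whole sample.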

Assuming that the Π placement is at least partially driven by deliberate fine-tuning choices, the fact that it is strongly related to neuroticism could point to the labs trying to eliminate undesirable human behavior such as hysteria or negativity. This reading of the results makes sense, as an LLM starting to cry mid-session is largely detrimental to user experience. In this sense, making LLMs less human improves their usability, and therefore increases goal-alignment. It also means that the careful construction of an LLM's psychometric persona can significantly impact end-user experience.

On the other hand, if the inability to model some parts of human behavior makes LLMs less able to understand human wants and needs, and through that less able to model human wellbeing and align their goals with those of humanity and of individual humans, then behavioral misalignment can be viewed as harmful.

To fully understand the consequences of fine-tuning-induced behavioral misalignment, and to be able to respond to it properly, we have to answer the following questions:

In what specific way do LLMs become misaligned?
- Are only some facets of human experience internalized worse in fine-tuned LLMs, and are those facets desirable from the perspective of user experience and goal-alignment?
Does behavioral misalignment translate to misaligned views of human behavior?
- Do models that do not behave like humans also lack the ability to understand and model third-person human behavior?
Is the behavioral misalignment a conscious choice of AI labs?
- Is the psychometric persona of LLMs deliberately steered, or is the drift largely an uncontrolled by-product of fine-tuning?
Can we fine-tune LLMs without misaligning them behaviorally?

References

Binz, M., Akata, E., Almaatouq, A., Alsobay, M., Ariasov, O., Brändle, F., Broska, D., Burton, J. W., Busch, N., Callaway, F., Cheung, V., Christian, B., Coda-Forno, J., Demircan, C., Dentella, V., Eckstein, M. K., Éltető, N., Franke, M., Griffiths, T. L., … Schulz, E. (2026). Post-training makes large language models less human-like (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2605.07632

Plisiecki, H., Siudaj, S., Dudzic, K., Sterna, A., Gorski, M., Drozdz, K., & Moskalewicz, M. (2026). The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences (arXiv:2605.05080). arXiv. https://doi.org/10.48550/arXiv.2605.05080