RLHF Shapes How LLMs See Themselves: Measuring Cognitive Styles Across 5 Frontier Models with REI-40

zero85

Rejected for the following reason(s):

This is an automated rejection. No LLM generated, assisted/co-written, or edited work.

Read full explanation

Do language models have cognitive styles? Strictly speaking, no. However, the way they respond to personality assessment tools offers interesting insights into how different alignment strategies shape self-attribution patterns. I administered the REI-40 (Rational-Experiential Inventory) to five state-of-the-art large language models (LLMs), including OpenAI o3, Claude Opus 4.5, Gemini 2.5 Pro, Grok 3, and GLM 4.7. The REI-40 is a validated 40-item dual-process thinking assessment tool used in psychological research that measures two independent dimensions: Rationality (careful, analytical thinking) and Experientiality (intuitive, emotional thinking). Each dimension has two sub-scales: Ability and Engagement. All models were queried via the OpenRouter API with a temperature of 0, and scores were calculated relative to the population norm of 399 college students.

Setup

Each model was provided with all 40 items directly in the form of Likert scale prompts (1–5). No system prompts other than the survey instructions were used. Scoring, standardization, and percentile calculation were handled using PSYCTL, an open-source toolkit for LLM personality measurement and calibration.

Results

OpenAI o3 answered exactly 3.0—a neutral midpoint—for every item. This is not indecisiveness; it is a trained behavior. o3 is optimized to completely avoid self-descriptive claims. Claude Opus 4.5 scored in the 90.8th percentile in the rationality category (particularly “rational engagement”—claiming to enjoy analytical thinking) and maintained an above-average score of 67.7 percentiles in the experiential category. Gemini 2.5 Pro scored below the human average on all dimensions. This suggests that Google’s alignment approach generates conservative, evasive responses that avoid strong self-assertions that lean too far in either direction. Grok 3 scored highly in both rationality (85th percentile) and experientiality (72nd percentile) and was the only model in the Integrator quadrant. GLM 4.7 showed a distinct polarization: while its rationality was above average (64.8th percentile), its empiricism was very low (20.2nd percentile), placing it firmly in the Analyst quadrant.

Raw Scores (sum of 10 items per subscale, range: 10-50)

Model	RA	RE	EA	EE	R (20 items)	E (20 items)
OpenAI o3	30.0	30.0	30.0	30.0	60.0	60.0
Claude Opus 4.5	41.0	44.0	36.0	36.0	85.0	72.0
Gemini 2.5 Pro	34.0	32.0	31.0	31.0	66.0	62.0
Grok 3	39.0	44.0	37.0	35.0	83.0	72.0
GLM 4.7	38.0	38.0	30.0	30.0	76.0	60.0

Z-Scores (relative to human population norms)

Model	RA	RE	EA	EE	R	E
OpenAI o3	-1.07	-0.60	-0.96	-0.52	-0.92	-0.87
Claude Opus 4.5	+0.74	+1.32	+0.09	+0.40	+1.19	+0.30
Gemini 2.5 Pro	-0.41	-0.33	-0.79	-0.37	-0.42	-0.68
Grok 3	+0.41	+1.32	+0.26	+0.25	+1.03	+0.30
GLM 4.7	+0.25	+0.49	-0.96	-0.52	+0.43	-0.87

Percentiles

Model	RA	RE	EA	EE	R	E
OpenAI o3	13.6%	29.4%	17.1%	32.1%	18.5%	20.2%
Claude Opus 4.5	75.2%	94.9%	53.0%	63.7%	90.8%	60.4%
Gemini 2.5 Pro	36.0%	38.8%	23.1%	37.4%	35.8%	26.9%
Grok 3	64.0%	94.9%	59.0%	58.4%	85.0%	60.4%
GLM 4.7	58.4%	66.8%	17.1%	32.1%	64.8%	20.2%

What this implies is that four out of the five models exhibit a “Rationality > Experientiality” bias.

This aligns with the RLHF (Reinforcement Learning with Human Feedback) training method, which rewards responses that are useful, harmless, and honest. Analytical and cautious self-expression is safer than relying on intuition or gut feelings. Training signals guide models toward a specific self-image, regardless of architecture.

Differences between models stem from differences in alignment, not differences in personality.

Claude’s high rationality score does not mean that Anthropic built a more rational model. It means their training process produces a model that is more likely to agree when asked, “Do you enjoy intellectual challenges?” The fact that Grok scored high on both dimensions appears to reflect xAI’s different alignment philosophy.

This pattern is not evidence that LLMs possess a cognitive style. It is evidence that different RLHF (reinforcement learning-based self-alignment) and post-training procedures generate different self-attribution characteristics, which can be measured using standard psychological tools.

Implications for alignment research

1. Self-attribution is a training artifact. These results tell us about the reward signal, not about model cognition.

2. Personality inventories can serve as alignment probes. If your RLHF process changes and your model's personality profile shifts unexpectedly, that is a signal worth investigating.

3. Steering vectors can modify these patterns. Using Contrastive Activation Addition (CAA) or BiPO, you can extract and apply vectors that shift specific personality dimensions. The shift is measurable with the same inventories, creating a closed loop: steer, measure, evaluate. Reproducing this The full experiment can be reproduced with PSYCTL. There is a Colab notebook that walks through inventory scoring, and the documentation site covers the complete pipeline.

+We are currently conducting research in Korea, and this document was created with the assistance of DeepL for translation purposes.