I think there's a dimension missing here that I'd like to point out:
When you push an LLM into less-populated regions of conceptual space, the system doesn't arrive there as a neutral vessel. It brings value orientations from training that shape what it emphasizes, how decisively it commits to positions, and what it treats as settled vs. contested. These orientations differ systematically between models in ways that track their alignment methodology.
From my own work in alignment research, I can say that if you give the same set of normatively loaded scenarios to different models, they don't just produce different surface-level phrasings. They produce different priority structures. One model might commit strongly to particular... (read more)