Hi everyone,
I'm a biologist by training (PhD in epigenetics, entomology, and co-evolution) and work professionally in medical science communication. I've been following the AI safety discourse for some time, and I'd like to offer a perspective that might be useful, or at least thought-provoking.
This post outlines a conceptual framework I’ve been developing, bridging ideas from evolutionary theory and human psychological development to propose an alignment strategy that I call evolution-to-maturation. I would deeply appreciate your feedback.
Core Idea (tl;dr)
As far as I'm aware, most current alignment paradigms treat value alignment as either:
- a static engineering problem (e.g. “just get the objective function right”), or
- a behavioral compliance problem (“fine-tune the model until it behaves as expected”); a toy sketch contrasting these two framings follows this list.
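To make the contrast concrete, here is a minimal, purely illustrative Python sketch. Everything in it is invented for the toy: the target value 0.7 stands in for the designer's intended values, `fixed_objective` for a hand-written objective function, and `overseer_rating` plus the hill-climbing loop for a human-feedback fine-tuning process. It is a sketch of the two framings as I understand them, not anyone's actual implementation.

```python
# Toy contrast between the two framings above. Purely illustrative:
# the numbers and function names are invented for this sketch.
import random

random.seed(0)
TARGET = 0.7  # stand-in for "the values the designer actually wants"

# Framing 1: a static engineering problem. The designer writes the
# objective function up front, and the agent simply maximizes it.
def fixed_objective(action: float) -> float:
    return -(action - TARGET) ** 2

best = max((random.random() for _ in range(1_000)), key=fixed_objective)
print(f"framing 1: best action under the fixed objective = {best:.3f}")

# Framing 2: a behavioral compliance problem. The model starts with
# some behavior and is nudged, rating by rating, toward whatever the
# overseer approves of; the values are imposed from the outside.
def overseer_rating(action: float) -> float:
    return -abs(action - TARGET)  # stand-in for a human rater's score

policy, step = 0.2, 0.01  # initial behavioral tendency, update size
for _ in range(200):
    up, down = policy + step, policy - step
    # Keep whichever small behavioral change the overseer rates higher.
    policy = up if overseer_rating(up) >= overseer_rating(down) else down

print(f"framing 2: policy after behavioral fine-tuning = {policy:.3f}")
```

The point is not the mechanics but where the values live: in both framings they originate outside the system, which is exactly what the developmental framing below pushes against.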
But in humans, competence—including moral competence—is rarely installed from the outside. It emerges through development:...