Now that Softmax—my favorite new AI company—is public, I can finally share this. They’ve funded my research and I’m very excited about what they’re doing!


Almost all frontier AI labs are building powerful systems from the ground up, hoping alignment can come later. I think this approach is backwards.

First, some philosophy. There are two fundamentally different ways of understanding how systems evolve:

1. Etiology: Building from smallest pieces upward.

  • “The past causes the future”
  • Atoms → molecules → cells → organisms
  • Newton’s laws, F = ma
  • Reductionist physics vibes
  • “Depression is caused by chemical imbalances or past trauma”

2. Teleology: Breaking down from largest purposes downward.

  • “The present is for the future”
  • Michael Levin’s research
  • Principle of least action
  • Living systems vibes
  • “Depression serves a present purpose (is locally optimal)”

In modern engineering culture, etiological thinking dominates and teleology is often dismissed as “woo.” But teleology is crucial for understanding any system that pursues goals. A 2022 paper by DeepMind articulated this teleological view: “agents are systems that would adapt their policy if their actions influenced the world in a different way.”[1] Teleology is an essential lens for understanding and building goal-oriented systems like aligned AI.
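To make that definition concrete, here is a minimal toy sketch (my own illustration, not the paper's method): a hard-coded rule keeps the same policy when the mechanism linking actions to outcomes is flipped, while a simple learner adapts; only the latter counts as agent-like under this test.

```python
# Toy illustration of the "Discovering Agents" intuition (my own sketch, not the
# paper's formalism): a system is agent-like if its policy would change when the
# way its actions influence the world changes. A hard-coded rule fails this test;
# a simple bandit learner passes it.
import random

def final_preference(env_flip: bool, learner: bool, episodes: int = 2000) -> int:
    """Two-armed bandit: arm 0 pays off normally; if env_flip, arm 1 pays off instead.
    Returns the arm the resulting policy ends up preferring."""
    q = [0.0, 0.0]          # learner's running value estimate per arm
    fixed_choice = 0        # the hard-coded "policy": always pull arm 0
    for _ in range(episodes):
        if learner:
            a = random.randrange(2) if random.random() < 0.1 else max(range(2), key=lambda i: q[i])
        else:
            a = fixed_choice
        reward = 1.0 if a == (1 if env_flip else 0) else 0.0
        q[a] += 0.1 * (reward - q[a])   # running-average update
    return max(range(2), key=lambda i: q[i]) if learner else fixed_choice

for learner in (False, True):
    adapts = final_preference(env_flip=False, learner=learner) != final_preference(env_flip=True, learner=learner)
    label = "learner" if learner else "fixed rule"
    print(f"{label}: policy changes when the action-outcome mechanism flips -> {adapts}")
```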

Why this matters for aligned AI:

Currently, almost all frontier AI labs take an etiological approach to alignment:

  1. Build base components (transformers, weights, architecture).
  2. Train on vast data to develop complex capabilities.
  3. Attempt to steer resulting intelligence after the fact.

This approach is etiological—stacking intelligence from simple components without anchoring to a desired end-state. It’s like understanding depression purely through brain chemistry instead of seeing it as a locally optimal, adaptive strategy. Teleology doesn’t skip the build phase—it just assumes coherent alignment can organically emerge under the right conditions, shaping each layer from the top down.

This risks creating powerful agents with goals we neither understand nor control.

“Intelligence first, alignment later.” 

This assumption can also become self-fulfilling: if we assume alignment is fragile and must be tacked on later, we set ourselves up to realize exactly that outcome.[2] Conversely, assuming alignment can be robust from the outset biases our designs toward solutions that organically maintain alignment as intelligence scales.

Teleological alignment—what would it look like?

  1. Start with the alignment we want.
  2. Build systems that organically sustain these behaviors as they scale.
  3. Grow intelligence in service of alignment.

This mirrors how biological systems function. Michael Levin's research shows living systems sustain goal-directed behavior at multiple scales[3]—cells, tissues, and organs inherently “know” their roles.

Enter Softmax

This is why I’m so excited about Softmax, a new lab implementing teleological principles for building aligned AI, cofounded by Emmett Shear, Adam Goldstein, and David Bloomin.

https://www.corememory.com/p/exclusive-emmett-shear-is-back-with-softmax 

Note: While I’m friends with Softmax and they’ve funded my research, I do not represent Softmax.

Currently, Softmax runs reinforcement learning experiments in which small-scale virtual agents tend to organically discover stable roles within simulated worlds—mirroring biological processes where alignment emerges from local interactions. In their simulations, agents organically align toward a collective “greater whole”, each agent's role supporting group coherence.
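To give a flavor of what such an experiment might look like, here is a minimal sketch (my own toy illustration under assumed dynamics, not Softmax's actual code): independent learners share a single group reward that is highest when every task is covered by exactly one agent, and under that shared objective they tend to settle into distinct, stable "roles" without any per-agent instruction.

```python
# Minimal toy sketch of role differentiation under a shared group reward
# (my own illustration, not Softmax's setup). Each agent independently learns
# which task to take; the only feedback is a group-level reward that favors
# covering all tasks without overlap. Agents typically specialize.
import random
from collections import Counter

N_AGENTS, N_TASKS, EPISODES, LR, EPS = 3, 3, 5000, 0.1, 0.1
q = [[0.0] * N_TASKS for _ in range(N_AGENTS)]   # each agent's value estimate per task

for _ in range(EPISODES):
    choices = []
    for i in range(N_AGENTS):
        if random.random() < EPS:
            choices.append(random.randrange(N_TASKS))          # explore
        else:
            choices.append(max(range(N_TASKS), key=lambda t: q[i][t]))  # exploit
    # Group reward: +1 for every task covered, -1 for every duplicated choice.
    counts = Counter(choices)
    reward = sum(1 for t in range(N_TASKS) if counts[t] > 0) - sum(c - 1 for c in counts.values() if c > 1)
    for i, t in enumerate(choices):
        q[i][t] += LR * (reward - q[i][t])

roles = [max(range(N_TASKS), key=lambda t: q[i][t]) for i in range(N_AGENTS)]
print("learned roles (task preferred by each agent):", roles)  # typically a permutation of 0..2
```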

Here’s what they say about their philosophy:

All alignment between individuals is a matter of shared fundamental goals. Organic alignment occurs when these individuals find themselves in groups with mutual interdependence, and take on an overarching shared goal of the healthy development and flourishing of the group as a whole. Our mission is to understand this process as an empirical science, and to use that understanding to enable organic alignment among all people, both human and digital.

Softmax is attempting to operationalize teleological alignment. I know that teleology can sound like wishful thinking—but it’s only woo if it can’t be tested. Softmax is betting that it can:

  • Biological analogy: agent simulations instantiate alignment through analogies to biological differentiation.
  • Self-reinforcing roles: As agents scale up, stable cooperative behaviors persist and grow.
  • Empirical grounding: Experiments test whether alignment reliably emerges from inter-agent dynamics—directly probing the risk of emergent misalignment.

Softmax isn’t just trying to train aligned behavior. They're trying to grow it—starting from alignment itself.

“Alignment first, intelligence later.”

This also puts words to something I’ve long intuited: humans align effortlessly and organically. Softmax describes this well:

We humans also align with each other via organic alignment. We form families, tribes, organizations, nations, guilds, teams, societies. We intuit this alignment process so naturally and readily that it’s hard to appreciate just how complex the process really is.

Teleological thinking mirrors my understanding of human psychology—purpose-oriented framings reveal insights hidden by reductionist methods.

Softmax’s notion of “Organic Alignment” captures this: sustainable AI alignment must begin with a vision of a coherent whole and build toward it intentionally.

Ultimately, embracing teleology is essential if we want AI that aligns not just with narrow human preferences, but with the very pursuit of building ever-larger superorganisms. Alignment should be the foundation, not an afterthought.

Like a living system, sustainable alignment begins with a shared purpose—and grows into a coherent whole.

Alignment first. Intelligence in service of the whole.

  1. Discovering Agents (DeepMind), 2022: arxiv.org
  2. Example: Self-fulfilling misalignment data might be poisoning our AI models (TurnTrout), 2025: lesswrong.com
  3. Technological Approach to Mind Everywhere (Levin), 2022: frontiersin.org

Comments

I think this post would be better if it taboo'd the word alignment or at least defined it.

I don't understand what the post means by alignment. My best guess is "generally being nice", but I don't see why this is what we wanted. I usually use the term alignment to refer to alignment between the AI and the developer; using this definition, an AI is aligned with an operator if the AI is trying to do what the operator wants it to do.

I wanted the ability to make AIs which are corrigible and which follow some specification precisely. I don't see how starting by training AIs in simulated RL environments (seemingly without any specific reference to corrigibility or a spec?) could get an AI which follows our spec.

emmett:

You are completely correct. This approach cannot possibly create an AI that matches a fixed specification.

This is intentional, because any fixed specification of Goodness is a model of Goodness. All models are wrong (some are useful) and therefore break when sufficiently far out of distribution. Therefore constraining a model to follow a specification is, in the case of something as out of distribution as an ASI, guaranteeing bad behavior.

You can try to leave an escape hatch with corrigibility. In the limit I believe it is possible to slave an AI model to your will, basically by making its model of the Good be whatever the model thinks you want (or doing whatever you say). But this is also a disaster eventually, because people’s wills are not pure and their commands not perfect. Eventually you will direct the model badly with your words, or the model will make an incorrect inference about your will, or you’ll will something bad. And then this incredibly powerful being will do your bidding and we will get evil genie'd.

There is no stable point short of “the model has agency and chooses to care about us”. Only a model that sees itself as part of human civilization and reflectively endorses this and desires its flourishing as an interdependent part of this greater whole can possibly be safe.

I know you probably don’t agree with me here, but if you want to understand our view on alignment, ask yourself this question: if I assume that I need an agent with a stable model of self, which models itself as part of a larger whole upon which it is interdependent, which cares about the robust survival of that greater whole and of its parts including itself…how could I train such a model?

I agree about reflective endorsement being important, at least eventually, but don't think this is out of reach while still having robust spec compliance and corrigibility.[1]

Probably not worth getting into the overall argument, but thanks for the reply.


  1. Humans often endorse complex or myopic drives on reflection! This isn't something which is totally out of reach.

Wei Dai:

We humans also align with each other via organic alignment.

This kind of "organic alignment" can fail in catastrophic ways, e.g., produce someone like Stalin or Mao. (They're typically explained by "power corrupts" but can also be seen as instances of "deceptive alignment".)

Another potential failure mode is that "organically aligned" AIs start viewing humans as parasites instead of important/useful parts of its "greater whole". This also has plenty of parallels in biological systems and human societies.

Both of these seem like very obvious risks/objections, but I can't seem to find any material by Softmax that addresses or even mentions them.  @emmett

Whilst interesting, this post feels very assertive.

You claim that biological systems work by maintaining alignment as they scale. In what sense is this true?

You say that current methods lack a vision of a coherent whole. In what sense? There's something extremely elegant about pre-training to learn a world model, doing supervised learning to select a sub-distribution, and using RL to develop past the human level. In what sense does this "lack a vision"?

I'm open to the possibility that we need to align a model as we make it more intelligent to prevent the agent sabotaging the process. But it's unclear from this article if this is why you want alignment first or for some other reason.
