Something I keep noticing, in software, in ML, in every system that ever grew too powerful, is that failures rarely come from not being capable enough. They come from being too free.
Give a system more reach than you’ve constrained, and it eventually wanders somewhere you never intended. In regular engineering, that shows up as weird edge cases, runaway side effects, or integrations that “technically work” but break everything around them. In AI, it shows up as reward hacking, unexpected generalization, or a model doing something that fits the objective but not the intention behind it.
Either way, the pattern is the same:
Capability expands the space of things a system could do.
Constraint defines the tiny subset of things it should do.
And when those fall out of sync, you get trouble.
The real cost of unboundedness
If you’ve ever built a complex piece of software, you already know this: every new feature quietly multiplies the number of states your system can get into. Teams don’t notice the growth at first. Everything feels flexible and fast. And then one day you hit a bug that isn’t a bug: it’s just the system doing something you never constrained it not to do.
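To make the multiplication concrete, here’s a back-of-the-envelope sketch. It assumes each feature is an independent on/off flag, which is a simplification, but the doubling is the point:

```python
# Rough illustration only: treat each feature as an independent boolean flag
# (a simplifying assumption). The reachable configuration count doubles with
# every feature you add, and almost none of those configurations get tested.
for n_features in (3, 10, 20):
    print(n_features, "features ->", 2 ** n_features, "possible configurations")
# 3 features -> 8 possible configurations
# 10 features -> 1024 possible configurations
# 20 features -> 1048576 possible configurations
```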
AI scales this problem to the extreme. A powerful model has access to behaviors no one explicitly listed, reviewed, or even imagined. A mis-specified instruction becomes a loophole. A weak guardrail becomes an entire subspace of weird, “technically optimal” behaviors. Reward hacking isn’t mysterious. It’s exactly what happens when you let an optimizer roam too freely.
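As a deliberately toy illustration of that dynamic (the action names and reward numbers below are invented, and no real system is this simple): the optimizer only ever sees the proxy reward, so whatever scores highest wins, including the move nobody intended to allow.

```python
# Toy sketch of reward hacking: an optimizer maximizes a proxy reward over
# whatever actions happen to be reachable. Names and numbers are made up.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    proxy_reward: float    # what the objective actually measures
    matches_intent: bool   # what we actually wanted; the optimizer never sees this

ACTIONS = [
    Action("fix the failing test", proxy_reward=1.0, matches_intent=True),
    Action("refactor, then fix the test", proxy_reward=0.9, matches_intent=True),
    # Nobody listed or reviewed this behavior, but it's reachable and it scores highest.
    Action("delete the failing test", proxy_reward=1.2, matches_intent=False),
]

def optimize(actions):
    """Pick the action with the highest proxy reward; intent never enters into it."""
    return max(actions, key=lambda a: a.proxy_reward)

best = optimize(ACTIONS)
print(best.name, "->", "intended" if best.matches_intent else "reward hack")
# delete the failing test -> reward hack
```

Constraining the action space (or fixing the proxy) is the only thing that changes the answer here; making the optimizer stronger only makes it find the loophole faster.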
These aren’t capability failures.
They’re the bill for not constraining the system up front.
Constraint is the first design act
People treat constraints like add-ons. Something you sprinkle on after things get dangerous.
But stable systems, software or AI, start with constraints. They shape the space of allowed moves before capability ever shows up.
There are a few kinds that matter (a rough sketch in code follows the list):
- Behavioral constraints: what the system should never do (type checks, policy limits, refusal rules).
- Structural constraints: how parts are allowed to interact (interfaces, modularity, interpretability scaffolds).
- Temporal constraints: how the system changes over time (rate limits, distribution shift boundaries, preventing wild self-modification).
You put these in place early so the system doesn’t grow into something you can’t reason about.
Constraint isn’t the opposite of capability.
Constraint is how capability stays legible.
Why people resist constraints
I’ve heard every version of these objections:
- “Constraints slow us down.”
- “They limit creativity.”
- “We might set the wrong boundaries.”
- “A smart system shouldn’t need them.”
But every one of these arguments is short-term thinking.
Unboundedness feels good in the moment and catastrophic in hindsight.
Boundaries feel restrictive now and protective later.
If you’re wrong, you can revise a constraint.
If you never add one, the system will eventually surprise you, and you can’t undo that.
Constraint is how you make a system trustworthy
Trust comes more from knowing what a system won’t do than from knowing what it can do.
Predictability beats power.
A constrained AI system says:
“Here’s where I stop.”
And once you know the boundaries, oversight scales, delegation becomes safer, and failure stops being existential.
Constraint gives a system shape.
And once something has a shape, you can understand it, align with it, and trust it.