Realizability for Finite State Reactive Agents

Vardhan

This work was produced during the Dovetail Research Fellowship. I’m grateful to Alex, Alfred, Winter, and Santiago, and to everyone who provided feedback, and to the fellowship community for their discussions.

Note on formatting: There was a rendering issue with display-style LaTeX in the original version of this post, where block equations were not visible after publishing. To avoid confusion, I have replaced those equations with images. Apologies for the inconvenience, and please feel free to comment if anything is unclear.

Abstract

We study AI agents through the lens of finite-state transducers and string-to-string behaviors. Instead of treating automata purely as language recognizers, we model agents as systems that map observation histories to action histories while interacting with an environment online. We introduce Finite-State Reactive Agents (FSRAs) as deterministic, finite-memory, reactive policies, establishing their equivalence to standard Mealy and Moore machines. Our main result (Theorem 3.13) characterizes exactly which behaviors can be implemented with finite internal state: a deterministic behavior is FSRA-implementable if and only if it is length-preserving, prefix-preserving (causal), and has finite memory, & relate FSRAs to standard Mealy and Moore machines and identify when richer computational models are unavoidable. We also introduce specifications (constraints on allowed agent behaviors) and formalize notions of weak enforceability, strong enforceability, and realizability. Using our characterization theorem, we prove strict separations between these notions (Theorem 3.15), demonstrating when finite-state agents can or cannot satisfy given specifications.

Introduction

Deterministic Finite Automata (DFAs) are usually introduced as a basic model of computation: simple machines that read an input symbol by symbol and decide whether to accept or reject a string. In theoretical computer science, they are often treated as a warm-up before more powerful models such as Turing machines.

In this work, we take a different perspective. A DFA can also be viewed as a minimal agent: it maintains a small internal state, receives observations from the environment, updates its state deterministically, and eventually produces an outcome. Despite their simplicity, DFAs already capture the core interaction loop shared by many AI systems: observe → update internal memory → act or evaluate.

This observation motivates a shift from language recognition to agent behavior. We introduce the model of a Finite-State Reactive Agent (FSRA), which generalizes DFAs from passive evaluators to active, online systems that map streams of observations to streams of actions using finite internal memory. FSRAs formalize deterministic, reactive agents whose behavior is causal and prefix-preserving, and whose internal state evolves according to a fixed update rule (time-invariance). They let us ask precise questions about memory, state abstraction, observability, and goal satisfaction without the complications of learning (updating the policy from data), optimization (searching over policies or parameters to maximize an objective), or stochasticity (randomness in the agent's actions or environment). These characterizations let us state simple, checkable conditions for when a specification is enforceable or realizable by a set of finite-state controllers.

Outline. The rest of this post is organized as follows. Section 1 introduces the Finite-State Reactive Agent (FSRA) model and states its equivalence with standard Mealy and Moore machines. Section 2 develops formal properties of agent behaviors: determinism, length-preservation, causality (prefix-preservation), finite memory, reactivity, and time-invariance, modeled as properties of string-to-string functions. It concludes by introducing specifications (constraints on allowed behaviors) and three enforceability notions: weak, strong, and realizability. Section 3 presents our main results: Theorem 3.13 characterizes exactly which behaviors are FSRA-implementable (reactive + finite-memory), Corollary 3.14 shows that such behaviors induce realizable specifications, and Theorem 3.15 proves strict separations between the enforceability notions, demonstrating fundamental gaps between what finite-state agents can weakly satisfy versus fully realize. We conclude with worked examples including binary remainder computation (mod k), substring avoidance (no "11"), and count-with-ending constraints. Standard automata-theoretic background (DFAs, Mealy machines, Moore machines) and deferred proofs are collected in the Appendix.

Finite State Reactive Agent

Mealy machines (Definition A.2 in the Appendix) are deterministic finite-state transducers that produce an output symbol synchronously with each input symbol. This makes them a natural formal model for very simple online agents: the input alphabet corresponds to the agent's observations, the finite state encodes the agent's memory, and the per-step output is the agent's immediate action. To make this perspective explicit and to use language familiar to AI and alignment readers, we adopt the name Finite-State Reactive Agent (FSRA) for a Mealy machine interpreted as a deterministic, finite-memory, online policy. The formal Definition 1.1 below gives the FSRA tuple and the precise causal / prefix-preserving behavior we require.

Definition 1.1 (Finite State Reactive Agent - FSRA)

A Finite-State Reactive Agent (FSRA) is a deterministic, finite-memory, online transducer that maps each input (observation) prefix to an output (action) prefix of the same length i.e. it emits exactly one action symbol for every observation symbol it consumes.

Formally, an FSRA is a tuple , where

is a finite set of states (the agent’s internal memory),
is a finite observation alphabet (input),
is a finite action alphabet (output),
is the (total) state-update function,
is the output (action) function, producing exactly one action per observation,
is the start state.

The FSRA implements a function

that is length-preserving:

and causal / prefix-preserving: the -th output symbol depends only on the first input symbols, and for any prefix and continuation , is a prefix of .

Concretely, on input starting from , define the state sequence
with given. The produced action sequence is
, so .

Remark (generalization). If you want to allow a variable number of action symbols per observation, simply let . That yields the Mealy/transducer formulation where one transition may emit a finite output string; the FSRA above is the special case that is strictly one action per observation.

Calling this model an FSRA makes the intended interpretation explicit: are observations, are actions, and the pair together specify a deterministic, finite-memory policy. An FSRA captures the simplest kind of online agent (these terms are discussed in detail in the next subsection): it is reactive (length- and prefix-preserving, emitting exactly one action per observation), has finite memory encoded by the finite state set Q, and is deterministic.

This model is useful for reasoning about what can be implemented with finite memory, and for making clear where finite-state agents fail. When one wants to model policies with longer or variable delay between observation and action, or policies that maintain richer beliefs, the FSRA can be generalized (e.g. or by increasing , but ultimately some behaviors require strictly more expressive models (e.g. pushdown automata, Turing Machines, or stochastic/learning agents).

Proposition 1.2 (Equivalence with Mealy / Moore)

Let be a finite observation alphabet set and a finite action alphabet set.

Every FSRA is, by definition, a standard Mealy machine (with input alphabet , output alphabet , and single-symbol output per transition - Definition A.2' in the Appendix). Conversely, every standard Mealy machine (whose output function produces exactly one symbol per input) is an FSRA. This equivalence is immediate from the definitions.
For every FSRA there exists an equivalent standard Moore machine (Definition A.3' in the Appendix) computing the same length-preserving transduction , and conversely, every standard Moore machine can be converted into an equivalent FSRA.

Brief justification. Part 1 is just a renaming of components. For Part 2, the FSRA-to-Moore direction requires splitting each state into copies - one for each output symbol a that can produce, so that the output can be read from the state alone; this can increase the state count up to . The Moore-to-FSRA direction works by defining where is the Moore output function, moving the output from the arrived-at state to the transition that enters it.

Formal Properties of Agent Behaviors

In this section, we define several properties of general observation-to-action behaviors, modeled as string-to-string functions. We then identify which of these properties are satisfied by Finite-State Reactive Agents (FSRAs).

Let and be finite alphabets, representing observations and actions respectively. We consider functions

where denotes the collection of nonempty subsets of . Such a function assigns to each finite observation history a set of admissible action histories.

Later, we will specialize to deterministic behaviors, which are functions of the form

Example (Behavior)

Let . Define a behavior as follows:

In this example, the behavior is nondeterministic: after some observation histories, multiple action histories are permitted.

Intuitively, a function describes everything an agent is allowed to do, rather than what it will do. For each observation history, lists the action histories that are consistent with some specification, constraint, or design goal. Deterministic agents correspond to the special case where exactly one action history is allowed at each observation history.

It is useful to separate behaviors from agents. A behavior specifies which action histories are acceptable after each observation history, without committing to a particular decision rule. An agent, by contrast, must choose a single action at each step.
We also distinguish both from a specification: a specification P is a constraint on what an agent is allowed to do (formally, a relation ; see Definition 2.5), whereas a behavior is what the agent actually does. This three-way distinction: specification, behavior, agent, lets us ask precise questions. For example: "Can a simple agent satisfy this specification?" (i.e., does there exist an agent whose behavior stays within the allowed set?). In the formal-methods and control-theory literature, this is called enforceability: a specification is enforceable by a set of agents if at least one agent in that set produces only allowed behaviors. We can also ask the converse: "Can a set of agents collectively realize every behavior the specification allows?" These notions are formalized in Definition 2.11.

Allowing to be nondeterministic is intentional. Many natural specifications such as safety constraints, permissive policies, or sets of acceptable plans do not prescribe a unique action at every step. Instead, they describe a space of allowed behaviors.

Definition 2.3 (Deterministic)

A function is deterministic if

In this case, we may write and identify with the unique element of the singleton set. Determinism means that given any observation history, the action sequence is uniquely determined. Note that a deterministic behavior effectively is an agent's policy, the distinction between "set of allowed action histories" and "committed choice" collapses when the allowed set is always a singleton. The behavior-vs-agent distinction is most useful in the nondeterministic case, where a behavior permits multiple options and the question becomes which agent (i.e., which deterministic selection) to deploy.

Example (Deterministic)

Let and . Define

So, the function returns and alternatively.

Determinism is the simplest way to turn a permissive specification into an actual policy: once you pick a deterministic you no longer have a set of allowed behaviors but a single thing the agent will actually do (to follow the behavior). This is useful when reasoning about concrete guarantees (e.g., “will this agent ever output bad?”) but remember determinism is a modeling choice, real systems are sometimes nondeterministic (stochastic) by design.

Definition 2.4 (Controller and Controller Set)

A controller is any concrete system (a finite automaton, a program, a circuit, a neural network, etc.) that determines a behavior

where is the set of action histories the controller may produce after observation history o. We deliberately leave "concrete system" informal: the only thing that matters formally is the induced behavior .

A controller set is a set of controllers; we write and use the shorthand for the mapping from controllers to their induced behaviors. (No additional structure on is assumed, it is simply the collection of controllers under consideration, e.g., "all FSRAs" or "all 3-state Mealy machines.")

If each is single valued we call the controller deterministic and identify

Example (Controllers)

Let be the set containing two deterministic behavior/functions with:
-
- .
Then is a controller set that can produce at least
depending on which controller is selected.

Definition 2.5 (Specification)

Let and be finite alphabets, and let

be a specification (a relation between observation histories and allowed action histories). For an observation history write the fiber (i.e., the set of all action histories that the specification allows after ):

A specification is not the same as a behavior. A behavior describes what a controller does; a specification describes what a controller is allowed to do. In particular, can contain many action histories, it expresses a constraint ("any of these actions would be acceptable") rather than a commitment ("this is the action I will take"). A single controller typically covers only some of the allowed pairs. The enforceability and realizability notions in Definition 2.11 make this gap precise: they ask whether controllers in a set can, individually or collectively, cover what allows.

Let be a set of controllers; each controller induces a function (behavior)

so that is the set of action sequences controller produces on observation history . Define the relation implemented by as

Observe that .

Think of the controller as the machine you could put on the robot: it has internal structure and a rule for mapping sensory input to motor outputs. The behavior is the abstract trace of possibilities that a machine can generate. Distinguishing these levels is crucial: a specification can permit many possible action traces (multivalued), but only some will be reachable by your finite-state controllers. The phrase “is it enforceable?” asks whether the machines you can actually implement can realize at least one of the allowed traces at each point; “is it realizable?” asks whether the machines collectively capture exactly the specification.

Definition 2.6 (Length Preserving)

A function is length-preserving if

For deterministic this reduces to

Example (Length-preserving)

Let , and define

i.e. Then .

Intuitively, a length-preserving function produces exactly as many output symbols as it consumes input symbols. This models an agent that emits one action per observation (or more generally, a fixed number of actions per observation). It’s an explicit modeling choice: many controllers (especially embedded or real-time ones) naturally fit this model.

Definition 2.7 (Prefix Preserving/Causal)

A function is prefix-preserving (or causal) if

For deterministic this is equivalent to

Causality captures the notion of online computation: the output produced after reading a prefix must remain consistent (as a prefix) with the output produced after reading any extension . In other words, the agent commits to its early outputs and cannot retroactively change them when new observations arrive.

Example (Prefix-preserving / causal)

Let ,  and define  by the stepwise rule:
on seeing  output ; on seeing  output .
Formally, .
Then  is a prefix of  for every .

Definition 2.8 (Finite Memory)

A deterministic function has the finite-memory property if the equivalence relation defined by

has finite index (i.e., finitely many equivalence classes).

The relation partitions observation prefixes into equivalence classes that cannot be distinguished by any future continuation. If this partition is finite, then the agent effectively needs only finite memory to encode its past: it suffices to remember which equivalence class the history belongs to. This is the string-function analogue of the Myhill-Nerode theorem for languages.

The finite-memory condition is the formal version of “bounded internal state.” It says there are only finitely many distinct ways the future can unfold after different pasts, so keeping one of finitely many summaries is enough. This is exactly why finite automata are a natural model for resource-limited agents.

Example (Finite Memory - Periodic every )

Fix . Let . Define deterministic by

i.e. the symbol emitted at step depends only on the position (every -th output is ).

Claim. has finite memory; the index of is exactly .

Proof. Define relation (R) on prefixes by

There are at most classes under . If , then for every continuation the positions in where outputs equal 1 are the same (they are exactly those indices congruent to ). Hence . Thus has at most classes. Conversely, two prefixes with different lengths modulo produce different future outputs (their next few outputs disagree), so all residue classes are distinct. Therefore has exactly classes, so finite index.

Example (Not Finite Memory - Perfect Square Count Predicate)

Let . Define deterministic reactive  by the stepwise rule:
at each step emit  iff the total number of s seen so far is a perfect square; otherwise emit .
Formally, if after reading prefix  the count of 1s is , then on reading a next symbol the output at the time the -th input is processed depends on whether  is a square.

Claim. has infinite index (so is not finite memory).

Proof for this claim can be found in appendix

Definition 2.9 (Reactive)

A behavior is reactive if it is length-preserving, and
prefix-preserving, i.e. ( is a prefix of ).

Equivalent characterization for deterministic behavior: A behavior is reactive if and only if

Reactivity does not imply memory-lessness or finite memory. A reactive behavior may depend on the entire observation history , not just a bounded summary of it. (In particular, "finite memory" is strictly more general than the memoryless/Markov case, where the action depends only on the most recent observation: a finite-memory agent compresses the full history into one of finitely many states, each of which can encode patterns spanning many past observations.)

Reactivity formalizes the intuitive idea that an agent must act in real time. Once an action is taken, it cannot be retracted when new information arrives. This matters for enforceability: a specification may allow some full action sequence in hindsight, yet no reactive agent can realize it because early commitments inevitably block all allowed continuations. Prefix-wise enforceability makes this tension explicit by asking whether an agent can always keep some allowed future open while acting online. We will see these definitions more formally later.

Example (Reactive)

Same as the prefix-preserving example above: maps observations to immediate outputs, so . This is reactive and causal by construction.

Example (Length-preserving but non-reactive behavior)

Let and . Define a deterministic function

where the output string has the same length as . If denotes the empty string then .

This behavior emits the correct number of actions, but it does not act online. The first action symbol depends on whether the entire observation history has even or odd length, which cannot be determined when the first observation arrives. As a result, the agent’s early actions are not commitments it may retroactively “change its mind” once more observations arrive. This violates the core idea of reactivity: acting step by step in real time.

Example (Reactive but not finite memory)

Let , . Define a deterministic reactive  by the stepwise rule:
at each step output  if the total number of 1s seen so far is a perfect square, otherwise output .
Formally , where  depends on the count of 1s in .
This  is reactive (stepwise defined) but not finite-memory, since tracking “is count-of-1s a square?” requires unbounded memory.
This example shows reactivity alone doesn’t bound memory cost. The agent behaves online (each step emits one output) yet the rule depends on an unbounded statistic (counts), so no finite-state controller can implement it.

The properties above (length-preserving, prefix-preserving, finite memory) are exactly the ingredients of our main FSRA characterization theorem (Theorem 3.13). We will state and prove it in Section 3. Before that, we introduce the idea of enforceability and realizability, which we will use later when talking about which specifications such agents can satisfy.

Definition 2.10 (Time-invariance)

A family of online behaviors may be specified by a step-indexed family of per-step update/output rules , where each maps a current internal state and the new observation to a next state, and maps the current state and observation to the emitted action. We call the behavior time-invariant (or stationary) if the per-step rules do not depend on ; that is, there exist maps and such that

Time-invariance adds the further restriction that the decision rule doesn’t secretly depend on how many steps have elapsed.

Definition 2.11 (Enforceability and Realizability).

Weak enforceability: We say that specification is weakly enforceable by a set of controllers if

equivalently

Another way to define would be like this

whenever the specification allows some action at history , at least one controller in can produce an allowed action at . The controller that works may depend on .

Strong enforceability: We say that is strongly enforceable by a set of controllers if

equivalently

every action allowed by the specification at is realized by some controller in (the realizing controller may depend on the action and on ).

Realizability: We say that is realizable by a set of controllers if

i.e. for every there exists some controller with , and conversely every emitted by some controller in belongs to .

Deterministic simplifications. When induces a set of deterministic function i.e. , then we can just write:

For Weak enforceability:

equivalently

For strong enforceability:

equivalently

Special case: singleton controller. When is just a singleton set i.e. then the definitions three definitions just simplify to:

Weak Enforceability reduces to and in the deterministic case this is whenever .
Strong Enforceability reduces to (i.e. every allowed action must be produced by the single controller), which in the deterministic case means is either empty or equals .
Realization reduces to

Enforceability compares an abstract specification (what counts as allowed action histories after each observation history) to what a set of concrete controllers can actually produce. Weak enforceability asks only that, whenever the spec allows at least one action after a history , some controller in can produce at least one allowed action at (the controller may depend on the history). Strong enforceability demands more: every action that the spec allows at must be produced by some controller in (so the controllers collectively realize every permitted action). Realizability asks that the set exactly matches the specification: the set of all (history, action) pairs realized by controllers in is equal . These notions let us formalize intuitions like “the spec is implementable by simple reactive agents” and to separate cases where a spec is merely compatible with controllers from cases where it is exactly captured by them.

Example (Weak but not Strong)

Let Define the specification

(For every other observation string .)

Let the controller set be the singleton with deterministic behavior given by

Then:

For strong enforceability we require with . But and no produces ; hence is not strongly enforceable by .
Realizability asks . Here ; thus is not realizable by .

Example (Strongly enforceable but not realizable)

Let Define the specification

Let the controller set be with behaviors

Then

Strong enforceability: for each allowed action in there exists a controller producing it: Thus is strongly enforceable by
Realizability: . Hence and is not realizable by .

Proposition 2.12 (Relation between different notions of enforceability):

Let be a set of controllers and a specification. Then

Prefix-wise Enforceable: Say denotes the set of all prefixes of strings in . A specification is prefix-wise Enforceable by if

Equivalently: for every observed prefix and every continuation for which the full specification admits some action on , the controller already offers at prefix at least one action-prefix that can be extended to an allowed full action on .

Prefix-wise enforceability is an explicitly online notion: it requires that a single controller already commit, at each observation prefix , to some action-prefixes that are extendable to full allowed actions for every possible future continuation that keeps the specification satisfiable. In practice, this is a useful notion when agents must act before observing the entire future, it demands that the agent’s early choices never rule out reaching some allowed final action once future observations arrive.

Characterization of FSRA and Examples

Theorem 3.13 (Characterization of FSRA)

Let be finite alphabets and let be a deterministic function. Then there exists an FSRA with induced transduction iff all of the following hold:

is length-preserving: .
is prefix-preserving/causal: .
is finite memory: has finitely many equivalence classes.

(Equivalently: is implemented by some FSRA properties (1)–(3) hold.)

Proof (Theorem 3.13):

Let be an FSRA and suppose

Length-preserving. By the FSRA definition the machine emits exactly one action symbol for each input symbol. Therefore for every the produced action string has the same length as . Hence
Prefix-preserving. The FSRA produces each output symbol at the time its corresponding input symbol is read; in particular the output produced after a prefix is the prefix of the output produced after any extension . Formally, if the state sequence after reading is and the outputs are , these are initial symbols of the outputs produced on any longer input . Thus is prefix-preserving.
Finite memory. Let be the finite state set of . Define a map by . If then for every continuation the machine, starting from the same state, will produce identical future outputs on and . Hence . Thus, the number of -equivalence classes is at most , so has finite index.

This completes the (⇒) direction. Note also that the FSRA gives explicit fixed maps (so, time-invariance holds for any FSRA)

Assume is deterministic and satisfies length-preserving, prefix-preserving, and finite-memory. We build an FSRA that implements .

1. State set. Let be the set of equivalence classes of . By finite-memory this set is finite. Write for the -class of . Let the start state be .

2. Define . For class and observation define This is well-defined: if then , so for every continuation we have ; taking of the form shows , hence .

3. Define . We must define the single action symbol emitted when in class upon reading . Because is length-preserving and prefix-preserving, for every and the string is a prefix of and so equals concatenated with exactly one symbol. Thus, there exists a unique with

Define: i.e. the last symbol of . We must check is well-defined: if then so for every continuation , . In particular for we get , and for we get . Therefore, the unique last symbols of and coincide; thus .

4. The machine. Put . By construction are total maps and , respectively, so is an FSRA (deterministic, finite , one output per input).

5. Correctness . We show by induction on length of that the action string produced by on input equals .

Base : machine produces and (length-preserving and causal behavior give that conventionally).
Inductive step: assume for that the machine output equals and that the machine reached state . On reading next symbol the machine outputs , which we defined to be the unique symbol with . Therefore, the machine’s output after equals . The next state is =[uo]). This completes the induction.

Hence .

Thus (1)–(3) imply existence of an FSRA implementing . The constructed are fixed maps (time-invariant) by construction.

We call this model a Finite-State Reactive Agent because the name describes exactly what the math enforces. It is finite state because the agent’s memory is a fixed, finite set of states (the classes), there is only a bounded amount of past information the agent can store. It is reactive because the agent responds to each observation immediately: for every input symbol it emits exactly one action symbol, and the action at time t depends only on the observations up to t (no lookahead, no delayed planning). And it is an agent because the whole object is a policy-like mapping from observation histories to actions a causal, online decision procedure. This label therefore does more than decorate the definition: it communicates the limits and strengths of the model at a glance (what the agent can implement, and what kinds of behavior definitely require richer models such as stacks, external memory, or unbounded belief states).

Corollary 3.14 (Realizability of deterministic reactive finite-memory specifications by FSRAs)

Suppose be a deterministic behavior that is reactive (i.e. length-preserving and prefix-preserving) and has the finite-memory property. Define the induced specification .

Then there exists a Finite-State Reactive Agent (FSRA) such that

In particular, is realizable by the singleton controller set

Theorem 3.15 (Strict Separation of Realizability and Enforceability for FSRAs)

There exist specifications and over a one-symbol observation alphabet and action alphabet such that:

is strongly enforceable by the set of all FSRAs, but is not realizable by any set of FSRAs (finite or infinite).
is weakly enforceable by a finite set of FSRAs, but is not strongly enforceable by any finite set of FSRAs.

Proof.

We treat the two constructions separately.

Part 1: Construction of

Let and . For each write for the length- observation string. Define the length- action , the unique length- bitstring whose single 1 appears at position . Define the specification

Thus, for every observed prefix the specification allows exactly the single action .

(i) Strong enforceability by the set of all FSRAs.
For every there is an FSRA (With states) that outputs on the first steps and outputs 1 on the -th step. Concretely, simply counts up to and emits the described pattern. Hence for each allowed pair there exists an FSRA (namely ) that produces that exact pair. By definition (Def. 2.11) the set of all FSRAs therefore strongly enforces .

(ii) Non-realizability by any set of FSRAs.
Suppose is any set (finite or infinite) of FSRAs and that realizes , i.e. Then in particular every pair belongs to the right–hand side, so for every there exists some with . But realizability as equality forces more: for every and every we must have . Hence for every and every , . Therefore every implements the same global map defined by . In particular, must contain at least one FSRA whose induced function is equal to .

But the function above is not implementable by any FSRA: the equivalence relation has infinite index because distinct prefix lengths are distinguished by future behavior; equivalently is not finite-memory and so (by Theorem 3.13) not implementable by any FSRA. Thus, no FSRA in can induce , a contradiction. Therefore, no set of FSRAs realizes .

Part 2 - Construction of

Again let and . For each set

the set of length- bitstrings that contain exactly ones.

(i) Weak enforceability by a finite set.
Let be the simple 2-state FSRA that ignores the input symbol and alternates outputs (a 2-cycle). For every that controller’s length- output contains exactly ones (the ones appear at the even positions). Hence for every . Therefore, the singleton finite set satisfies the weak enforceability condition: for every observed history there exists a FSRA in this finite set whose produced action at is allowed.

(ii) Failure of strong enforceability by any finite set.
Fix any finite FSRAs set . For a given length the set can produce at most distinct length- action strings (each FSRA gives one string of length ). But the size of the fiber equals the binomial coefficient , which grows super-polynomially in . Thus, for sufficiently large we have . Hence cannot contain FSRAs producing every allowed string in . Since strong enforceability would require that every allowed action at each history be produced by some FSRA in the set (Def. 2.11), no finite can strongly enforce .

Example (FSRA - Remainder mod k of the binary stream)

Setup: Fix integer . Observation Alphabet (bits observed); action alphabet (we let each output symbol be the remainder, for readability).

Define deterministic reactive by: for input , the output at step is the remainder of the integer represented by the prefix (interpreting the bit string in binary, most-significant-bit first) modulo . Equivalently let and define (a string of symbols in ).

A small working example: () Input .

Output

Finite Memory: Consider the relation on prefixes:

Claim: iff the current remainders after and are equal, i.e. , where denotes the integer value of

If then for any continuation the recurrence depends only on and , so . Hence .
Conversely, if , take to be a single symbol which distinguishes the remainders by the recurrence. So , hence .

Therefore, the equivalence classes are exactly the residue classes modulo . So has finite index - finite memory.

FSRA: Define by

(the residue classes).
.
For state and observation :

After reading a prefix whose current remainder is , seeing bit updates the remainder to and the FSRA emits that new remainder as the action for that timestep. This is deterministic, length-preserving, and causal.

For example: , input :

start from .
read 1: emit
read 0: , emit .
read 1: , emit .
Output:

Finite-State Reactive Agent for binary remainder mod 3.

Specification:

where and
For every observation prefix , the unique allowed action history is the sequence of remainders obtained by repeatedly updating the current remainder by for each next bit . Thus

So .

Example (DFA and FSRA - Absence of a substring)

Let which is the collection of all finite strings over with no two consecutive symbols. The empty string We first see a DFA for it, to be specific a minimal DFA (i.e. the smallest (with respect to number of states) DFA which accepts )

States where:
- = “no at the end / start state” (last symbol was or empty),
- = “last symbol was and no forbidden has occurred so far”,
- = dead/trap state (we have seen substring ).
Start state: .
Accepting states: (we accept as long as has not occurred).
Transition function :

Minimal DFA detecting absence of substring "11"

Setup: Observations (each step the agent observes a bit). Actions (at each step the agent emits 1 iff the prefix seen so far is in , else 0). Define the deterministic reactive function by: for input , the -th output symbol equal
So,

Finite Memory: The relation has exactly three equivalence classes, so has finite memory. (We have proven our claim in the appendix)

Define three candidate classes by partitioning all prefixes into three sets:

(include here).
(the dead/trap class).
These three sets are disjoint, and their union is all prefixes.

Note: We are using here and in the proof given in the appendix

Every prefix belongs to one of the three classes , each class is contained in a single -equivalence class, and the three classes are pairwise distinct. Therefore has exactly three equivalence classes (finite index). By Definition 2.8 (finite-memory), has finite memory. As a corollary, any FSRA implementing requires at least 3 states (one per class); the standard construction produces the 3-state FSRA.

FSRA: Construct FSRA directly using the DFA states.

as above.
is the start state.
(same as DFA):
(emit 1 iff the new state after reading the input is accepting):

For example, input .

Start in .
Read 1: new state , emit because 1 alone is in . (Output so far: 1.)
Read 0: new state , emit because Output: 11.
Read 1: new state , emit because . (Output: 111.)

If we had input 11 then at second step:

Start . read 1 → , emit 1.
Read 1 → , emit 0. So, outputs 10 indicating the prefix of length 2 is rejected.

FSRA detecting absence of substring "11"

Specification: The realized specification is the singleton-valued relation

For each observation prefix , the fiber is the unique action history produced by the FSRA after prefix .

Example (DFA and FSRA - Constraint on Count and Ending)

Fix , observation alphabet . Fix indices , integers . Define

(By convention the empty string does not end in and thus, is not in unless you explicitly want to treat it otherwise.) We build a deterministic finite automaton
that keeps two pieces of summary information:

the residue equal to the current count of modulo , and
the last seen symbol , where denotes “empty / no last symbol yet” (useful for the initial state).

Define

. So .
Start state (zero occurrences seen, no last symbol).
Transition given by $𝟙$ $𝟙$ is 1 if and 0 otherwise.)
Accepting set

An example,

Setup: We convert the language into a Boolean behavior, which is a convenient FSRA target. Let Observations: and Actions:

Define by: for any input and index $𝟙$

So is the binary stream indicating, step-by-step, whether the observed prefix satisfies the two constraints (count-of- and prefix ends with

Finite Memory: The classes of are precisely the DFA states
. In particular has at most classes, so is finite memory. Using the similar proof technique like the previous example we can show that index of is .

FSRA: We now give the Finite-State Reactive Agent that implements the indicator function above.

(same as DFA states).
(per-step indicator).
Start .
Transition (state update) is the same as the DFA: $𝟙$
Output function $𝟙$

Take , alphabet . Let , . Let and . So, we require ends with d.

Start . Consider input.

Read : new residue , last symbol b. New state . Since but last symbol , (). Output so far 0.
Read : stays 1, last → state . Not . Output 0. Outputs 00.
Read : , last → → output 0. Outputs 000.
Read : (r=2), last → → output 0. Outputs 0000.
Read : (r=2), last → . Since output 0. Final outputs: 00000.

Now consider appending four more then : continuation .

After appending the residue increases by 4 → new residue .
After final the residue remains , last symbol d, and outputs 1.

Conclusion

Thinking of agents as simple finite-state machines helps clarify what they can and cannot do. It shows which interactive behaviors can be handled using only limited memory, and when more powerful models are truly necessary. This perspective makes the design space clearer: some goals can be achieved with small, structured controllers, while others require fundamentally richer mechanisms.

There are several natural directions to explore. We can study what changes if the agent is allowed to use randomness, make delayed actions, or update its behavior over time through learning. Each of these makes the model more realistic, but also raises new theoretical questions. Important technical problems include: (i) understanding how hard it is to decide whether a specification can be enforced with limited memory, (ii) finding the smallest possible controller that satisfies a given specification, etc.

These directions connect verification, synthesis, and learning, promising a practical theory for building simple, verifiable reactive agents.

Appendix

Deterministic Finite Automata and Transducers

Definition A.1 (Deterministic Finite Automaton - DFA)

A DFA is a tuple , where:

is a finite set of states
is a finite input alphabet
is the transition function (assumed total i.e. its defined for all possible tuples )
is the start state
is the set of accepting states

A DFA defines a language (= set of all finite length string formed using the alphabet in the alphabet set ) by the usual acceptance criterion: a string is accepted if and only if the state reached after reading from belongs to .

Here is a simple example to understand it. We will construct a DFA which that reads a binary string and accepts if the number of 1s in the string is even. (And remains in the final state whenever the number of 1s encountered till now is even.)

Formally, define the automaton as follows:

States: where the state encodes whether the agent has seen an even or odd number of 1s so far.
Alphabet:
Start state:
(before reading anything, the count of 1s is zero, which is even)
Accepting states:
Transition function :
- Reading 0 does nothing to the parity:
- Reading 1 flips the parity:

Given any input string , the DFA starts in and updates its state step-by-step according to . The string is accepted if the final state lies in , i.e., if the total number of 1s in is even.

For example:

Input 1010 → two 1s → ends in → accepted
Input 111 → three 1s → ends in → rejected

You can view this DFA as a minimal agent with memory:

The state is the agent’s internal memory (here just one bit: even vs odd).
The alphabet is the agent’s observation space.
The transition function is the agent’s update rule.

The accepting states encode a goal or success condition. It already captures the core agent loop: observe → update internal state → decide success/failure.

Definition A.2 (Mealy Machine - A Deterministic Finite Transducer)

A Mealy machine is a tuple , where:

is a finite set of states
is a finite input alphabet
is a finite output alphabet
is the (total) transition function
is the output function
is the start state

On input starting from , the machine produces the output: where the sequence of states is defined by .

In a Mealy machine, each transition produces an output.
The output generated at each step depends on both the current state and the current input symbol and is produced synchronously with the consumption of that input symbol.

Thus, a Mealy machine defines a function , that maps every input string to an output string computed incrementally during the run of the machine.

Equivalently, the transition and output functions can be combined into a single function , where

This formulation emphasizes that output is produced on transitions, which is why Mealy machines are also called deterministic finite-state transducers.

Now, let's discuss a small example. We construct a Mealy machine that, on each input symbol:

outputs 0 when it reads 1
outputs 11 when it reads 0

Thus, the machine maps any binary input string to a binary output string in which every 1 appears in pairs, so the total number of 1s in the output is always even.

Define by:

(a single state, the machine is memoryless)
(input alphabet)
(output alphabet)
(stay in on every input)
(output 0 on input 0, output two 1’s on input 1)

Equivalently you can combine and into one with

Notation: Input/output. Mealy Machine to output double 1s for each 1 in input.

Here is an example run:

Input: .
State sequence: (always )
Output (stepwise):
Concatenate to get . Count of 1’s in = = even.

This Mealy machine produces strings in the language

Every string in has an even number of 1’s, but not every binary string with an even number of 1’s is in . For example the string 101 has two 1’s (even) but is not in . So the machine guarantees parity but does not generate all even-1 strings.

If your goal is all even-1 strings, you need a different transduction strategy (for example allowing outputs ε or single 1’s at some steps plus internal bookkeeping so the machine can place 1’s arbitrarily while keeping total parity even). That requires either a larger state space or different output rules.

The Mealy machines defined above allow a single observation to trigger a variable-length burst of output. While this generality is convenient for describing transductions, it is too permissive for modeling agents that must performs a single action at each step in real time.

In an online agent setting, we might want a stricter notion: at each timestep, the agent observes exactly one symbol and immediately emits exactly one action. This synchrony ensures that the agent’s behavior is length-preserving.

To capture this reactive, per-timestep interaction pattern, we now introduce the standard Mealy machine, obtained by restricting the output function to produce a single symbol per input symbol.

Definition A.2′ (Standard Mealy machine) A standard Mealy machine is a tuple where all components are as in Definition A.2 except the output function is (a single output symbol per input symbol). On input , the output is ), hence .

Think of this Mealy machine as a tiny reactive agent:

Observation space is what the agent sees at each timestep.
Action space is what it can emit; here actions can be length-2 strings (so actions may produce multiple primitive outputs in one step).
The internal state is the agent’s memory (here trivial: a single state, so no memory).
The pair is the agent’s policy + state-update rule. On every observation the agent immediately emits output determined by .

In this example the agent’s policy enforces a global constraint (even parity of 1’s) by a local rule (“whenever you see a 1, emit 11”). This shows two useful points for thinking about agents:

Global invariants from local rules: A simple per-step policy can guarantee a nontrivial global property of the interaction history (here: even parity).
Expressivity vs constraints: The policy here is conservative it enforces parity but at the cost of reduced expressivity (not all acceptable-looking outputs are reachable).

Definition A.3 (Moore Machine)

A Moore machine is a deterministic finite-state transducer given by a tuple , where all components are as in the Mealy machine definition except that the output function is , depending only on the current state.

On input , the machine produces output:

where for (with the initial state). Equivalently, some conventions output as the first symbol, either convention yields equivalent expressiveness.

Mealy and Moore machines are equivalent in expressive power: every Mealy machine can be converted to an equivalent Moore machine with at most a linear increase in the number of states, and vice versa.

Recall the Mealy machine from the previous example, which on input 0 outputs 0 and on input 1 outputs 11, thereby ensuring that the total number of 1’s in the output is always even.

We now construct a Moore machine that produces exactly the same input–output behavior.

Formally, define as follows:

States:
Input alphabet:
Output alphabet:
Initial state:
Transition function ():
Output function ():

A Moore Machine which is equivalent to the Mealy Machine in previous example

Thus:

whenever the machine enters state , it outputs 0,
whenever it enters state , it outputs 11.

For example, on input , the state sequence is ,
and the output is .

As in the Mealy case, each input symbol 1 contributes exactly two 1’s to the output, and each 0 contributes none. Hence every output string has an even number of 1’s.

This Moore machine implements the same transduction as the Mealy machine in the previous example, but the output is associated with states rather than transitions. The extra state is needed to remember which output should be produced after reading the current input symbol. This illustrates the general transformation from Mealy to Moore machines: outputs that depend on in a Mealy machine are encoded into states whose labels determine the output in a Moore machine.

Moore machines label states with outputs rather than labeling transitions. The most general Moore transducer can emit a short burst of output when a state is entered, but that extra generality is unnecessary when modeling agents that act once per timestep. To model a reactive, step-by-step agent it’s clearer to assume each entered state produces exactly one action symbol. This synchronous restriction gives the standard Moore machine.

Definition A.3′ (Standard Moore machine) A standard Moore machine is a tuple where all components are as in Definition A.3 except the output function is . On input , the output is .

From an agent viewpoint, Moore machines correspond to agents whose actions depend only on their current internal state, not directly on the most recent observation. The observation updates the state via , and the action is then emitted as a function of that state alone. Compared to Mealy machines, Moore machines enforce a stronger separation between state update and action selection. This distinction mirrors a common design choice in AI systems: whether actions are allowed to depend directly on raw observations, or only on a processed internal representation. The equivalence between Mealy and Moore machines suggests that this is largely a representational choice rather than a fundamental difference in capability.

Remaining Proofs

Proof for non-finite memory of

For each integer let (The prefix consisting of ones), so . We show and are not -equivalent whenever .

WLOG . Let (nonzero). Choose an integer large enough that

(Existence: for fixed nonzero , the equation implies ; this has only finitely many integer solutions, so for sufficiently large the second condition holds.)

Set and take continuation (i.e. more ones). Then

, a perfect square, so the output bit at the final step of (and indeed one of the outputs in the run) is .
, which by choice of is not a square, so the corresponding output bit in that position is .

Thus and differ, so . Because there are infinitely many , the relation has infinitely many equivalence classes. Hence does not have the finite-memory property.

Proof for finite memory of :

If two prefixes lie in the same class, then they are -equivalent. Take in the same class, we show

Case 1 (): If a prefix already contains 11, then every extension also contains 11. Hence for any continuation , every longer prefix of (from the point where 11 first appeared onward) is not in the language, and the same holds for . Concretely, the per-prefix membership bits for and coincide at every position (they may differ on earlier positions before the first 11, but those are identical by assumption of same class, both have already passed the 11 event). Thus .

Case 2 (): Neither nor contains 11, and both end in 0. From that point onward, whether a future prefix is in the language depends only on the continuation ) and the fact that the last seen symbol before is 0. Since both and share this same “local summary” (no 11 seen, last symbol 0), for every continuation the sequence of future per-prefix membership bits after position depends only on and not on the internal details of or . Also the membership bits up to and are the same shape relative to their class. Hence .

Case 3 (): Similar argument: neither contains 11 and both end with 1. Future membership after appending depends only on the fact “no 11 so far and last symbol = 1” and on . Therefore .

So, each class is contained in a single -equivalence class. We now show that for any two distinct classes there exists a continuation that makes the two resulting outputs differ; this proves they are different -equivalence classes.

Distinguish Let and . Consider the one-symbol continuation . Then: contains 11 (since u ended with 1), so the output bit at that step is 0 (the prefix ending at that position is not in the language).
does not contain 11 (since v ended with 0), so the output bit at that step is 1.
Thus and differ at the last position, so .

Distinguish : Let and . Take continuation (a block of zeros) of any positive length. For every extension , since 11 already occurred inside , none of the prefixes at or after the position where 11 occurred will be in the language, in particular, the outputs at and after position are zeros. But for (where v has not seen 11 and ends with 0), all those extended prefixes continue to avoid 11 and so the outputs at these positions are ones. Thus and differ (for instance, at position , so .

Distinguish : Similar: take . For , outputs after are zeros; for and suitable (even suffices), some prefixes of will still avoid 11 and so produce ones. Concretely, keeps no 11 and so has output 1 at that position, while has output 0. So
Hence the three candidate classes are pairwise distinguishable, so they correspond to three distinct -classes.