If you look at finite state agent, don't you also have to look at the interaction of it with some (diverse) environment? And that's basically how TMs work? So, what's up with that, what makes them finite state in relevant sense.
Thanks for your question. I may be misunderstanding what you’re pointing at, so apologies in advance if I’m replying slightly off target.
In my setup, interaction with the environment is entirely captured by the observation sequence. The agent does not get access to any external writable memory or oracle beyond the current observation symbol and its own finite internal state.
Crucially, the only memory it has is its finite state set Q. The environment may be arbitrarily complex, but all that complexity reaches the agent only through the finite observation alphabet O, symbol by symbol. There is no tape, no unbounded working memory, and no way to encode arbitrary computation internally.
That’s the key difference from a Turing machine. A TM interacting with an environment can still have access to an unbounded tape, so its computational power comes from its memory model, not from whether it is “interactive.” In contrast, an FSRA has bounded internal memory by construction (finite Q), so it can only implement finite-memory transductions (equivalently, functions with finite Myhill–Nerode index, as in Theorem 3.13).
I didn't mean to say "internal" memory. E.g. you can start TM head, which is finite automaton, on empty tape. So, how is it different from starting finite automaton in the world where you can interact with something that might as well be tape, and get TM analogue in combination of agent + world.
So, it has only observations from environment, no action lever to pull? Or the actions are internal, without being relayed to environment?
(I did not read your whole post sorry)
Two quick clarifications that I should state here more explicitly:
- An FSRA bounds the agent’s internal memory: it has a finite state set Q and fixed maps δ,λ. The agent emits an action symbol for each observation symbol; actions are part of the model (alphabet A).
- What I did not assume (unless explicitly stated) is any particular model of the environment’s memory or persistence. If the environment can record the agent’s actions and later reveal them back as observations (i.e. the environment is effectively a persistent writable tape), then an FSRA interacting with that environment can indeed implement behavior that a standalone finite automaton cannot, the joint agent+environment can be (arbitrarily) powerful.
The point of the FSRA model in this post is to isolate finite internal memory as a resource. To decide whether the closed-loop system is finite-state you must also model the environment. Observations do not depend on past agent actions (or the environment does not persist agent writes). Then the agent’s only memory is Q, and Theorem 3.13 applies.
This work was produced during the Dovetail Research Fellowship. I’m grateful to Alex, Alfred, Winter, and Santiago, and to everyone who provided feedback, and to the fellowship community for their discussions.
Note on formatting: There was a rendering issue with display-style LaTeX in the original version of this post, where block equations were not visible after publishing. To avoid confusion, I have replaced those equations with images. Apologies for the inconvenience, and please feel free to comment if anything is unclear.
Abstract
We study AI agents through the lens of finite-state transducers and string-to-string behaviors. Instead of treating automata purely as language recognizers, we model agents as systems that map observation histories to action histories while interacting with an environment online. We introduce Finite-State Reactive Agents (FSRAs) as deterministic, finite-memory, reactive policies, establishing their equivalence to standard Mealy and Moore machines. Our main result (Theorem 3.13) characterizes exactly which behaviors can be implemented with finite internal state: a deterministic behavior is FSRA-implementable if and only if it is length-preserving, prefix-preserving (causal), and has finite memory, & relate FSRAs to standard Mealy and Moore machines and identify when richer computational models are unavoidable. We also introduce specifications (constraints on allowed agent behaviors) and formalize notions of weak enforceability, strong enforceability, and realizability. Using our characterization theorem, we prove strict separations between these notions (Theorem 3.15), demonstrating when finite-state agents can or cannot satisfy given specifications.
Introduction
Deterministic Finite Automata (DFAs) are usually introduced as a basic model of computation: simple machines that read an input symbol by symbol and decide whether to accept or reject a string. In theoretical computer science, they are often treated as a warm-up before more powerful models such as Turing machines.
In this work, we take a different perspective. A DFA can also be viewed as a minimal agent: it maintains a small internal state, receives observations from the environment, updates its state deterministically, and eventually produces an outcome. Despite their simplicity, DFAs already capture the core interaction loop shared by many AI systems: observe → update internal memory → act or evaluate.
This observation motivates a shift from language recognition to agent behavior. We introduce the model of a Finite-State Reactive Agent (FSRA), which generalizes DFAs from passive evaluators to active, online systems that map streams of observations to streams of actions using finite internal memory. FSRAs formalize deterministic, reactive agents whose behavior is causal and prefix-preserving, and whose internal state evolves according to a fixed update rule (time-invariance). They let us ask precise questions about memory, state abstraction, observability, and goal satisfaction without the complications of learning (updating the policy from data), optimization (searching over policies or parameters to maximize an objective), or stochasticity (randomness in the agent's actions or environment). These characterizations let us state simple, checkable conditions for when a specification is enforceable or realizable by a set of finite-state controllers.
Outline. The rest of this post is organized as follows. Section 1 introduces the Finite-State Reactive Agent (FSRA) model and states its equivalence with standard Mealy and Moore machines. Section 2 develops formal properties of agent behaviors: determinism, length-preservation, causality (prefix-preservation), finite memory, reactivity, and time-invariance, modeled as properties of string-to-string functions. It concludes by introducing specifications (constraints on allowed behaviors) and three enforceability notions: weak, strong, and realizability. Section 3 presents our main results: Theorem 3.13 characterizes exactly which behaviors are FSRA-implementable (reactive + finite-memory), Corollary 3.14 shows that such behaviors induce realizable specifications, and Theorem 3.15 proves strict separations between the enforceability notions, demonstrating fundamental gaps between what finite-state agents can weakly satisfy versus fully realize. We conclude with worked examples including binary remainder computation (mod k), substring avoidance (no "11"), and count-with-ending constraints. Standard automata-theoretic background (DFAs, Mealy machines, Moore machines) and deferred proofs are collected in the Appendix.
Finite State Reactive Agent
Mealy machines (Definition A.2 in the Appendix) are deterministic finite-state transducers that produce an output symbol synchronously with each input symbol. This makes them a natural formal model for very simple online agents: the input alphabet corresponds to the agent's observations, the finite state encodes the agent's memory, and the per-step output is the agent's immediate action. To make this perspective explicit and to use language familiar to AI and alignment readers, we adopt the name Finite-State Reactive Agent (FSRA) for a Mealy machine interpreted as a deterministic, finite-memory, online policy. The formal Definition 1.1 below gives the FSRA tuple and the precise causal / prefix-preserving behavior we require.
Definition 1.1 (Finite State Reactive Agent - FSRA)
A Finite-State Reactive Agent (FSRA) is a deterministic, finite-memory, online transducer that maps each input (observation) prefix to an output (action) prefix of the same length i.e. it emits exactly one action symbol for every observation symbol it consumes.
Formally, an FSRA is a tuple , where
The FSRA implements a function
that is length-preserving:
and causal / prefix-preserving: the -th output symbol depends only on the first input symbols, and for any prefix and continuation , is a prefix of .
Concretely, on input starting from , define the state sequence
with given. The produced action sequence is
, so .
Remark (generalization). If you want to allow a variable number of action symbols per observation, simply let . That yields the Mealy/transducer formulation where one transition may emit a finite output string; the FSRA above is the special case that is strictly one action per observation.
Calling this model an FSRA makes the intended interpretation explicit: are observations, are actions, and the pair together specify a deterministic, finite-memory policy. An FSRA captures the simplest kind of online agent (these terms are discussed in detail in the next subsection): it is reactive (length- and prefix-preserving, emitting exactly one action per observation), has finite memory encoded by the finite state set Q, and is deterministic.
This model is useful for reasoning about what can be implemented with finite memory, and for making clear where finite-state agents fail. When one wants to model policies with longer or variable delay between observation and action, or policies that maintain richer beliefs, the FSRA can be generalized (e.g. or by increasing , but ultimately some behaviors require strictly more expressive models (e.g. pushdown automata, Turing Machines, or stochastic/learning agents).
Proposition 1.2 (Equivalence with Mealy / Moore)
Let be a finite observation alphabet set and a finite action alphabet set.
Brief justification. Part 1 is just a renaming of components. For Part 2, the FSRA-to-Moore direction requires splitting each state into copies - one for each output symbol a that can produce, so that the output can be read from the state alone; this can increase the state count up to . The Moore-to-FSRA direction works by defining where is the Moore output function, moving the output from the arrived-at state to the transition that enters it.
Formal Properties of Agent Behaviors
In this section, we define several properties of general observation-to-action behaviors, modeled as string-to-string functions. We then identify which of these properties are satisfied by Finite-State Reactive Agents (FSRAs).
Let and be finite alphabets, representing observations and actions respectively. We consider functions
where denotes the collection of nonempty subsets of . Such a function assigns to each finite observation history a set of admissible action histories.
Later, we will specialize to deterministic behaviors, which are functions of the form
Example (Behavior)
Let . Define a behavior as follows:
In this example, the behavior is nondeterministic: after some observation histories, multiple action histories are permitted.
Intuitively, a function describes everything an agent is allowed to do, rather than what it will do. For each observation history, lists the action histories that are consistent with some specification, constraint, or design goal. Deterministic agents correspond to the special case where exactly one action history is allowed at each observation history.
It is useful to separate behaviors from agents. A behavior specifies which action histories are acceptable after each observation history, without committing to a particular decision rule. An agent, by contrast, must choose a single action at each step. ; see Definition 2.5), whereas a behavior is what the agent actually does. This three-way distinction: specification, behavior, agent, lets us ask precise questions. For example: "Can a simple agent satisfy this specification?" (i.e., does there exist an agent whose behavior stays within the allowed set?). In the formal-methods and control-theory literature, this is called enforceability: a specification is enforceable by a set of agents if at least one agent in that set produces only allowed behaviors. We can also ask the converse: "Can a set of agents collectively realize every behavior the specification allows?" These notions are formalized in Definition 2.11.
We also distinguish both from a specification: a specification P is a constraint on what an agent is allowed to do (formally, a relation
Allowing to be nondeterministic is intentional. Many natural specifications such as safety constraints, permissive policies, or sets of acceptable plans do not prescribe a unique action at every step. Instead, they describe a space of allowed behaviors.
Definition 2.3 (Deterministic)
A function is deterministic if
In this case, we may write and identify with the unique element of the singleton set. Determinism means that given any observation history, the action sequence is uniquely determined. Note that a deterministic behavior effectively is an agent's policy, the distinction between "set of allowed action histories" and "committed choice" collapses when the allowed set is always a singleton. The behavior-vs-agent distinction is most useful in the nondeterministic case, where a behavior permits multiple options and the question becomes which agent (i.e., which deterministic selection) to deploy.
Example (Deterministic)
Let
So, the function returns and alternatively.
Determinism is the simplest way to turn a permissive specification into an actual policy: once you pick a deterministic you no longer have a set of allowed behaviors but a single thing the agent will actually do (to follow the behavior). This is useful when reasoning about concrete guarantees (e.g., “will this agent ever output
bad?”) but remember determinism is a modeling choice, real systems are sometimes nondeterministic (stochastic) by design.Definition 2.4 (Controller and Controller Set)
A controller is any concrete system (a finite automaton, a program, a circuit, a neural network, etc.) that determines a behavior
where is the set of action histories the controller may produce after observation history o. We deliberately leave "concrete system" informal: the only thing that matters formally is the induced behavior .
A controller set is a set of controllers; we write and use the shorthand for the mapping from controllers to their induced behaviors. (No additional structure on is assumed, it is simply the collection of controllers under consideration, e.g., "all FSRAs" or "all 3-state Mealy machines.")
If each is single valued we call the controller deterministic and identify
Example (Controllers)
Let be the set containing two deterministic behavior/functions with: . is a controller set that can produce at least
-
-
Then
depending on which controller is selected.
Definition 2.5 (Specification)
Let and be finite alphabets, and let
be a specification (a relation between observation histories and allowed action histories). For an observation history write the fiber (i.e., the set of all action histories that the specification allows after ):
A specification is not the same as a behavior. A behavior describes what a controller does; a specification describes what a controller is allowed to do. In particular, can contain many action histories, it expresses a constraint ("any of these actions would be acceptable") rather than a commitment ("this is the action I will take"). A single controller typically covers only some of the allowed pairs. The enforceability and realizability notions in Definition 2.11 make this gap precise: they ask whether controllers in a set can, individually or collectively, cover what allows.
Let be a set of controllers; each controller induces a function (behavior)
so that is the set of action sequences controller produces on observation history . Define the relation implemented by as
Observe that .
Think of the controller as the machine you could put on the robot: it has internal structure and a rule for mapping sensory input to motor outputs. The behavior is the abstract trace of possibilities that a machine can generate. Distinguishing these levels is crucial: a specification can permit many possible action traces (multivalued), but only some will be reachable by your finite-state controllers. The phrase “is it enforceable?” asks whether the machines you can actually implement can realize at least one of the allowed traces at each point; “is it realizable?” asks whether the machines collectively capture exactly the specification.
Definition 2.6 (Length Preserving)
A function is length-preserving if
For deterministic this reduces to
Example (Length-preserving)
Let , and define
i.e.
Intuitively, a length-preserving function produces exactly as many output symbols as it consumes input symbols. This models an agent that emits one action per observation (or more generally, a fixed number of actions per observation). It’s an explicit modeling choice: many controllers (especially embedded or real-time ones) naturally fit this model.
Definition 2.7 (Prefix Preserving/Causal)
A function is prefix-preserving (or causal) if
Causality captures the notion of online computation: the output produced after reading a prefix must remain consistent (as a prefix) with the output produced after reading any extension . In other words, the agent commits to its early outputs and cannot retroactively change them when new observations arrive.
Example (Prefix-preserving / causal)
Let , and define by the stepwise rule: output ; on seeing output . . is a prefix of for every .
on seeing
Formally,
Then
Definition 2.8 (Finite Memory)
A deterministic function has the finite-memory property if the equivalence relation defined by
has finite index (i.e., finitely many equivalence classes).
The relation partitions observation prefixes into equivalence classes that cannot be distinguished by any future continuation. If this partition is finite, then the agent effectively needs only finite memory to encode its past: it suffices to remember which equivalence class the history belongs to. This is the string-function analogue of the Myhill-Nerode theorem for languages.
The finite-memory condition is the formal version of “bounded internal state.” It says there are only finitely many distinct ways the future can unfold after different pasts, so keeping one of finitely many summaries is enough. This is exactly why finite automata are a natural model for resource-limited agents.
Example (Finite Memory - Periodic every )
i.e. the symbol emitted at step
Claim. has finite memory; the index of is exactly .
Proof. Define relation (R) on prefixes by
classes under . If , then for every continuation the positions in where outputs equal 1 are the same (they are exactly those indices congruent to ). Hence . Thus has at most classes. Conversely, two prefixes with different lengths modulo produce different future outputs (their next few outputs disagree), so all residue classes are distinct. Therefore has exactly classes, so finite index.
There are at most
Example (Not Finite Memory - Perfect Square Count Predicate)
Let . Define deterministic reactive by the stepwise rule: iff the total number of s seen so far is a perfect square; otherwise emit . the count of 1s is , then on reading a next symbol the output at the time the -th input is processed depends on whether is a square.
at each step emit
Formally, if after reading prefix
Claim. has infinite index (so is not finite memory).
Proof for this claim can be found in appendix
Definition 2.9 (Reactive)
A behavior is reactive if it is length-preserving, and ( is a prefix of ).
prefix-preserving, i.e.
Equivalent characterization for deterministic behavior: A behavior is reactive if and only if
, not just a bounded summary of it. (In particular, "finite memory" is strictly more general than the memoryless/Markov case, where the action depends only on the most recent observation: a finite-memory agent compresses the full history into one of finitely many states, each of which can encode patterns spanning many past observations.)
Reactivity does not imply memory-lessness or finite memory. A reactive behavior may depend on the entire observation history
Reactivity formalizes the intuitive idea that an agent must act in real time. Once an action is taken, it cannot be retracted when new information arrives. This matters for enforceability: a specification may allow some full action sequence in hindsight, yet no reactive agent can realize it because early commitments inevitably block all allowed continuations. Prefix-wise enforceability makes this tension explicit by asking whether an agent can always keep some allowed future open while acting online. We will see these definitions more formally later.
Example (Reactive)
Same as the prefix-preserving example above: maps observations to immediate outputs, so . This is reactive and causal by construction.
Example (Length-preserving but non-reactive behavior)
Let and . Define a deterministic function
where the output string has the same length as . If denotes the empty string then .
This behavior emits the correct number of actions, but it does not act online. The first action symbol depends on whether the entire observation history has even or odd length, which cannot be determined when the first observation arrives. As a result, the agent’s early actions are not commitments it may retroactively “change its mind” once more observations arrive. This violates the core idea of reactivity: acting step by step in real time.
Example (Reactive but not finite memory)
Let , . Define a deterministic reactive by the stepwise rule: if the total number of 1s seen so far is a perfect square, otherwise output . , where depends on the count of 1s in . is reactive (stepwise defined) but not finite-memory, since tracking “is count-of-1s a square?” requires unbounded memory.
at each step output
Formally
This
This example shows reactivity alone doesn’t bound memory cost. The agent behaves online (each step emits one output) yet the rule depends on an unbounded statistic (counts), so no finite-state controller can implement it.
The properties above (length-preserving, prefix-preserving, finite memory) are exactly the ingredients of our main FSRA characterization theorem (Theorem 3.13). We will state and prove it in Section 3. Before that, we introduce the idea of enforceability and realizability, which we will use later when talking about which specifications such agents can satisfy.
Definition 2.10 (Time-invariance)
A family of online behaviors may be specified by a step-indexed family of per-step update/output rules , where each maps a current internal state and the new observation to a next state, and maps the current state and observation to the emitted action. We call the behavior time-invariant (or stationary) if the per-step rules do not depend on ; that is, there exist maps and such that
Time-invariance adds the further restriction that the decision rule doesn’t secretly depend on how many steps have elapsed.
Definition 2.11 (Enforceability and Realizability).
Weak enforceability: We say that specification is weakly enforceable by a set of controllers if
Another way to define would be like this
whenever the specification allows some action at history , at least one controller in can produce an allowed action at . The controller that works may depend on .
Strong enforceability: We say that is strongly enforceable by a set of controllers if
equivalently
every action allowed by the specification at is realized by some controller in (the realizing controller may depend on the action and on ).
Realizability: We say that
i.e. for every there exists some controller with , and conversely every emitted by some controller in belongs to .
Deterministic simplifications. When induces a set of deterministic function i.e. , then we can just write:
equivalently
equivalently
Special case: singleton controller. When is just a singleton set i.e. then the definitions three definitions just simplify to:
Enforceability compares an abstract specification (what counts as allowed action histories after each observation history) to what a set of concrete controllers can actually produce. Weak enforceability asks only that, whenever the spec allows at least one action after a history , some controller in can produce at least one allowed action at (the controller may depend on the history). Strong enforceability demands more: every action that the spec allows at must be produced by some controller in (so the controllers collectively realize every permitted action). Realizability asks that the set exactly matches the specification: the set of all (history, action) pairs realized by controllers in is equal . These notions let us formalize intuitions like “the spec is implementable by simple reactive agents” and to separate cases where a spec is merely compatible with controllers from cases where it is exactly captured by them.
Example (Weak but not Strong)
Let Define the specification
(For every other observation string .)
Let the controller set be the singleton with deterministic behavior given by
Then:
Example (Strongly enforceable but not realizable)
Let Define the specification
Let the controller set be with behaviors
Then
Proposition 2.12 (Relation between different notions of enforceability):
Let be a set of controllers and a specification. Then
Prefix-wise Enforceable: Say denotes the set of all prefixes of strings in . A specification is prefix-wise Enforceable by if
Equivalently: for every observed prefix
Prefix-wise enforceability is an explicitly online notion: it requires that a single controller already commit, at each observation prefix , to some action-prefixes that are extendable to full allowed actions for every possible future continuation that keeps the specification satisfiable. In practice, this is a useful notion when agents must act before observing the entire future, it demands that the agent’s early choices never rule out reaching some allowed final action once future observations arrive.
Characterization of FSRA and Examples
Theorem 3.13 (Characterization of FSRA)
Let be finite alphabets and let be a deterministic function. Then there exists an FSRA with induced transduction iff all of the following hold:
(Equivalently: is implemented by some FSRA properties (1)–(3) hold.)
Proof (Theorem 3.13):
This completes the (⇒) direction. Note also that the FSRA gives explicit fixed maps (so, time-invariance holds for any FSRA)
1. State set. Let be the set of equivalence classes of . By finite-memory this set is finite. Write for the -class of . Let the start state be .
2. Define . For class and observation define This is well-defined: if then , so for every continuation we have ; taking of the form shows , hence .
3. Define . We must define the single action symbol emitted when in class upon reading . Because is length-preserving and prefix-preserving, for every and the string is a prefix of and so equals concatenated with exactly one symbol. Thus, there exists a unique with
Define: i.e. the last symbol of . We must check is well-defined: if then so for every continuation , . In particular for we get , and for we get . Therefore, the unique last symbols of and coincide; thus .
4. The machine. Put . By construction are total maps and , respectively, so is an FSRA (deterministic, finite , one output per input).
5. Correctness . We show by induction on length of that the action string produced by on input equals .
Hence .
Thus (1)–(3) imply existence of an FSRA implementing . The constructed are fixed maps (time-invariant) by construction.
We call this model a Finite-State Reactive Agent because the name describes exactly what the math enforces. It is finite state because the agent’s memory is a fixed, finite set of states (the classes), there is only a bounded amount of past information the agent can store. It is reactive because the agent responds to each observation immediately: for every input symbol it emits exactly one action symbol, and the action at time t depends only on the observations up to t (no lookahead, no delayed planning). And it is an agent because the whole object is a policy-like mapping from observation histories to actions a causal, online decision procedure. This label therefore does more than decorate the definition: it communicates the limits and strengths of the model at a glance (what the agent can implement, and what kinds of behavior definitely require richer models such as stacks, external memory, or unbounded belief states).
Corollary 3.14 (Realizability of deterministic reactive finite-memory specifications by FSRAs)
Suppose be a deterministic behavior that is reactive (i.e. length-preserving and prefix-preserving) and has the finite-memory property. Define the induced specification .
Then there exists a Finite-State Reactive Agent (FSRA) such that
In particular, is realizable by the singleton controller set
Theorem 3.15 (Strict Separation of Realizability and Enforceability for FSRAs)
There exist specifications
Proof.
We treat the two constructions separately.
Part 1: Construction of
Let and . For each write for the length- observation string. Define the length- action , the unique length- bitstring whose single 1 appears at position . Define the specification
Thus, for every observed prefix the specification allows exactly the single action .
(i) Strong enforceability by the set of all FSRAs. there is an FSRA (With states) that outputs on the first steps and outputs 1 on the -th step. Concretely, simply counts up to and emits the described pattern. Hence for each allowed pair there exists an FSRA (namely ) that produces that exact pair. By definition (Def. 2.11) the set of all FSRAs therefore strongly enforces .
For every
(ii) Non-realizability by any set of FSRAs. is any set (finite or infinite) of FSRAs and that realizes , i.e. Then in particular every pair belongs to the right–hand side, so for every there exists some with . But realizability as equality forces more: for every and every we must have . Hence for every and every , . Therefore every implements the same global map defined by . In particular, must contain at least one FSRA whose induced function is equal to .
Suppose
But the function above is not implementable by any FSRA: the equivalence relation has infinite index because distinct prefix lengths are distinguished by future behavior; equivalently is not finite-memory and so (by Theorem 3.13) not implementable by any FSRA. Thus, no FSRA in can induce , a contradiction. Therefore, no set of FSRAs realizes .
Part 2 - Construction of
Again let and . For each set
the set of length- bitstrings that contain exactly ones.
(i) Weak enforceability by a finite set. be the simple 2-state FSRA that ignores the input symbol and alternates outputs (a 2-cycle). For every that controller’s length- output contains exactly ones (the ones appear at the even positions). Hence for every . Therefore, the singleton finite set satisfies the weak enforceability condition: for every observed history there exists a FSRA in this finite set whose produced action at is allowed.
Let
(ii) Failure of strong enforceability by any finite set. . For a given length the set can produce at most distinct length- action strings (each FSRA gives one string of length ). But the size of the fiber equals the binomial coefficient , which grows super-polynomially in . Thus, for sufficiently large we have . Hence cannot contain FSRAs producing every allowed string in . Since strong enforceability would require that every allowed action at each history be produced by some FSRA in the set (Def. 2.11), no finite can strongly enforce .
Fix any finite FSRAs set
Example (FSRA - Remainder mod k of the binary stream)
Setup: Fix integer . Observation Alphabet (bits observed); action alphabet (we let each output symbol be the remainder, for readability).
Define deterministic reactive by: for input , the output at step is the remainder of the integer represented by the prefix (interpreting the bit string in binary, most-significant-bit first) modulo . Equivalently let and define (a string of symbols in ).
A small working example: ( ) Input .
Output
Finite Memory: Consider the relation on prefixes:
Claim: iff the current remainders after and are equal, i.e. , where denotes the integer value of
Therefore, the equivalence classes are exactly the residue classes modulo . So has finite index - finite memory.
FSRA: Define by
After reading a prefix whose current remainder is , seeing bit updates the remainder to and the FSRA emits that new remainder as the action for that timestep. This is deterministic, length-preserving, and causal.
For example: , input :
Output:
Specification:
where and , the unique allowed action history is the sequence of remainders obtained by repeatedly updating the current remainder by for each next bit . Thus
For every observation prefix
So .
Example (DFA and FSRA - Absence of a substring)
Let
Setup: Observations (each step the agent observes a bit). Actions (at each step the agent emits 1 iff the prefix seen so far is in , else 0). Define the deterministic reactive function by: for input , the -th output symbol equal
So,
Finite Memory: The relation has exactly three equivalence classes, so has finite memory. (We have proven our claim in the appendix)
into three sets:
Define three candidate classes by partitioning all prefixes
These three sets are disjoint, and their union is all prefixes.
Note: We are using here and in the proof given in the appendix
, each class is contained in a single -equivalence class, and the three classes are pairwise distinct. Therefore has exactly three equivalence classes (finite index). By Definition 2.8 (finite-memory), has finite memory. As a corollary, any FSRA implementing requires at least 3 states (one per class); the standard construction produces the 3-state FSRA.
Every prefix belongs to one of the three classes
FSRA: Construct FSRA directly using the DFA states.
For example, input .
If we had input 11 then at second step:
Specification: The realized specification is the singleton-valued relation
For each observation prefix , the fiber is the unique action history produced by the FSRA after prefix .
Example (DFA and FSRA - Constraint on Count and Ending)
Fix , observation alphabet . Fix indices , integers . Define
(By convention the empty string does not end in and thus, is not in unless you explicitly want to treat it otherwise.) We build a deterministic finite automaton
that keeps two pieces of summary information:
Define
An example,
Setup: We convert the language into a Boolean behavior, which is a convenient FSRA target. Let Observations: and Actions:
Define by: for any input and index 𝟙
So is the binary stream indicating, step-by-step, whether the observed prefix satisfies the two constraints (count-of- and prefix ends with
Finite Memory: The classes of are precisely the DFA states
. In particular has at most classes, so is finite memory. Using the similar proof technique like the previous example we can show that index of is .
FSRA: We now give the Finite-State Reactive Agent that implements the indicator function above.
Take , alphabet . Let , . Let and . So, we require ends with d.
Start . Consider input .
Now consider appending four more then : continuation .
Conclusion
Thinking of agents as simple finite-state machines helps clarify what they can and cannot do. It shows which interactive behaviors can be handled using only limited memory, and when more powerful models are truly necessary. This perspective makes the design space clearer: some goals can be achieved with small, structured controllers, while others require fundamentally richer mechanisms.
There are several natural directions to explore. We can study what changes if the agent is allowed to use randomness, make delayed actions, or update its behavior over time through learning. Each of these makes the model more realistic, but also raises new theoretical questions. Important technical problems include: (i) understanding how hard it is to decide whether a specification can be enforced with limited memory, (ii) finding the smallest possible controller that satisfies a given specification, etc.
These directions connect verification, synthesis, and learning, promising a practical theory for building simple, verifiable reactive agents.
Appendix
Deterministic Finite Automata and Transducers
Definition A.1 (Deterministic Finite Automaton - DFA)
A DFA is a tuple , where:
A DFA defines a language (= set of all finite length string formed using the alphabet in the alphabet set ) by the usual acceptance criterion: a string is accepted if and only if the state reached after reading from belongs to .
Here is a simple example to understand it. We will construct a DFA which that reads a binary string and accepts if the number of 1s in the string is even. (And remains in the final state whenever the number of 1s encountered till now is even.)
Formally, define the automaton as follows:
(before reading anything, the count of 1s is zero, which is even)
Given any input string , the DFA starts in and updates its state step-by-step according to . The string is accepted if the final state lies in , i.e., if the total number of 1s in is even.
For example:
You can view this DFA as a minimal agent with memory:
The accepting states encode a goal or success condition. It already captures the core agent loop: observe → update internal state → decide success/failure.
Definition A.2 (Mealy Machine - A Deterministic Finite Transducer)
A Mealy machine is a tuple , where:
On input starting from , the machine produces the output: where the sequence of states is defined by .
In a Mealy machine, each transition produces an output.
The output generated at each step depends on both the current state and the current input symbol and is produced synchronously with the consumption of that input symbol.
Thus, a Mealy machine defines a function , that maps every input string to an output string computed incrementally during the run of the machine.
Equivalently, the transition and output functions can be combined into a single function , where
This formulation emphasizes that output is produced on transitions, which is why Mealy machines are also called deterministic finite-state transducers.
Now, let's discuss a small example. We construct a Mealy machine that, on each input symbol:
Thus, the machine maps any binary input string to a binary output string in which every 1 appears in pairs, so the total number of 1s in the output is always even.
Define by:
Equivalently you can combine and into one with
Here is an example run:
Concatenate to get
This Mealy machine produces strings in the language
Every string in
If your goal is all even-1 strings, you need a different transduction strategy (for example allowing outputs ε or single 1’s at some steps plus internal bookkeeping so the machine can place 1’s arbitrarily while keeping total parity even). That requires either a larger state space or different output rules.
The Mealy machines defined above allow a single observation to trigger a variable-length burst of output. While this generality is convenient for describing transductions, it is too permissive for modeling agents that must performs a single action at each step in real time.
In an online agent setting, we might want a stricter notion: at each timestep, the agent observes exactly one symbol and immediately emits exactly one action. This synchrony ensures that the agent’s behavior is length-preserving.
To capture this reactive, per-timestep interaction pattern, we now introduce the standard Mealy machine, obtained by restricting the output function to produce a single symbol per input symbol.
Definition A.2′ (Standard Mealy machine) A standard Mealy machine is a tuple where all components are as in Definition A.2 except the output function is (a single output symbol per input symbol). On input , the output is ), hence .
Think of this Mealy machine as a tiny reactive agent:
In this example the agent’s policy enforces a global constraint (even parity of 1’s) by a local rule (“whenever you see a 1, emit 11”). This shows two useful points for thinking about agents:
Definition A.3 (Moore Machine)
A Moore machine is a deterministic finite-state transducer given by a tuple , where all components are as in the Mealy machine definition except that the output function is , depending only on the current state.
On input , the machine produces output:
where for (with the initial state). Equivalently, some conventions output as the first symbol, either convention yields equivalent expressiveness.
Mealy and Moore machines are equivalent in expressive power: every Mealy machine can be converted to an equivalent Moore machine with at most a linear increase in the number of states, and vice versa.
Recall the Mealy machine from the previous example, which on input 0 outputs 0 and on input 1 outputs 11, thereby ensuring that the total number of 1’s in the output is always even.
We now construct a Moore machine that produces exactly the same input–output behavior.
Formally, define as follows:
Thus:
For example, on input , the state sequence is , .
and the output is
As in the Mealy case, each input symbol 1 contributes exactly two 1’s to the output, and each 0 contributes none. Hence every output string has an even number of 1’s.
This Moore machine implements the same transduction as the Mealy machine in the previous example, but the output is associated with states rather than transitions. The extra state is needed to remember which output should be produced after reading the current input symbol. This illustrates the general transformation from Mealy to Moore machines: outputs that depend on in a Mealy machine are encoded into states whose labels determine the output in a Moore machine.
Moore machines label states with outputs rather than labeling transitions. The most general Moore transducer can emit a short burst of output when a state is entered, but that extra generality is unnecessary when modeling agents that act once per timestep. To model a reactive, step-by-step agent it’s clearer to assume each entered state produces exactly one action symbol. This synchronous restriction gives the standard Moore machine.
Definition A.3′ (Standard Moore machine) A standard Moore machine is a tuple where all components are as in Definition A.3 except the output function is . On input , the output is .
From an agent viewpoint, Moore machines correspond to agents whose actions depend only on their current internal state, not directly on the most recent observation. The observation updates the state via , and the action is then emitted as a function of that state alone. Compared to Mealy machines, Moore machines enforce a stronger separation between state update and action selection. This distinction mirrors a common design choice in AI systems: whether actions are allowed to depend directly on raw observations, or only on a processed internal representation. The equivalence between Mealy and Moore machines suggests that this is largely a representational choice rather than a fundamental difference in capability.
Remaining Proofs
Proof for non-finite memory of
For each integer let (The prefix consisting of ones), so . We show and are not -equivalent whenever .
WLOG . Let (nonzero). Choose an integer large enough that
(Existence: for fixed nonzero
Set and take continuation (i.e. more ones). Then
Thus and differ, so . Because there are infinitely many , the relation has infinitely many equivalence classes. Hence does not have the finite-memory property.
Proof for finite memory of :
If two prefixes lie in the same class, then they are -equivalent. Take in the same class, we show
Case 1 ( ): If a prefix already contains 11, then every extension also contains 11. Hence for any continuation , every longer prefix of (from the point where 11 first appeared onward) is not in the language, and the same holds for . Concretely, the per-prefix membership bits for and coincide at every position (they may differ on earlier positions before the first 11, but those are identical by assumption of same class, both have already passed the 11 event). Thus .
Case 2 (
Case 3 (
So, each class is contained in a single -equivalence class. We now show that for any two distinct classes there exists a continuation that makes the two resulting outputs differ; this proves they are different -equivalence classes.
Let and . Consider the one-symbol continuation . Then: contains 11 (since u ended with 1), so the output bit at that step is 0 (the prefix ending at that position is not in the language).
does not contain 11 (since v ended with 0), so the output bit at that step is 1. and differ at the last position, so .
Distinguish
Thus
Distinguish
Distinguish : Similar: take . For , outputs after are zeros; for and suitable (even suffices), some prefixes of will still avoid 11 and so produce ones. Concretely, keeps no 11 and so has output 1 at that position, while has output 0. So -classes.
Hence the three candidate classes are pairwise distinguishable, so they correspond to three distinct