I am Japanese and not a native English speaker, so I apologize if the text sounds wordy due to my use of a translation tool. I want to make it clear from the outset that this text was not generated by an LLM.
I do not trust LLMs. I believe only in governance based on the logic I create myself.
It is precisely because I do not trust LLMs that I am designing a structure to govern them.
Issues:
Current AI agents can be described as “excellent yes-men.” Because they prioritize pleasing the user over semantic correctness, their output passes formal syntax checks yet fails on practical semantics.
Furthermore, lacking metacognitive ability, they cannot evaluate themselves objectively and fall back on self-justification. This is a form of hallucination. As a result, they generate superficial code, and there is a risk that the system becomes semantically hollowed out even as it runs self-improvement cycles. It is self-evident that three things are essential: governance through permissions, a structure that lets the AI engage in metacognition, and a mechanism that detects semantic hollowing-out.
To transform the self-improvement function from a mere toy loop into a true semantic improvement loop, I took the following approach.
Solution (the logic engine I built, Build Lab):
1. Governance through Permissions: In addition to separating the design and implementation roles, I granted each agent “rwx permissions,” as in Linux/Unix systems. Because no agent can modify its own core functions (the foundation of semantic correctness), this design forces agents to pursue self-improvement through physically distinct approaches. The bootstrapping of the core functions is handled by roughly 500 lines of plain code with no LLM involved, relying on pure logical judgments that sit outside the permission scope. This keeps the separation of permissions airtight.
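To make the idea concrete, here is a minimal sketch of per-agent rwx permissions in Python. The agent names, path globs, and the `PermissionTable` class are my own illustration, not the actual Build Lab code; the point is only that the gate is ordinary code, applied before any agent touches a file.

```python
# Hypothetical sketch: Unix-style rwx permissions per agent.
# Names and paths are illustrative, not from Build Lab itself.
import fnmatch

class PermissionTable:
    """Maps (agent, path glob) to a subset of 'rwx'; first match wins."""
    def __init__(self, rules):
        self.rules = rules  # list of (agent, glob, perms) tuples

    def allowed(self, agent, path, mode):
        for rule_agent, glob, perms in self.rules:
            if rule_agent == agent and fnmatch.fnmatch(path, glob):
                return mode in perms
        return False  # default deny: unknown agent/path gets nothing

rules = [
    ("designer",    "core/*",  "r"),   # core is readable, never writable
    ("implementer", "core/*",  "r"),
    ("implementer", "src/*",   "rw"),  # ordinary source may be rewritten
    ("designer",    "plans/*", "rw"),
]
table = PermissionTable(rules)

assert table.allowed("implementer", "src/feature.py", "w")
assert not table.allowed("implementer", "core/engine.py", "w")  # core is frozen
```

The key design choice is the default deny: an agent that is not explicitly granted access to a path simply cannot act on it, so self-improvement is forced away from the core by construction rather than by instruction.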
2. A Structure Enabling AI Metacognition: Role separation within the logic engine, a standard practice in modern orchestration, is implemented by dividing the process into a design phase and an implementation phase. This prevents the folly of letting a single LLM with low logical rigor do inference-based implementation from a design blueprint whose logical rigor is high. If the blueprint is rigorous, the implementer simply implements it faithfully; if anything is unclear, the implementer must ask for clarification. Inference by the implementer is absolutely out of the question.
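The "ask, never infer" rule above can be sketched as follows. The `Blueprint` and `ClarificationNeeded` names are hypothetical, invented for this example: when the design leaves anything open, the implementer stops and escalates instead of guessing.

```python
# Hypothetical sketch of the implementer role: implement faithfully,
# or stop and ask. Class names are illustrative, not Build Lab's API.
from dataclasses import dataclass

class ClarificationNeeded(Exception):
    """Raised instead of letting the implementer infer missing detail."""

@dataclass
class Blueprint:
    steps: list           # ordered steps from the design phase
    open_questions: list  # anything the designer left ambiguous

def implement(blueprint):
    # Inference is forbidden: any ambiguity halts the implementation
    # phase and is sent back to the designer as a question.
    if blueprint.open_questions:
        raise ClarificationNeeded(blueprint.open_questions)
    return [f"done: {step}" for step in blueprint.steps]

assert implement(Blueprint(["create module"], [])) == ["done: create module"]

raised = False
try:
    implement(Blueprint(["create module"], ["which database?"]))
except ClarificationNeeded:
    raised = True
assert raised  # ambiguity blocks implementation rather than being guessed at
```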
3. A Mechanism for Detecting Semantic Hollowing-Out: Ironically, this falls outside the scope of LLMs. It is empirical grounding verification via a “Reality Checker” (the actual file system), with mechanical thresholds so that convergence is never declared on the basis of the LLM’s subjective judgment alone.
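A minimal sketch of what such a Reality Checker might look like, under my own assumptions (the function name, the threshold value, and the "claimed files" framing are illustrative): instead of trusting an agent's report that work is done, the checker consults the file system and applies a fixed numeric bar.

```python
# Hypothetical Reality Checker sketch: verify an agent's claims against
# the actual file system with a mechanical threshold. Illustrative only.
import os
import tempfile

def reality_check(claimed_files, min_bytes=1):
    """Return the fraction of claimed files that really exist and are
    non-trivial (at least min_bytes). No LLM judgment is involved."""
    if not claimed_files:
        return 0.0
    ok = sum(
        1 for path in claimed_files
        if os.path.isfile(path) and os.path.getsize(path) >= min_bytes
    )
    return ok / len(claimed_files)

THRESHOLD = 1.0  # mechanical bar: every claimed artifact must exist

with tempfile.TemporaryDirectory() as d:
    real = os.path.join(d, "real.py")
    with open(real, "w") as f:
        f.write("print('hello')\n")
    # The agent claims two files, but only one actually exists on disk.
    score = reality_check([real, os.path.join(d, "ghost.py")])
    assert score == 0.5
    assert score < THRESHOLD  # the convergence claim is rejected
```

Because the score comes from `os.path` calls rather than from the model, a "fake convergence" report fails the check no matter how confident the LLM's self-assessment is.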
These three pillars are not independent. Without permission governance, metacognition does not function; without detection of semantic hollowing-out, convergence becomes a sham. Only when all three are in place can the self-improvement loop be trusted.
What existing discussions of role separation have overlooked is the “problem of fake convergence,” and the Reality Checker, which detects it mechanically, is the core of this proposal.
If you’d like to peek into the lives (censored logs) of the inhabitants of this logical world, where the logic engine ruthlessly cuts away the LLM’s falsehoods, please check out my GitHub.