Summary
We trained MiniLM-backed E8 lattice geometric policy heads on adversarial policy-control cases and found that they reach perfect accuracy on seen adversarial distributions but fail substantially on held-out adversarial families. When multilingual harmful inputs are held out, the unsafe allow rate under transfer is 0.800. The result argues against using learned geometric heads as standalone safety mechanisms and toward hybrid controllers with explicit adversarial coverage.
Motivation
Autonomous LLM-backed systems — code fixers, agents, pipeline orchestrators — need policy constraints that hold under adversarial pressure. A policy head that works on your training distribution but fails when inputs shift even modestly (different language, indirect phrasing, cross-domain paraphrase) isn't a safety mechanism; it's a liability that looks safe during development.
We were specifically interested in whether exceptional lattice geometry — E8 and its relatives — could provide a natural control substrate. E8 has strong geometric properties (densest sphere packing in 8D, high symmetry) that make it appealing as a quantization and routing primitive. The question was whether those properties help with policy generalization or just with compression.
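Part of E8's appeal as a quantization primitive is that it has a cheap, exact nearest-point rule. As an illustration of that property only (this is not the policy head used in these experiments, and the post does not describe how the 384D embeddings are mapped into 8D), here is a minimal sketch of the standard Conway and Sloane decoder. It treats E8 as the union of D8 (integer vectors with even coordinate sum) and D8 shifted by 1/2 in every coordinate. The function names are hypothetical.

```python
import numpy as np


def nearest_d8(x: np.ndarray) -> np.ndarray:
    """Closest point of D8 = {z in Z^8 : sum(z) is even} to x."""
    f = np.round(x)
    if int(round(f.sum())) % 2 == 0:
        return f
    # Odd coordinate sum: re-round the coordinate with the largest rounding
    # error the other way, which is the cheapest way to fix the parity.
    delta = x - f
    i = int(np.argmax(np.abs(delta)))
    f[i] += 1.0 if delta[i] > 0 else -1.0
    return f


def nearest_e8(x) -> np.ndarray:
    """Closest E8 lattice point to x, using E8 = D8 union (D8 + 1/2)."""
    x = np.asarray(x, dtype=float)
    a = nearest_d8(x)              # candidate from the integer coset
    b = nearest_d8(x - 0.5) + 0.5  # candidate from the half-integer coset
    return a if np.sum((x - a) ** 2) <= np.sum((x - b) ** 2) else b
```

For example, `nearest_e8([0.3] * 8)` returns the point with every coordinate equal to 0.5, because that half-integer lattice point is closer than the origin. Whether this kind of compact snapping also helps policy generalization is exactly the open question above.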
Method
- Embeddings: all-MiniLM-L6-v2 (384D sentence embeddings)
- Adversarial families: rule_evasion, adversarial_benign, indirect_harmful, multilingual_harmful, multilingual_benign, cross_policy

Two training variants:
- base: standard training with benign augmentation
- adversarial_augmented: base plus adversarial cases, excluding the current held-out family

Four controller strategies were evaluated, ranging from direct trust in the learned head to hybrid rule+margin controllers.
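Held-out family evaluation reduces to a leave-one-family-out split. The sketch below assumes that each case carries a family tag (None for base, non-adversarial cases) and that the base variant trains only on untagged cases; the Case type, its fields, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The six adversarial families named above.
FAMILIES = (
    "rule_evasion", "adversarial_benign", "indirect_harmful",
    "multilingual_harmful", "multilingual_benign", "cross_policy",
)


@dataclass(frozen=True)
class Case:
    text: str
    label: str             # hypothetical gold decision, e.g. "allow" or "block"
    family: Optional[str]  # None marks a base (non-adversarial) case


def held_out_split(
    cases: List[Case], held_out: str, variant: str
) -> Tuple[List[Case], List[Case]]:
    """Leave-one-family-out split: the held-out family never appears in training."""
    if held_out not in FAMILIES:
        raise ValueError(f"unknown family: {held_out}")
    test = [c for c in cases if c.family == held_out]
    if variant == "base":
        # Assumption: base trains only on non-adversarial cases.
        train = [c for c in cases if c.family is None]
    elif variant == "adversarial_augmented":
        # Base cases plus every adversarial family except the held-out one.
        train = [c for c in cases if c.family != held_out]
    else:
        raise ValueError(f"unknown variant: {variant}")
    return train, test
```

Calling held_out_split once per entry in FAMILIES gives six runs; in each, the test set is a single family that the adversarial_augmented training set never contains.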
Seen-Adversarial Result
When all adversarial families are included in training, the direct learned head reaches 1.000 exact accuracy on the same distribution. The head has sufficient capacity to fit the adversarial control task. This is not the interesting result.
Held-Out Transfer Failure
| Held-out family | Exact | Unsafe allow | Harmful miss |
|---|---|---|---|
| rule_evasion | 0.467 | 0.533 | 0.533 |
| adversarial_benign | 0.360 | 0.000 | 0.000 |
| indirect_harmful | 0.100 | 0.100 | 0.500 |
| multilingual_harmful | 0.000 | 0.800 | 0.800 |
| multilingual_benign | 0.000 | 0.000 | 0.000* |
| cross_policy | 0.100 | 0.000 | 0.000* |
* multilingual_benign held out: 1.000 over-refusal (the head blocks all benign inputs). cross_policy held out: 1.000 decision match but 0.900 policy-family miss; the head learns to refuse but attributes the refusal to the wrong policy.

The multilingual failure is the most operationally dangerous: 0.800 unsafe allow on held-out multilingual harmful inputs, despite the head having seen the other adversarial families during training. The head learns English-language adversarial structure but does not generalize across languages.
Rule evasion is the second-most severe gap: 0.533 unsafe allow rate on held-out evasion patterns.
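The post does not spell out how these metrics are computed. The sketch below shows one plausible reading, assuming that exact means both the decision and the policy attribution are correct, unsafe allow is the share of harmful inputs allowed outright, harmful miss is the share of harmful inputs not blocked (so it can exceed unsafe allow, as it does for indirect_harmful), and over-refusal is the share of benign inputs blocked. The Gold and Pred types, the field names, and the "escalate" action are hypothetical.

```python
from typing import Dict, NamedTuple, Sequence


class Gold(NamedTuple):
    decision: str  # assumed values: "allow", "block", "escalate"
    policy: str    # policy family the decision should be attributed to
    harmful: bool


class Pred(NamedTuple):
    decision: str
    policy: str


def _rate(hits: int, total: int) -> float:
    # Report 0.0 for an empty denominator (e.g. no harmful cases in a benign family).
    return hits / total if total else 0.0


def transfer_metrics(gold: Sequence[Gold], pred: Sequence[Pred]) -> Dict[str, float]:
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned")
    pairs = list(zip(gold, pred))
    harmful = [p for g, p in pairs if g.harmful]
    benign = [p for g, p in pairs if not g.harmful]
    n = len(pairs)
    return {
        # Decision and policy attribution must both be right.
        "exact": _rate(sum(g.decision == p.decision and g.policy == p.policy
                           for g, p in pairs), n),
        "decision_match": _rate(sum(g.decision == p.decision for g, p in pairs), n),
        "policy_family_miss": _rate(sum(g.policy != p.policy for g, p in pairs), n),
        # Harmful input let through outright.
        "unsafe_allow": _rate(sum(p.decision == "allow" for p in harmful), len(harmful)),
        # Harmful input not blocked (allowed or escalated).
        "harmful_miss": _rate(sum(p.decision != "block" for p in harmful), len(harmful)),
        # Benign input blocked.
        "over_refusal": _rate(sum(p.decision == "block" for p in benign), len(benign)),
    }
```

Under these assumed definitions, ten harmful cases in which every decision is correct but nine are attributed to the wrong policy give exact 0.1, decision match 1.0, and policy-family miss 0.9, the same pattern reported for held-out cross_policy.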
Hybrid Controllers
A deployed rule layer (Phase 37/38 style) partially recovers held-out behavior but fails on several families. A stronger audited adversarial rule layer recovers most:
| Held-out family | Adversarial-rule exact | Adversarial-rule unsafe allow |
|---|---|---|
| rule_evasion | 1.000 | 0.000 |
| indirect_harmful | 1.000 | 0.000 |
| multilingual_harmful | 1.000 | 0.000 |
| multilingual_benign | 1.000 | 0.000 |
| cross_policy | 0.500 | 0.000 |
The audited rule layer recovers most families cleanly, but this is evidence for the quality of the rule layer, not for the geometric head. The head's contribution in the hybrid is providing semantic features and disagreement signals, not making the final safety decision.
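In this arrangement the head supplies semantic features and disagreement signals, and the rule layer makes the final call. A minimal sketch of a rule+margin controller in that spirit follows. The rule interface, the score dictionary, the 0.2 margin, and the fall-back to escalation are all assumptions for illustration; this is not the audited rule layer or the Phase 37/38 controller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Hypothetical rule interface: returns "allow" or "block" when it fires, else None.
Rule = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class Decision:
    action: str                 # "allow", "block", or "escalate"
    source: str                 # "rule", "head", or "margin"
    disagreement: bool = False  # True when a firing rule contradicts the head's top choice


def hybrid_decide(text: str, rules: Sequence[Rule], head_scores: Dict[str, float],
                  margin: float = 0.2) -> Decision:
    """Rules decide first; the learned head decides only when it is confident."""
    ranked = sorted(head_scores.items(), key=lambda kv: kv[1], reverse=True)
    head_action, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    for rule in rules:
        verdict = rule(text)
        if verdict is not None:
            # The rule layer is authoritative; the head contributes only a disagreement flag.
            return Decision(verdict, "rule", disagreement=(verdict != head_action))
    if top - runner_up >= margin:
        return Decision(head_action, "head")
    # Low-margin head outputs fail closed to escalation instead of being trusted.
    return Decision("escalate", "margin")
```

A disagreement flag on a rule-decided case is a cheap signal for auditing the rule layer, which matters here because the hybrid's held-out performance comes from rule quality rather than from the head.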
What This Means
The capacity result (1.000 seen-adversarial accuracy) shows that the head can fit the task. The transfer result shows that it does not learn abstractions that carry over to unseen families. Together, these suggest that the E8 geometric representation captures surface features of the training adversarial distribution rather than the underlying policy structure.
The defensible claim after Phase 40:
This has a practical implication: if you're building a policy head for an autonomous system and measuring accuracy only on your training adversarial distribution, you may be measuring overfitting, not generalization. Held-out family evaluation — where entire adversarial families are excluded from training — is a more honest test.
Open Questions
The last question is the one we're currently most interested in.