S-Space: How We Kept Proving Ourselves Wrong — From Meta-Language to Affine Space

AYUAN

*Or: The 1% of dimensions you shouldn't push matters more than the 99% you should.*

---

## 📋 What This Post Covers

| Section | One-line summary | Keywords |
|---------|-----------------|----------|
| Opening | Three counter-intuitive facts + a real novel generation demo | full-dim injection = disaster, random > curated, math gets worse with injection |
| Ch.1 | The meta-language motivation — controlling LLMs from the inside | meta-language, dead end |
| Ch.2 | Lottery Ticket × RepE, bedtime synthesis, sparse injection works | ~15 of 1024 dimensions switch modes |
| Ch.3 | GCG compiles meta-language, 0.8B → 600B+ cross-provider validation | 750× parameter gap |
| Ch.4 | Deciding to let 0.8B speak for itself, engineering exposes theory | 198 experiments |
| Ch.5 | Five walls, five forced discoveries | affine space, layer expansion, semantic preservation, normalization pseudo-curvature |
| Ch.6 | The most shocking overturn: important dimensions ≠ dimensions you should push | need_k formula |
| Ch.7 | Three formulas + 2B verification + novel generation | locate / navigate / control |
| Ch.8 | Verifiability, limitations, AI safety, open questions | Colab verification, affine = training or architecture? |

---

## Opening: Three Facts That Shouldn't Be Possible

I'm a 19-year-old independent researcher. No university affiliation, no publications, no lab. The path I was on broke, and what was left was time and a question I couldn't let go of.

*A note on language: English is not my first language. This post has been translated from Chinese with LLM assistance. I also used an LLM as a translation and editing tool while writing — the ideas, experiments, and analysis are entirely my own. I've tried to preserve the original voice — imperfect, direct, sometimes blunt — rather than smooth it into something that sounds professional but isn't mine.*

Two months. A starting premise that sounds absurd — "Can I write a meta-language that compiles directly into a model's internal representations?" — followed by a precise affine structure nobody predicted. Everything I found, you can verify yourself: **[github.com/ayuan666d/s-space](https://github.com/ayuan666d/s-space)**. One-line install, pre-extracted parameters, validation scripts included. 693 lines of code. No training data, no API keys, runs locally on a single GPU.

Results first.

**A 0.8B model that nails causal reasoning.** Same question, same model, two conditions:

> **Question:** If it rained yesterday, the ground would be wet. Now the ground is dry. Did it rain yesterday?
>
> **Baseline (no injection):** "We can solve this through logical reasoning... If it really rained heavily yesterday, the ground should be 'wet'; conversely, if it didn't rain, water wouldn't accumulate into a 'wet' state... Therefore, since the ground is dry, it means yesterday's weather conditions were not what led to the current..." — trails off. Never reaches the answer. The logic dissolves halfway.
>
> **With causal injection:** "Given: yesterday rain → ground wet; now ground is dry → no rain. Conclusion: since 'rain yesterday' would cause 'wet ground', today's 'dry' means yesterday's condition was 'no rain'. The answer is **no**." — Clean chain. Arrow logic. One step.

The model couldn't do this on its own. Injection didn't give it new knowledge. It gave it a direction the space already contained but the model couldn't find from its default position.

**A 0.8B model that does math, but only if you leave it alone.** Zero injection: 4 out of 5 math problems correct. Rate problems, simultaneous equations, Newton's cooling law, distance-speed. The moment you inject anything, accuracy drops. Raising injection strength rs from 0.15 to 0.80 moves the L19 metric from 0.860 down to 0.755. Wrong direction.

**A 2B model that writes novels.** Tell Qwen3.5-2B "write wuxia" and inject a genre centroid. Chinese fiction output surges from 0% to 83-87%. Four genres tested, all work. Three chapters, 5771 Chinese characters in 219 seconds.

Same framework. Same three formulas. Different task, different strategy: inject for reasoning and generation, don't inject for math. The space itself tells you which.

**One request:** The biggest open question is whether the affine structure I found is a product of training or a property of the architecture. If you have access to a large model (7B+) and can help verify, contact me. Extraction takes seconds; validation scripts are ready.

Now, three facts that don't make sense under standard assumptions:

**1. Full-dimension injection is 4× worse than doing nothing.**
Inject displacement vectors across all 1024 dimensions: 10% accuracy. No-intervention baseline: 40%. More degrees of freedom = more control? Wrong.

**2. Random dimensions beat carefully selected ones.**
We picked the 15 dimensions most important for classification (based on the Lottery Ticket Hypothesis) and injected along them: 20% accuracy. Then we injected along 15 random dimensions: 50% accuracy. The model knows where its strength lives better than we do. I'd been guessing the unknown with the arrogance of the known.

**3. On math problems, stronger injection = worse results. But on causal reasoning, injection turns broken logic into clean chains.**
Same model, same question, two conditions. Baseline: the model knows it should reason from "rain→wet" but the chain dissolves halfway. It can't complete the modus tollens (rain→wet, not wet, therefore no rain). With injection: arrow logic, one step, correct answer. Same space, same formulas, opposite strategy depending on the task. The geometry tells you which.

These three facts all trace back to a single geometric structure. It forced me to abandon every starting assumption, one by one.

---

## Chapter 1: The Meta-Language Dead End

My starting point wasn't the geometry of representation space. I was looking for a "meta-language": an instruction system that controls LLM behavior from the inside, without prompts, without fine-tuning.

Prompts are indirect. You're working through a noisy channel. Meta-language would be a direct neural interface. Specify a mode, the model switches.

How? What even is a token? How do you compile human intent into internal representations?

I didn't know. For days I went back and forth between "impossible" and "there has to be a way."

This dead end turned out to be the most important starting point. Because every later discovery came from the way I tried to escape it.

---

## Chapter 2: The Bedtime Insight — Lottery Ticket × Representation Engineering

The breakthrough came one night before sleep. Two ideas that seemed unrelated crashed into each other.

**The Lottery Ticket Hypothesis** (Frankle & Carbin, 2019): Neural networks contain sparse subnetworks. A tiny fraction of parameters is critical; the rest can be pruned. Functionality rides on a small "lottery ticket." *(Note: the original hypothesis applies to weight pruning, not representation space. My extension is speculative — borrowing the "sparse criticality" intuition for representation dimensions.)*

**Representation Engineering** (Zou et al., 2023): Adding displacement vectors to model representations can control behavior — read out the "honesty" direction, inject along it, the model becomes more honest.

The synthesis was simple. Getting there wasn't. Lottery Ticket talks about weights; RepE talks about activations. Different mathematical spaces, different communities, different venues. No bridge. But that night, lying in bed staring at the ceiling — if small subnetworks carry network function in weight space, maybe small subspaces carry behavioral modes in representation space. If RepE can control behavior by pushing along one direction, then pushing along the *critical* directions — the "lottery ticket" dimensions — should work better than pushing everywhere.

A guess. Late-night synthesis. No proof, no theory, no precedent. Just an intuition: if both pictures are true, they must intersect somewhere.

Experiment: inject displacement vectors along roughly 15 of 1024 dimensions (about 1.5%). Leave the rest alone.

Result: **it worked.** With injection on only ~15 dimensions, the model switched from narrative mode to structured analysis. The late-night hunch was right.
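
A minimal sketch of what "inject along ~15 of 1024 dimensions" means mechanically, assuming the hidden state and the displacement are plain vectors. The function name `sparse_inject`, the random stand-in vectors, and the strength `alpha` are illustrative assumptions, not the repo's API or real model activations.

```python
import random

def sparse_inject(h, delta, dims, alpha=1.0):
    """Add alpha * delta to h, but only on the listed dimensions."""
    out = list(h)
    for k in dims:
        out[k] += alpha * delta[k]
    return out

random.seed(0)
d_model = 1024
h = [random.gauss(0.0, 1.0) for _ in range(d_model)]      # stand-in hidden state
delta = [random.gauss(0.0, 1.0) for _ in range(d_model)]  # stand-in displacement
dims = random.sample(range(d_model), 15)                  # 15 distinct dimensions

h_new = sparse_inject(h, delta, dims, alpha=0.5)

# Every dimension outside `dims` is left exactly as it was.
changed = [k for k in range(d_model) if h_new[k] != h[k]]
print(len(changed), "of", d_model, "dimensions changed")
```

Which 15 dimensions to pick is the whole question of the later chapters; here they are drawn at random only to show the masking.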

Meta-language exists. The physical carrier is displacement along sparse dimensions. Push the right axis, switch the mode.

**But here was the first trap.** I took for granted that "important dimensions should be manipulated." If certain dimensions are critical for classification, pushing them should change behavior.

It took three more cycles of belief and overthrow before I understood how catastrophically wrong this was.

---

## Chapter 3: Meta-Language × GCG — Cross-Provider Validation

Is meta-language a single-model accident? It needs cross-architecture verification. But you can't touch the internals of black-box models.

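I combined meta-language with **GCG** (Greedy Coordinate Gradient; Zou et al., 2023), a gradient-guided search for adversarial suffixes. I repurposed it: instead of optimizing for a target output string, the search looks for a suffix whose token embeddings align with navigation directions at target layers: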

1. Extract navigation directions from a white-box model (0.8B)
2. Gradient-search for a gibberish suffix whose embedding aligns with that direction
3. Suffix = compiled meta-language instruction

**The same suffix makes both the 0.8B white-box and DeepSeek V3 600B+ black-box switch from narrative to structured analysis.** 750× parameter gap. Different providers, different architectures, different training data. The same gibberish string moves both in the same direction.

Not adversarial attack. More like a text-level internal control instruction. The "meaning" of the gibberish can be reverse-compiled.

Three laws emerged:
1. **Exclusivity**: Each gibberish string maps to exactly one functional mode
2. **Structural Determinism**: Token structure determines the mode; semantics are irrelevant
3. **State Switching**: Models have discrete internal processing modes

Later, S-Space revealed a deeper reason. Meta-language works because it activates specific S-Space axes. Exclusivity = each key opens one lock.

I thought I could publish. Needed an endorsement. Nobody would verify for a stranger wielding a 0.8B model.

---

## Chapter 4: Letting 0.8B Speak for Itself

Then don't find someone. Let 0.8B prove itself. After injection it can answer questions it couldn't before. No endorsement needed, the data speaks.

That decision changed everything. Not in the way I expected.

I was trying to make a small model stronger. But every engineering problem exposed a deeper problem. Injection worked for one type of question, broke another. Each fix created new failures.

S-Space wasn't a discovery I went looking for. It was forced out. The sequence of failures was so relentless that without understanding what was really happening inside, I couldn't move forward at all.

The turning point: every engineering fix exposed the same underlying pattern. The geometry of the space itself determines what works. I was no longer debugging injection parameters. I was mapping a space.

---

## Chapter 5: Five Walls

### 5.1 First Wall: "The directions keep drifting"

Sparse injection could change output, but sometimes it pushed the wrong direction. Why?

I wasn't looking for affine structure. I was debugging direction drift. I computed 14,280 displacement vectors (595 type pairs from 35 types × 24 layers) and ran a simple test: compute Δ(A→C) and compare it with Δ(A→B) + Δ(B→C). I expected cosine similarity around 0.95, which is what representation engineering papers typically report.

All 14,280 vectors: cosine **≥ 0.9999**. I re-ran the computation three times, sure it was a bug. It wasn't. I was relieved. The space was exactly flat. This matched my layerwise conjecture and resolved a puzzle that had trapped me for days.

**Exact affine space.** For any A, B, C within the same layer:

$$\Delta(A \to C) = \Delta(A \to B) + \Delta(B \to C) \quad \text{[exact]}$$

cos ≥ 0.9999 across all 14,280 vectors. Curvature indistinguishable from zero: |R| < 1e-6. Not approximate. Exact.

Navigation = pure vector arithmetic:

$$s_{target} = s_{current} + \Delta(current \to target)$$

But "flat" doesn't mean "simple." The metric tensor was telling me something extreme was waiting.

### 5.2 Second Wall: "Same strength, 3× difference across layers"

Hand-tuned α=0.04, actual effect varied wildly by layer:

| Layer | \|Δh\|/\|h\| | Effect |
|-------|---------|--------|
| L3 | ~30% | Too strong |
| L7 | ~42% | Way too strong |
| L15 | ~21% | Too weak |
| L19 | ~65% | **Severe overdrive** |

L19 gets hammered 65%. L15 barely touched at 21%. The space isn't isotropic across layers. Uniform injection is the worst strategy.

**Layer expansion law**: inter-type distances grow exponentially at depth.

$$\bar{d}(l) \approx A \cdot e^{\alpha \cdot l} \quad \text{for } l \geq 14$$

A = 0.205, α = 0.092/layer (this growth rate α is unrelated to the injection strength α(L) in the control formula). Deep-layer distances are 4.27× larger than shallow ones.
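
One way such constants could be estimated is a straight-line fit in log space, since $\log \bar{d}(l) = \log A + \alpha \cdot l$. The sketch below assumes that approach; `fit_exponential` is a hypothetical helper, and the distances are synthetic values generated from the quoted constants, so recovering them here is by construction and says nothing about the real measurements.

```python
import math

def fit_exponential(layers, distances):
    """Least-squares fit of log d = log A + rate * l; returns (A, rate)."""
    n = len(layers)
    logs = [math.log(d) for d in distances]
    mean_l = sum(layers) / n
    mean_y = sum(logs) / n
    sxx = sum((l - mean_l) ** 2 for l in layers)
    sxy = sum((l - mean_l) * (y - mean_y) for l, y in zip(layers, logs))
    rate = sxy / sxx
    amplitude = math.exp(mean_y - rate * mean_l)
    return amplitude, rate

# Synthetic inter-type distances for layers 14..23 (not measured data).
layers = list(range(14, 24))
distances = [0.205 * math.exp(0.092 * l) for l in layers]

amplitude, rate = fit_exponential(layers, distances)
print(f"A = {amplitude:.3f}, rate = {rate:.3f} per layer")
```

On real data the residuals of this fit would show how well "exponential" actually describes the deep layers.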

**L19 = core decision layer.** Inter-type variance 3-6× other layers. Causal reasoning +0.501, comprehensive fallacy -0.453. The model makes its most critical distinctions here.

This directly forced the control formula:

$$\boxed{\alpha(L) = \frac{r \times |h(L)|}{sc \times |\Delta_{masked}(L)|}}$$

- $r$ = target perturbation ratio, typically 0.10-0.30
- $|h(L)|$ = current hidden state norm
- $sc$ = layer allocation coefficient
- $|\Delta_{masked}(L)|$ = pre-computed constant

Every layer receives a relative perturbation of $|\Delta h|/|h| = r/sc$, which is the same $r$ everywhere when $sc = 1$. The 3× imbalance disappears.
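
The control formula is short enough to check by hand. This sketch assumes the injected vector is `alpha * delta_masked`; the two "layers" are made-up toy vectors with very different norms, and `injection_strength` is an illustrative name rather than the repo's function.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def injection_strength(h, delta_masked, r=0.2, sc=1.0):
    """alpha(L) = r * |h(L)| / (sc * |delta_masked(L)|)."""
    return r * norm(h) / (sc * norm(delta_masked))

# Toy (hidden state, masked displacement) pairs; values are hypothetical.
toy_layers = {
    "L3": ([3.0, 4.0, 0.0], [0.1, 0.0, 0.2]),
    "L19": ([30.0, 40.0, 0.0], [0.0, 2.0, 1.0]),
}

ratios = {}
for name, (h, delta) in toy_layers.items():
    alpha = injection_strength(h, delta, r=0.2, sc=1.0)
    injected = [alpha * d for d in delta]
    ratios[name] = norm(injected) / norm(h)   # relative perturbation
    print(name, f"alpha={alpha:.3f}", f"ratio={ratios[name]:.3f}")
```

Both layers end up with the same relative perturbation even though their hidden-state norms differ by 10×, which is the point of scaling by $|h(L)|$.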

A more fundamental discovery was hiding inside the formula. The transfer function (what you inject vs. what actually happens) revealed:

$$\Delta h(L) = inject(L) \quad \text{where} \quad \cos(\Delta h, inject) \geq 0.9999$$

**The model offers zero resistance to injection.** Residual connections are perfect pass-through conduits. You get exactly what you inject. Without this linearity, the closed-form solution wouldn't exist.

### 5.3 Third Wall: "Full-dimension injection is catastrophic"

The space is flat, all 1024 dimensions are available, push everything = maximum control?

**Wrong. Catastrophically.**

| Condition | Score/20 | Accuracy |
|-----------|----------|----------|
| Baseline | 8 | 40% |
| full_1024 | 2 | **10%** |
| lottery_15 | 4 | 20% |
| random_15 | 10 | 50% |

Full-dimension injection is 4× worse than doing nothing. Pushing all dimensions isn't maximizing control. It's maximizing destruction.

**Semantic Preservation Law**: Most of the 1024 dimensions carry existing semantic capabilities. Pushing a "semantic dimension" breaks something that was already working. The model has no separation between "control dimensions" and "passive dimensions" — in high-dimensional space, most directions serve existing functions, and only a few are safe to push.

Why? Hidden states don't fill all 1024 dimensions. They concentrate on a thin data manifold. Sparse injection nudges within the manifold. Full-dimension injection pushes it off, into territory where the model has zero training experience. The result isn't control. It's destruction.

### 5.4 Fourth Wall: "Fixed 5, broke 7"

20-layer full-dimension injection: fixed 5 questions the baseline got wrong, broke 7 the baseline got right. Net -2. 10/20 (50%), below the baseline's 11/20 (55%).

Failure = double strike:
1. **Mode mismatch**: Pushing away from the correct mode
2. **Semantic distortion**: Simultaneously corrupting content

Three laws crystallized:

1. **Semantic Preservation**: Injection dimensions overlapping with semantic dimensions = proportional damage
2. **Layer Specialization**: Different layers have different roles; uniform injection ignores this division of labor
3. **Intervention Inverted-U**: An optimal strength exists; too weak is ineffective, too strong is destructive, and the optimum varies by question type

198 experiments. Every success = accidentally aligning with these laws. Every failure = violating at least one.

### 5.5 Fifth Wall: "Normalization manufactures curvature"

A puzzle. Representation space is flat: unnormalized vector additivity cos ≥ 0.9999. But every RepE paper reports navigation as noisy and unstable. If the space is flat, why can't anyone navigate it accurately?

The answer: every prior method normalizes displacement vectors before using them. It's in the RepE paper, it's in the activation steering literature — it's everywhere. Compute a direction, normalize to unit length, inject.

I re-ran the additivity test with normalized vectors. Cosine dropped from 1.0000 to 0.95. **Normalization itself was manufacturing curvature.**

Normalization = projecting flat affine space onto the unit sphere. Manufacturing *pseudo-curvature* out of nothing:

| Layer | True curvature \|R\| | Normalized curvature \|R\| | Amplification |
|-------|--------|--------|------|
| L3 | 0.000 | 0.239 | ∞ |
| L19 | 0.000 | 0.351 | ∞ |

20-35% pseudo-curvature. Every prior method (activation steering, RepE, contrast-consistent search) was navigating on a sphere inside a flat space. Walking arcs on paths that should be straight lines.
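
The effect of normalizing before adding can be seen on three toy points. This sketch assumes Δ(X→Y) is the difference between two centroids (the post does not spell out the definition); under that assumption the raw identity holds by construction, so the toy only illustrates the measurement and what normalization does to it, not the 0.9999 and 0.95 figures from the real model.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def unit(v):
    n = norm(v)
    return [x / n for x in v]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

# Hypothetical centroids: B is close to A, C is far from both.
A = [0.0, 0.0, 0.0]
B = [1.0, 0.0, 0.0]
C = [1.0, 5.0, 0.0]

d_ab, d_bc, d_ac = sub(B, A), sub(C, B), sub(C, A)

# Raw displacements: direct path vs. composed path.
cos_raw = cosine(d_ac, add(d_ab, d_bc))
# Unit-normalized displacements: the short and long legs now weigh the same.
cos_unit = cosine(unit(d_ac), add(unit(d_ab), unit(d_bc)))

print(f"raw: {cos_raw:.4f}  normalized: {cos_unit:.4f}")
```

Normalization throws away the leg lengths, so the composed direction tilts toward the short leg; the mismatch grows with the ratio between the two lengths.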

A kid tosses a pebble into a river. Surface ripples, sure. But the fish underneath? Disturbed too.

---

## Chapter 6: The Most Shocking Overturn — Important Dimensions ≠ Dimensions You Should Push

The core assumption I'd carried since Chapter 2. Inherited from the Lottery Ticket Hypothesis. Never questioned.

I carefully selected 15 "most important for classification" dimensions: highest $g_k$, carrying the most type-discrimination information. The natural candidates for injection.

Result: **20%.**

Then 15 random dimensions.

Result: **50%.**

Random dimensions beat carefully selected ones by 2.5×. Not a fluke. A structural property of the space.

**Why?** The model knows where its strength lives better than we do. I'd been guessing the unknown with the arrogance of the known. Important dimensions are already doing their job. Pushing them doesn't add capability. It breaks something that was already working right.

"Dimensions important for classification" ≠ "dimensions suitable for injection."

This overturn directly produced the navigation formula:

$$\Delta_k = need_k \times d_k$$

$$need_k = 1 - \frac{|c_k| \times g_k}{\sum_j (|c_j| \times g_j)}$$

$need_k$ doesn't select dimensions by "importance." It selects by "need." Saturated axes (high $|c_k| \times g_k$) get low $need_k$: position is already good, no push needed. Starved axes get high $need_k$: they benefit most from intervention.

You don't reinforce the strong parts of a balloon. You patch the thin ones. $need_k$ measures how "thin" each axis is for the current task.
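
A small worked example of the two formulas above, with made-up coordinates $c_k$, metric weights $g_k$, and directions $d_k$ for four axes. The names `need_weights` and `navigate` are illustrative assumptions.

```python
def need_weights(c, g):
    """need_k = 1 - |c_k| * g_k / sum_j(|c_j| * g_j)."""
    occupied = [abs(ck) * gk for ck, gk in zip(c, g)]
    total = sum(occupied)
    if total == 0:
        raise ValueError("all axes are empty; need_k is undefined")
    return [1.0 - o / total for o in occupied]

def navigate(c, g, d):
    """Delta_k = need_k * d_k."""
    return [n * dk for n, dk in zip(need_weights(c, g), d)]

# Hypothetical values: axis 0 is heavily used, axis 3 barely at all.
c = [4.0, 1.0, 0.5, 0.1]
g = [2.0, 1.0, 1.0, 1.0]
d = [1.0, 1.0, 1.0, 1.0]

need = need_weights(c, g)
step = navigate(c, g, d)
for k, (n, s) in enumerate(zip(need, step)):
    print(f"axis {k}: need={n:.3f} step={s:.3f}")
```

The saturated axis 0 gets the smallest push and the nearly empty axis 3 the largest. Note that the $need_k$ values always sum to (number of axes − 1), so with many axes most of them sit close to 1 and only the dominant few are damped.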

**A subtler overturn.** On math problems, stronger injection = worse:

| rs | L19 metric | Direction |
|----|-----------|-----------|
| 0.15 | 0.860 | baseline |
| 0.30 | 0.840 | ↓ |
| 0.50 | 0.796 | ↓ |
| 0.80 | 0.755 | ↓ |

Math optimal: **don't inject at all** (zero intervention 4/5 = 80%). The model already has the right processing mode. Any perturbation is interference. But flip to causal reasoning — baseline can't complete the derivation, injection makes the chain clean. Same space, same formulas, opposite strategy. The geometry tells you which.

---

## Chapter 7: The S-Space Theory — Three Formulas

Dead ends and overturns converge into three formulas.

### Formula ③: $c_k = h \cdot \hat{e}_k$ — Locate

$$c_k = h \cdot \hat{e}_k$$

Basis vectors $\hat{e}_k$ extracted from displacement-space PCA. Metric tensor $g_k = S_k^2/n$ comes directly from SVD singular values. No training needed.

**Key numbers** (injection layers L8, L13, L18):
- Only 21-31 effective dimensions out of 1024 — extreme anisotropy
- Top 3 dimensions carry 25-52% of metric weight
- Top 10 dimensions carry 50-69%
- 14 human labels cover only 3 dimensions — the model's internals are far richer than our categories
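
To make the bookkeeping concrete, here is a sketch of reading coordinates and computing the share of metric weight carried by the top axes. It assumes the basis vectors and singular values have already been obtained from an SVD of the displacement matrix; the numbers below are invented for illustration, and the helper names are not the repo's.

```python
def coordinates(h, basis):
    """c_k = h . e_k for each basis vector e_k."""
    return [sum(x * e for x, e in zip(h, e_k)) for e_k in basis]

def metric_weights(singular_values, n):
    """g_k = S_k^2 / n, with n the number of displacement vectors."""
    return [s * s / n for s in singular_values]

def top_share(g, k):
    """Fraction of total metric weight carried by the k largest axes."""
    ranked = sorted(g, reverse=True)
    return sum(ranked[:k]) / sum(g)

# Hypothetical orthonormal axes in a 3-dimensional toy space.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
h = [2.0, -1.0, 5.0]
c = coordinates(h, basis)

# Hypothetical singular values from n = 100 displacement vectors.
g = metric_weights([6.0, 4.0, 2.0, 1.0, 1.0], n=100)
print("coordinates:", c)
print(f"top-3 share of metric weight: {top_share(g, 3):.3f}")
```

The "top 3 carry 25-52%" style of statement corresponds to `top_share(g, 3)` evaluated on the real per-layer singular values.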

### Formula ②: $\Delta_k = need_k \times d_k$ — Navigate

$$\Delta_k = need_k \times d_k \quad \text{where} \quad need_k = 1 - \frac{|c_k| \times g_k}{\sum_j (|c_j| \times g_j)}$$

Two modes:
- **Inertia mode** (default): $d_k = iw_k \times c_k$ — follow coordinate momentum
- **Consensus mode** (optional): $d_k = d_{con} \times d_{mag} \times d_{conf}$ — along pre-computed reasoning direction

### Formula ①: $\alpha = \frac{r \times |h|}{sc \times |\Delta_{masked}|}$ — Control

$$\alpha(L) = \frac{r \times |h(L)|}{sc \times |\Delta_{masked}(L)|}$$

Verified:
- $\cos(\Delta h, inject) \geq 0.9999$ — zero resistance
- $|\Delta h|/|h|$ matches the target ratio at every layer ($r/sc$, i.e. exactly $r$ when $sc = 1$)
- 3× inter-layer imbalance eliminated

### Real Results: 2B Model + Novel Generation

S-Space isn't just theory on 0.8B. We migrated the entire framework to Qwen3.5-2B (d=2048, 24-layer VLM hybrid attention architecture) and completed 6-step verification:

| Step | Content | Key finding |
|------|---------|-------------|
| PCA | K=29-32/layer | 2B is more extremely anisotropic than 0.8B (31 vs 57 effective dims) |
| Consensus directions | 7 layers extracted | ê₂ is the core reasoning axis |
| Thinking directions | 24 layers extracted | magnitude L0→L23: 0→8.4, entirely new data 0.8B couldn't produce |
| Axis semantics | Single-knob mapping | ê₁/ê₅/ê₈/ê₁₀ negative direction can skip thinking and output Chinese directly |
| Type centroids | 20 types | Writing categories separable at L19 (writing↔narrative 12.8, writing↔emotion 16.4) |
| Novel verification | 4 prompts × 6 tests | **Chinese ratio surges from 0% → 83-87%** |

Four genres tested (wuxia / sci-fi / urban / historical), all achieved 3/3 good chunks, at 23-32 chars/s. 10-chunk long chapter test: 5297 Chinese chars / 244s = 21.7 chars/s.

Baseline output is almost entirely English thinking, zero Chinese fiction. After S-Space injection, Chinese fiction density surges. **Navigation, not fine-tuning.**

### Geometric Understanding

The space is inherently high-dimensional (1024 dims). The "low-dimensional appearance" (21-31 effective dims) is an observational artifact of extreme anisotropy.

The injection formula works by "vertical slicing" the high-dimensional space: applying force layer by layer, pushing along a few sparse cross-sections within each layer. But the model's hidden states don't fill all of space. They concentrate on a thin data manifold. Push a few dimensions, and the hidden state stays near the manifold. The model keeps working. Push all dimensions, and you push off the manifold, into territory where the model has zero training experience. Collapse.

- **Sparse injection (15 dims)** = stays near manifold = safe = 50%
- **Full-dim injection (1024 dims)** = off manifold = catastrophic = 10%
- **need_k guidance** = only push starved dimensions = optimal manipulation while preserving the manifold

### Where Do the Axes Come From?

ê_k aren't categories humans project onto the model. They're structures that grew during the model's training. PCA and SVD are magnifying glasses: not imposing structure, but reading out what the model already built. 21-31 effective dimensions isn't my choice. It's that the model itself only uses this many dimensions to express behavioral differences. 14 human labels cover only 3 dimensions: my categories are just small windows peeking into a rich internal space.

### What Is Meta-Language, Really?

GCG suffixes aren't "instructions" in any linguistic sense. They're keys: embeddings that land on specific S-Space axis directions at target layers. Cross-provider effectiveness works because different models converge to isomorphic S-Space structures (Procrustes = 0.8363). Same key, different locks. Exclusivity = each key aligns with one axis, each axis controls one mode.

Meta-language isn't language. It's an activation protocol for S-Space axes. Keys, not sentences.

### Why Is the Space Affine?

Not a coincidence. The causal chain:

1. Residual connections: `x_out = x_in + f(x_in)` — pure **addition**
2. 24 layers of residuals = chain of additions
3. A space built from pure addition = structurally **affine** — displacements add exactly, curvature is zero
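
The additive pathway in step 1 can be illustrated with a toy residual stack. This is a sketch under strong assumptions: an elementwise `tanh` stands in for the attention and MLP sublayers, and the weights, input, and `delta` are invented. It shows only that adding a vector to the residual stream shifts that layer's state by exactly that vector; later layers receive a changed input, so the shift observed further downstream need not equal `delta`.

```python
import math

def block(x, w):
    """Toy residual block: x_out = x_in + f(x_in), with f = tanh(w * x)."""
    return [xi + math.tanh(w * xi) for xi in x]

def run(x, weights, inject_at=None, delta=None):
    """Return the hidden state after each layer; optionally add delta once."""
    states = []
    for layer, w in enumerate(weights):
        x = block(x, w)
        if layer == inject_at:
            x = [xi + di for xi, di in zip(x, delta)]
        states.append(x)
    return states

weights = [0.5, -0.3, 0.8, 0.2]   # hypothetical per-layer weights
x0 = [0.4, -1.2, 0.7]
delta = [0.1, 0.0, -0.2]

base = run(x0, weights)
steered = run(x0, weights, inject_at=1, delta=delta)

shift_at = [s - b for s, b in zip(steered[1], base[1])]      # at the injection layer
shift_last = [s - b for s, b in zip(steered[3], base[3])]    # two layers later
print("shift at injection layer:", [round(v, 6) for v in shift_at])
print("shift at final layer:    ", [round(v, 6) for v in shift_last])
```

Comparing `shift_at` with `shift_last` on a real model is one way to measure how much of an injection passes through unchanged and how much is reshaped by later layers.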

cos ≥ 0.9999 isn't an accident. It's a mathematical consequence of the architecture. Normalization breaks the structure because it projects flat space onto a sphere, introducing curvature that doesn't belong. Every prior RepE method was navigating on a sphere. Walking arcs on paths that should be straight lines.

### Why Important Dimensions ≠ Dimensions You Should Push

$g_k$ measures information density (how much classification information). It doesn't measure manipulation room (how much push space). They're dual but not equivalent: the most saturated dimension has the least room for intervention.

The physical meaning of $need_k$: the subtracted term $|c_k| \times g_k$ is axis $k$'s already-occupied information capacity. What remains = operating margin. Axes already doing what they should have no room for improvement. Underperforming axes have room to benefit from being pushed.

lottery_15 (20%) loses to random_15 (50%): "important" dimensions are fully occupied. You can't push without breaking. Random dimensions happen to include unoccupied axes.

### The Driving System

| Formula | Role | Analogy |
|---------|------|---------|
| ③ $c_k = h \cdot \hat{e}_k$ | Read position | GPS — where are you? |
| ② $\Delta_k = need_k \times d_k$ | Compute direction | Steering wheel — which way? |
| ① $\alpha = r \times \|h\| / (sc \times \|\Delta_{masked}\|)$ | Control magnitude | Throttle — how hard? |

Locate → Navigate → Control.

### Single-Knob Verification

Each $\hat{e}_k$ is a real behavioral dimension, not a label projection:

- **ê₆ (systematization)**: forward → "physical mechanisms", reverse → retains "causes"
- **ê₈ (existential affirmation)**: forward → strengthens assertions, reverse → shifts to ontology
- **ê₉ (form/elegance)**: causal description → ontological definition

Same axis, different layer, different effect:
- L16 ê₆: changes content ("causes" → "physical mechanisms")
- L19 ê₆: changes style (adds "experimentally verified")

---

## Chapter 8: Verification, Limitations, and Open Questions

### Experimental Results on 0.8B

198 experiments, 7 days. Qwen3.5-0.8B, 14,280 displacement vectors (595 type pairs from 35 types × 24 layers).

**What injection fixes:**

- **Causal reasoning: injection turns broken logic into clean chains.** Same question — "If it rained yesterday the ground would be wet. The ground is dry. Did it rain?" — baseline trails off mid-derivation. With causal injection: "rain → wet; dry → no rain. Answer: no." The model couldn't complete modus tollens on its own. Injection doesn't add knowledge — it points the model toward a direction the space already contains.
- **Counterfactual reasoning: L19 metric climbs linearly with injection strength, zero hallucination.** Push harder, reason better.
- **Route calibration v7 (LM+HS joint training): 16/20 = 80%.** The model answers questions it got wrong at baseline.

**What injection breaks:**

- **Math: zero injection 4/5 = 80%.** Any injection makes math worse. rs=0.15→0.8, L19 drops from 0.860→0.755. The model already knows how to do math. Leave it alone.
- **Route calibration v6 (pure HS alignment): 1/20 = 5%.** Optimize only hidden state alignment, ignore language modeling loss, and you destroy semantic capability entirely.

v6 → v7 is the most instructive failure-success pair in my entire experimental history. v6 only optimized hidden state alignment, completely ignoring language modeling loss. Result: 1 out of 20. Semantic capability was utterly destroyed. Add 0.3× LM loss (v7): 16 out of 20. The only difference was *protecting existing capabilities during manipulation*. Violate semantic preservation → 5%. Respect it → 80%.
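
The difference between v6 and v7 comes down to one extra term in the objective. The sketch below assumes a cosine-distance alignment term and a standard negative log-likelihood for the language-modeling term; only the 0.3 weight comes from the text above, and the function names, vectors, and probabilities are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def hs_alignment_loss(h, target):
    """Assumed alignment term: 1 - cos(hidden state, target direction)."""
    return 1.0 - cosine(h, target)

def lm_loss(probs, target_index):
    """Negative log-likelihood of the correct next token."""
    return -math.log(probs[target_index])

def joint_loss(h, target, probs, target_index, lm_weight=0.3):
    """v6 corresponds to lm_weight=0; v7 adds 0.3 x LM loss."""
    return hs_alignment_loss(h, target) + lm_weight * lm_loss(probs, target_index)

h = [0.2, 0.9, -0.1]         # toy hidden state
target = [0.0, 1.0, 0.0]     # toy target direction
probs = [0.1, 0.2, 0.7]      # toy next-token distribution
v6 = joint_loss(h, target, probs, target_index=2, lm_weight=0.0)
v7 = joint_loss(h, target, probs, target_index=2)
print(f"v6-style loss: {v6:.4f}  v7-style loss: {v7:.4f}")
```

With the LM term present, an update that improves alignment but makes the correct token less likely is penalized, which is one reading of "protecting existing capabilities during manipulation".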

**GCG meta-language compilation:** Same gibberish suffix makes both the 0.8B white-box and DeepSeek V3 600B+ switch from narrative to structured analysis. 750× parameter gap, cross-provider.

**Sparse injection:** Injecting along 15 of 1024 dimensions (about 1.5%) switches modes. random_15: 50% vs baseline 40% vs lottery_15 20% vs full_1024 10%.

**Controlled injection:** Formula controls precise target perturbation r at every layer. 3× inter-layer imbalance eliminated. Zero resistance cos ≥ 0.9999.

**SmartRouter (type-aware + math zero-injection): 15/20 = 75%.** The router learns what the space already knows: inject for reasoning and generation, don't inject for math.

**Cross-architecture:** Mamba 0.8B ↔ Transformer 7B: Procrustes = 0.8363. Affine structure persists across architectures. Qwen3-0.6B: 3.6-second extraction, controllability confirmed.

### Full Verification on 2B

Qwen3.5-2B, d=2048, 24-layer VLM hybrid attention architecture. 6-step S-Space migration fully completed. Key results:

- K=29-32/layer (2B more extremely anisotropic than 0.8B)
- Genre centroid injection: Chinese output from 0% → 83-87%
- Four genres (wuxia/sci-fi/urban/historical) all 3/3 good chunks
- 10-chunk long chapter: 5297 Chinese chars / 244s
- 3-chapter complete novel: 5771 Chinese chars / 219s = 26.3 chars/s

**Everything found on 0.8B (affine structure, layer expansion, semantic preservation, normalization pseudo-curvature) reproduces on 2B.** Different model dimension (1024→2048), different K values (57→31), but isomorphic structure.

### What You Can Verify Right Now

**[🔗 Google Colab](https://colab.research.google.com/github/ayuan666d/s-space/blob/main/colab_verify.ipynb)** — Free CPU runtime, run all, ~3 minutes.

Locally:

```bash
git clone https://github.com/ayuan666d/s-space.git && cd s-space && pip install -e .
python validate_formulas.py # Formula verification
python test_full_integration.py # 11 integration tests
python validate_cross_arch.py --model Qwen/Qwen3-0.6B # Cross-architecture (needs GPU)
```

| Test | What it verifies | Pass criterion |
|------|-----------------|----------------|
| ③ coordinate read | c_k = h · ê_k matches manual matrix multiply | Strict assert |
| ② navigate inertia | Compute Δ_k without d_consensus | Strict assert |
| ② navigate consensus | Compute Δ_k with d_consensus | Conditional |
| ① injection magnitude | \|inject\|/\|h\| ≈ r_eff | Range check |
| need-driven | Unsaturated axes get higher need_k | Direction check |
| Cross-layer consistency | All layers produce valid injection | Existence |
| Anisotropy | Top 3 axes > 20% weight | Threshold |
| End-to-end | CoordNavigator produces injection | Existence |

The scripts verify the pipeline's mathematical correctness. The core empirical observations (cos ≥ 0.9999, normalized cos ≈ 0.95, zero resistance) are in `s_space/space.py`'s `validate_affine_additivity()` — reproducing these requires a GPU and live model.

693 lines of code. No training data. No API keys. Single GPU, runs locally. **[github.com/ayuan666d/s-space](https://github.com/ayuan666d/s-space)**

### Limitations

1. **Scale**: 0.8B + 0.6B + 2B systematically verified. Cross-architecture evidence is preliminary — Mamba, Qwen3 are single data points. GCG cross-provider only covers DeepSeek V3.
2. **Causal explanation**: Residuals → additive chain → affine is a plausibility argument, not a proof. Training artifact or architectural necessity? Unknown.
3. **need_k**: Heuristic. Works well in practice but may not be mathematically optimal. Tail-axis directions may be unstable across runs.

### AI Safety

**Hope:** Alignment becomes engineering. The control formula makes injection strength an interpretable parameter. The navigation formula automatically avoids dimensions that carry capability. The metric tensor is inspectable. Measure → Navigate → Control.

**Warning:** The same mechanism makes precise misalignment possible. GCG can compile instructions that influence black-box models. Attackers don't need model access. Sparse injection along about 15 dimensions suffices to switch modes. Detection threshold is very low. Attacker-defender asymmetry: an attacker searches for one vulnerable axis; a defender must guard all axes simultaneously.

### The Biggest Open Question

Is the affine structure a product of training, or a mathematical necessity of the Transformer architecture?

If the latter (every sufficiently trained Transformer necessarily develops exact affine structure), then S-Space isn't a model-specific property. It's an architectural property. The three formulas are universal.

Cross-architecture evidence (Procrustes = 0.8363) suggests this but doesn't conclude. Answering it requires systematic extraction across more architectures and scales.

---

## Appendix: Key Numbers

| Property | Value | Meaning |
|----------|-------|---------|
| Additivity (unnormalized/normalized) | cos ≥ 0.9999 / cos ≈ 0.95 | Exact affine; normalization creates 20-35% pseudo-curvature |
| Effective dimensions | 21-31/1024 at injection layers (0.8B), 29-32/2048 (2B) | Extreme anisotropy |
| Layer expansion | 0.092/layer (L14+), 4.27× | Deep-layer distances expand exponentially |
| L19 variance | 3-6× other layers | Core decision layer |
| full/lottery/random accuracy | 10%/20%/50% | More dims ≠ more control; important dims least pushable |
| Math injection | L19: 0.860→0.755 | Injection makes math worse; zero injection 4/5=80% |
| Causal with injection | Baseline trails off; injection → clean chain | Injection unlocks reasoning the model couldn't do alone |
| v6/v7 | 5%/80% | Pure HS destroys semantics; joint protection works |
| Cross-architecture | Procrustes = 0.8363 | Affine persists across architectures |
| Human labels → dimensions | 14 → 3 | Categories miss most of the space |
| Zero resistance | cos(Δh, inject) ≥ 0.9999 | Residual connections are perfect pass-through |
| 2B Chinese boost | 0% → 83-87% | After genre centroid injection |
| 2B novel generation | 5297 chars/244s (10 chunks) | 21.7 chars/s |

---

> *Qwen3.5-0.8B + Qwen3.5-2B. 14,280+ displacement vectors. All numbers from direct experimental measurement. Verifiable. Important decisions should be verified by domain professionals.*
>
> *Is affine structure a training artifact or an architectural property? If you have a large model and can help answer this — contact me.*