A Geometric Account of Activation Steering through Angle–Norm Decomposition

Atmyre; Georgii Aparin

This blog post provides an overview of our recent paper: A Geometric Account of Activation Steering through Angle–Norm Decomposition.

TL;DR: We decompose linear activation steering into two distinct operations: one that changes the angle of the activation toward a concept direction, and one that changes its norm. Through controlled experiments, we analyze the role of each component. We find that concept information is indeed primarily encoded in the angular component of activations. However, the norm also plays an important role, which we interpret as reflecting the effective representational capacity of a token. Based on this, we argue that activation steering should be described by two independent parameters: an angular parameter and a radial parameter, rather than by a single steering-strength coefficient.

Main hypotheses and how we test them

The hypothesis and prior work

Activation steering in LLMs is most commonly implemented as a parallel shift of activations along a precomputed concept vector, often called a steering vector. This design is based on the hypothesis that the manifold of LLM activations is locally linear. However, several recent works have criticized this approach, arguing that linear steering can substantially change the activation norm, pushing activations out of distribution and thereby degrading the model.
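As a minimal sketch of why an additive shift also moves the norm, the toy example below applies h' = h + αv to a two-dimensional activation. The vectors, the `linear_steer` helper, and the α values are illustrative assumptions, not the setup used in the paper.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def linear_steer(h, v, alpha):
    """Additive steering: shift h along the concept vector v by alpha."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]

h = [3.0, 4.0]   # toy activation with norm 5 (hypothetical values)
v = [1.0, 0.0]   # toy unit-norm concept vector (hypothetical values)

for alpha in (0.0, 2.0, 10.0):
    y = linear_steer(h, v, alpha)
    ratio = norm(y) / norm(h)                 # how much the norm changed
    score = dot(y, v) / (norm(y) * norm(v))   # cosine with the concept vector
    print(f"alpha={alpha:5.1f}  norm ratio={ratio:.2f}  concept score={score:.2f}")
```

In this toy case a single coefficient α changes the norm ratio and the concept score at the same time, which is the coupling the critiques point to.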

One alternative is spherical steering, which preserves activation norms and only rotates activations toward the concept vector by some angle (Vu & Nguyen, 2025; You et al., 2026). This idea, together with a few additional tricks, does indeed lead to better steering quality on several benchmarks compared with linear steering and some other approaches.
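A norm-preserving rotation toward the concept vector can be sketched as follows. This is an illustration under our own assumptions, not the implementation from the cited works: the helper name `rotate_toward`, the clamping that prevents overshooting the concept direction, and the toy vectors are all hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def rotate_toward(h, v, delta):
    """Rotate h toward v by delta radians inside span{h, v}, keeping ||h||.

    The angle is clamped at zero so h never overshoots v (an assumption
    of this sketch).
    """
    nv = norm(v)
    u = [x / nv for x in v]                     # unit concept direction
    a = dot(h, u)                               # component of h along u
    w = [hi - a * ui for hi, ui in zip(h, u)]   # component of h orthogonal to u
    b = norm(w)
    if b == 0.0:
        return list(h)                          # h is (anti-)parallel to v: plane undefined
    e = [x / b for x in w]                      # unit vector orthogonal to u
    theta = math.atan2(b, a)                    # current angle between h and v
    theta_new = max(theta - delta, 0.0)
    r = norm(h)
    return [r * (math.cos(theta_new) * ui + math.sin(theta_new) * ei)
            for ui, ei in zip(u, e)]

h = [1.0, 2.0, 2.0]   # toy activation (hypothetical values)
v = [2.0, 0.0, 0.0]   # toy concept vector (hypothetical values)
y = rotate_toward(h, v, 0.5)
print(norm(h), norm(y))   # the two norms should agree up to rounding
```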

However, these works did not sufficiently analyze the core hypothesis on which they rely, namely that preserving the activation norm exactly is necessary for better steering quality.

Our framework for testing the hypothesis

In our work, we decided to thoroughly test this hypothesis. We proposed a framework that unifies spherical and linear steering within a single class of methods, and separates the norm and angle of an activation vector into two interpretable parameters, instead of using a single, non-interpretable strength parameter as in standard linear steering.

Fig. 1. Summary of steering methods by whether they preserve the original hidden-state norm and whether they enforce a fixed per-token concept score.

Our framework unifies six different steering methods, all of which rotate the activation within the plane spanned by the steering vector and the original activation, as shown in Figure 1. These methods differ along two independent axes.

The first axis determines how the activation is shifted: either linearly, or by a norm-preserving rotation (i.e. spherically). The second axis determines what is kept fixed across tokens: either the shift itself (a vector in the linear case or an angle in the spherical case) or the resulting concept score, meaning the cosine similarity between the steered activation and the concept vector. For example, standard linear steering, or CAA, fixes the shift vector, while spherical steering fixes the concept score. Another method that appears in the literature is linear steering with renormalization, or CAA-r, which preserves the norm but does not fix the concept score.

Since both linear and spherical methods operate in the same plane, nothing prevents us from defining linear steering with a fixed concept score, which we call **CAA-m**, or spherical steering with a fixed angle, which we call **AS**. This gives us a grid of methods, shown in Figure 1, that unifies linear and spherical steering from the perspective of norm preservation and fixed concept-score control.
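To make the fixed-concept-score column of the grid concrete, here is a hedged sketch of spherical steering (S) and CAA-m for a single token. The closed form for the per-token shift in `steer_caa_m` is our own derivation from the requirement cos(h + αu, u) = γ, and all function names and toy values are assumptions rather than the paper's code. It requires |γ| < 1.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def decompose(h, v):
    """Write h = a*u + b*e with u = v/||v||, e a unit vector orthogonal to u, b > 0."""
    nv = norm(v)
    u = [x / nv for x in v]
    a = dot(h, u)
    w = [hi - a * ui for hi, ui in zip(h, u)]
    b = norm(w)
    if b == 0.0:
        raise ValueError("h is parallel to v, so the steering plane is undefined")
    e = [x / b for x in w]
    return u, e, a, b

def steer_s(h, v, gamma):
    """Spherical, fixed concept score: keep ||h|| and set cos(y, v) = gamma."""
    u, e, _, _ = decompose(h, v)
    r, s = norm(h), math.sqrt(1.0 - gamma ** 2)
    return [r * (gamma * ui + s * ei) for ui, ei in zip(u, e)]

def steer_caa_m(h, v, gamma):
    """Linear, fixed concept score: y = h + alpha*u with a per-token alpha."""
    u, _, a, b = decompose(h, v)
    # Solving (a + alpha) / sqrt((a + alpha)^2 + b^2) = gamma for alpha:
    alpha = b * gamma / math.sqrt(1.0 - gamma ** 2) - a
    return [hi + alpha * ui for hi, ui in zip(h, u)]

h = [1.0, 2.0, 2.0]   # toy activation (hypothetical values)
v = [1.0, 0.0, 0.0]   # toy concept vector (hypothetical values)
gamma = 0.8
y_s, y_m = steer_s(h, v, gamma), steer_caa_m(h, v, gamma)
print("concept scores:", cosine(y_s, v), cosine(y_m, v))
print("norm ratios:   ", norm(y_s) / norm(h), norm(y_m) / norm(h))
print("cos(y_s, y_m): ", cosine(y_s, y_m))
```

Under these assumptions both outputs reach the same concept score and point in the same direction; only their norms differ.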

For our experiments, we used three families of LLMs: Llama, Qwen, and Gemma, with model sizes ranging from 1B to 70B parameters. As concept benchmarks, we used TruthfulQA, SST-2, CivilComments, and IMDB. We evaluated the preservation of generation quality after steering on WikiText and MMLU.

We tested two hypotheses underlying spherical steering:

1. Do activations approximately lie on a sphere?

2. Is concept information encoded in the angle rather than in the norm? This is the assumption behind the idea of "changing the angle without changing the norm."

We also ran two additional experiments: one comparing the steering methods, and another studying the effect of norm preservation during steering.

Experimental results

Sphericality hypothesis

First, we tested the hypothesis that the activation manifold is approximately spherical. We computed the coefficient of variation (CV) of activation norms at each layer across a broad set of datasets, as shown in Figure 2. We found that, for Llama and Qwen, activation norms have a low CV only in the final layer. In intermediate layers, the CV typically lies around 10–15%. For Gemma, the residual stream activations are not close to spherical at all, due to an architectural feature: post-norms after self-attention and the feed-forward network.
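For readers who want to reproduce this kind of check, the coefficient of variation of activation norms can be computed as in the sketch below. The function name `norm_cv` and the use of the population standard deviation are our assumptions; the post does not specify which estimator was used.

```python
import math
import statistics

def norm_cv(activations):
    """Coefficient of variation (std / mean) of the L2 norms of a set of vectors."""
    norms = [math.sqrt(sum(x * x for x in h)) for h in activations]
    # Population standard deviation; a sample estimator would also be reasonable.
    return statistics.pstdev(norms) / statistics.fmean(norms)

# Vectors on a common sphere give a CV of (numerically) zero.
on_sphere = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
# Vectors with very different norms give a large CV.
spread = [[3.0, 4.0], [6.0, 8.0]]

print(norm_cv(on_sphere), norm_cv(spread))
```

A CV near zero at a given layer would mean the activations there lie close to a sphere; values of 10–15% indicate a noticeable spread in norms.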

Thus, the sphericality hypothesis was not supported.

Fig. 2. CV of hidden-state norms vs. layer for all 7 models, 10 corpora. Grey dotted = L75 steering layer. Bottom right: combined mean CV across corpora.

Where concept information is encoded

Second, to test where concept information is encoded, we ran a probing experiment. We trained linear classifiers on three types of representations: the original activations, normalized activations, and the scalar activation norm, as shown in Figure 3.
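The three probe inputs can be built from a single hidden state as sketched below. The helper `probe_features` and the toy vectors are hypothetical, and the choice of linear classifier fitted on each representation is left open here.

```python
import math

def probe_features(h):
    """Return the three representations of one hidden state used as probe inputs."""
    n = math.sqrt(sum(x * x for x in h))
    return {
        "raw": list(h),                      # original activation h
        "normalized": [x / n for x in h],    # direction only, h / ||h||
        "norm_only": [n],                    # scalar norm ||h|| as a 1-d feature
    }

# Two toy activations with different directions but the same norm:
f1 = probe_features([3.0, 4.0])
f2 = probe_features([4.0, -3.0])
print(f1["norm_only"], f2["norm_only"])      # identical norm-only features
print(f1["normalized"], f2["normalized"])    # different directional features
```

In this toy case a norm-only probe cannot tell the two activations apart, while a probe on the normalized vectors can.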

The results are encouraging: none of the concepts can be separated using only the norm, while probing performance on normalized activations is essentially the same as on the original activations. This means that concept information is encoded in the angle, and that changing the norm might not be necessary for effective steering.

Fig. 3. Linear probe accuracy versus layer for all four concept datasets. Each dataset contains three probe variants: raw hidden states h, normalized hidden states h/∥h∥, and norm-only features ∥h∥. Raw and normalized curves nearly overlap, while norm-only probes remain close to chance, indicating that the evaluated concepts are encoded primarily in direction.

Comparison of steering methods along two axes

Third, our grid of methods allows us to compare steering methods along each of the two axes independently.

For the concept-score axis, we compared linear steering with a fixed concept score to spherical steering. After steering, the activations produced by these two methods lie on the same ray and differ only in their norm. The result of this comparison, shown in Figure 4, is that spherical steering outperforms linear steering at small concept-score values, but consistently underperforms on generation-quality control datasets.

Fig. 4. Downstream task metric, WikiText-103 perplexity, and MMLU accuracy under S and CAA-m at matched per-token γ. The two methods implement nearly identical angular control, but they differ in radial behavior. At high steering strengths, S incurs much larger perplexity penalties, while CAA-m better preserves generation stability and general capability.

We then compared the remaining three methods: standard linear steering, linear steering renormalized to the sphere, and additive spherical steering. To place all five methods on the same Pareto curve, we calibrated their shift parameters so that the average concept score after steering on the validation set matched the fixed concept-score values from the first comparison.
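One way such a calibration could be done for a shared additive shift is a bisection on the shift strength, sketched below. This is an assumption on our part: the post does not say which search procedure was used, and the names `calibrate_alpha` and `mean_score_after_shift`, the bracket, and the toy validation set are hypothetical. Bisection applies because each token's concept score is non-decreasing in the shift along the concept vector.

```python
import math

def concept_score(h, v):
    """Cosine similarity between an activation h and the concept vector v."""
    d = sum(x * y for x, y in zip(h, v))
    return d / (math.sqrt(sum(x * x for x in h)) * math.sqrt(sum(y * y for y in v)))

def mean_score_after_shift(H, v, alpha):
    """Average concept score over H after the shared shift h -> h + alpha * v."""
    shifted = ([hi + alpha * vi for hi, vi in zip(h, v)] for h in H)
    return sum(concept_score(y, v) for y in shifted) / len(H)

def calibrate_alpha(H, v, target, lo=0.0, hi=1e3, iters=60):
    """Find alpha in [lo, hi] whose mean post-steering concept score hits target."""
    if not mean_score_after_shift(H, v, lo) <= target <= mean_score_after_shift(H, v, hi):
        raise ValueError("target is outside the range reachable within [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_score_after_shift(H, v, mid) < target:
            lo = mid    # shift is too weak
        else:
            hi = mid    # shift is strong enough
    return 0.5 * (lo + hi)

# Toy validation activations and concept vector (hypothetical values).
H = [[1.0, 2.0, 2.0], [0.0, 3.0, 4.0], [-1.0, 1.0, 0.0]]
v = [1.0, 0.0, 0.0]
alpha = calibrate_alpha(H, v, target=0.5)
print(alpha, mean_score_after_shift(H, v, alpha))
```

The same search could be run over the rotation angle of a fixed-angle spherical method, since its mean concept score is also monotone in the angle until the clamp is reached.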

We found that renormalizing linear steering to the sphere worsens generation quality, while additive spherical steering most often performs best on the steering benchmarks.

In the overall comparison of all five methods, shown in Figure 5, spherical steering performs best, especially at low concept scores. At the same time, linear steering with a fixed concept score is the most stable in terms of generation quality.

Fig. 5. Norm ratio ∥y∥/∥x∥ for CAA-m at matched per-token target γ.

Viewing steering from a two-parameter perspective

Fourth, the previous experiment showed that the norm does matter during steering: it affects how well generation quality is preserved. Therefore, we introduced a method that combines angle and norm parameters in a single formula. Instead of one non-interpretable steering-strength hyperparameter, we now have two interpretable parameters: one angular and one radial.
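One plausible reading of such a two-parameter update is sketched below: the angular parameter γ sets the cosine with the concept vector, and the radial parameter β rescales the original norm. The exact formula in the paper may differ; the function name `steer_sn` and the toy vectors are assumptions, and the sketch requires |γ| < 1.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def steer_sn(h, v, gamma, beta):
    """Set cos(y, v) = gamma and ||y|| = beta * ||h|| within span{h, v}."""
    nv = norm(v)
    u = [x / nv for x in v]                     # unit concept direction
    a = dot(h, u)
    w = [hi - a * ui for hi, ui in zip(h, u)]   # part of h orthogonal to u
    b = norm(w)
    if b == 0.0:
        raise ValueError("h is parallel to v, so the steering plane is undefined")
    e = [x / b for x in w]
    r = beta * norm(h)                          # radial parameter: target norm
    s = math.sqrt(1.0 - gamma ** 2)             # angular parameter: target cosine gamma
    return [r * (gamma * ui + s * ei) for ui, ei in zip(u, e)]

h = [1.0, 2.0, 2.0]   # toy activation (hypothetical values)
v = [1.0, 0.0, 0.0]   # toy concept vector (hypothetical values)
y = steer_sn(h, v, gamma=0.7, beta=1.2)
print(norm(y) / norm(h), dot(y, v) / (norm(y) * norm(v)))
```

With β = 1 this reduces to norm-preserving spherical steering at a fixed concept score, so the two parameters can be varied independently.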

The effect of changing the norm is shown in Figures 6 and 7. At small concept-score values, slightly decreasing the norm relative to the original tends to improve steering-benchmark performance, although it has little effect on generation quality. In contrast, at large concept-score values, the radial setting that increases the norm the most is more effective in almost all cases, in terms of both concept metrics and generation quality.

Fig. 6. Effect of norm scaling in SN. The left panel shows downstream task metric change, and the right panel shows perplexity ratio. Increasing β has little effect on the semantic task metric but substantially reduces perplexity at high γ, indicating that the norm primarily controls generation stability.

Fig. 7. Fraction of folds in which each β value achieves the best perplexity or task metric. At γ = 0.7, β = 1.2 achieves the lowest perplexity in all folds in our evaluation, indicating that strict norm preservation is not always the most stable choice for high-strength spherical steering.

Takeaways

The main conclusion of our study is that steering should not be described by a single coefficient that simultaneously changes both angle and norm. Instead, it should be described by two independent parameters: an angular parameter and a radial parameter.

Concept information is indeed almost entirely encoded in the direction of the activation, which supports the motivation behind spherical methods. However, the norm is not semantically neutral: under strong steering, strictly preserving the norm noticeably harms generation quality.

One possible interpretation is that the hidden-state norm is related to the effective representational capacity of a token. Under a strong angular intervention, the model needs to store both the steered concept information and the remaining context somewhere. Increasing the norm may provide exactly this additional capacity.
