For activation steering, I take the average hidden state at each latent step and use the difference between the averages for latent steps A and B as a steering vector added to the hidden states.
Since CoDI conditions the <|eocot|> token only on KV values, getting KV values that contain the steered information takes an extra step: steer latent 1, run CoDI for one additional latent step, then take the KV values of latent 2 and inspect the output.
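As a minimal sketch of the steering-vector computation (toy arrays and a hypothetical d_model stand in for real CoDI hidden states), the vector is just a difference of per-step means, added back onto a hidden state:

```python
import numpy as np

def steering_vector(states_a, states_b):
    """Difference of per-step mean hidden states: mean(B) - mean(A).

    states_a, states_b: (n_examples, d_model) hidden states collected
    at latent steps A and B across a dataset.
    """
    return states_b.mean(axis=0) - states_a.mean(axis=0)

def steer(hidden, vec, alpha=1.0):
    """Add the scaled steering vector onto a hidden state."""
    return hidden + alpha * vec

# Toy stand-ins for hidden states at two latent steps (8 examples, d_model=4).
rng = np.random.default_rng(0)
step_a = rng.normal(size=(8, 4))
step_b = rng.normal(size=(8, 4)) + 1.0  # pretend step B has a shifted mean
vec = steering_vector(step_a, step_b)
steered = steer(step_a[0], vec)
```

The `alpha` knob is an assumption on my part; it just scales how hard the steering pushes.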
KV cache Steering
For KV cache steering, the steered cache is applied directly to the CoDI model: the average difference in KV values between latent steps is added straight onto past_key_values.
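A sketch of that step, assuming the (key, value)-per-layer cache layout used by HF transformers; the shapes and the `kv_delta` name are illustrative, not CoDI's actual code:

```python
import numpy as np

def steer_kv(past_key_values, kv_delta, alpha=1.0):
    """Add an average KV difference onto a past_key_values-style cache.

    past_key_values: list of (key, value) arrays per layer, each shaped
    (batch, n_heads, seq_len, head_dim) as in HF transformers.
    kv_delta: matching list of (key_diff, value_diff) arrays, e.g. the
    mean KV difference between two latent steps.
    """
    return [(k + alpha * dk, v + alpha * dv)
            for (k, v), (dk, dv) in zip(past_key_values, kv_delta)]

# Toy cache: 2 layers, batch 1, 2 heads, seq_len 3, head_dim 4.
rng = np.random.default_rng(1)
shape = (1, 2, 3, 4)
cache = [(rng.normal(size=shape), rng.normal(size=shape)) for _ in range(2)]
delta = [(np.full(shape, 0.1), np.zeros(shape)) for _ in range(2)]
steered_cache = steer_kv(cache, delta)
```

Because this edits the cache in place of any forward computation, it is the only steering variant here that needs no extra latent pass.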
Experiments
PCA Logit Lens
In the hidden states, PCA direction 1 shows a clear <|eocot|> direction in the CoDI model, which is interesting because the <|eocot|> token is the one steered via the KV cache to output the response from latent reasoning.
Looking at the PCA of the KV cache, there do not appear to be interpretable directions; all of the directions have similar variances.
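A toy sketch of the analysis itself (hypothetical one-hot unembedding and synthetic states, not CoDI's): run PCA on collected hidden states, then read the leading direction through the unembedding matrix, logit-lens style, to see which token it points at:

```python
import numpy as np

def pca_directions(states, k=2):
    """Top-k principal directions of mean-centered states, via SVD."""
    centered = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def logit_lens_topk(direction, unembed, k=3):
    """Read a direction through the unembedding: top-k token ids."""
    logits = unembed @ direction
    return np.argsort(logits)[::-1][:k]

# Toy setup: a 4-"token" one-hot unembedding, and hidden states whose
# main variance lies along token 2's direction, so PCA direction 1
# should decode to token id 2 under the lens.
unembed = np.eye(4)
axis = unembed[2]
rng = np.random.default_rng(2)
states = rng.normal(size=(64, 1)) * axis + 0.01 * rng.normal(size=(64, 4))
pc1 = pca_directions(states, k=1)[0]
pc1 = pc1 if pc1 @ axis > 0 else -pc1  # fix PCA's sign ambiguity
top_tokens = logit_lens_topk(pc1, unembed)
```

The sign flip matters in practice: PCA components are only defined up to sign, so both directions of a component should be checked under the lens.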
PCA Hidden State Activations
Looking at the PCA of hidden-state activations during latent reasoning, PCA 1 and PCA 2 show a clear diagonal separation between even and odd steps. This matches my previous post's results, where there was a clear distinction between even and odd latents, supporting the conclusion of the Scratchpad Thinking paper.
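The even/odd check can be made quantitative rather than visual. A toy sketch (synthetic activations with an assumed parity axis, not CoDI data): project each step's mean activation onto PC1 and test whether even and odd steps land on opposite sides:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_steps, n_ex = 16, 6, 32
parity_dir = np.zeros(d)
parity_dir[0] = 1.0  # assumed even/odd feature axis (toy)

# Toy per-step "hidden states": even steps shifted one way, odd the other.
acts = {s: (1.0 if s % 2 == 0 else -1.0) * parity_dir
           + 0.05 * rng.normal(size=(n_ex, d))
        for s in range(n_steps)}

# PCA over all steps' activations pooled together.
all_acts = np.vstack([acts[s] for s in range(n_steps)])
centered = all_acts - all_acts.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]

# Project each step's mean activation onto PC1; even and odd steps
# should land on opposite sides when parity dominates the variance.
proj = {s: acts[s].mean(axis=0) @ pc1 for s in range(n_steps)}
even = [proj[s] for s in range(0, n_steps, 2)]
odd = [proj[s] for s in range(1, n_steps, 2)]
separated = min(even) > max(odd) or min(odd) > max(even)
```

On real CoDI activations the same projection-and-compare step would replace the eyeballed diagonal split.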
PCA KV cache
The PCA directions for the KV cache do not cluster by latent step the way the hidden states do; the projections look random.
PCA Steering
PCA steering seems relatively uniform across activation and KV-value PCA directions.
Activation steering with hidden states requires a forward pass, because CoDI only keeps the KV values. To make a fair comparison of PCA steering, I ran KV steering through a forward pass as well, and found that after the forward pass KV steering matched a random vector.
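For the random-vector comparison, the baseline should be norm-matched to the steering vector so that any difference comes from direction, not magnitude. A small sketch (hypothetical helper, not from the CoDI codebase):

```python
import numpy as np

def random_baseline(steer_vec, rng):
    """A random direction scaled to the steering vector's norm.

    Used as a control: if a steering vector only matches this baseline
    after a forward pass, its direction carried no usable information.
    """
    r = rng.normal(size=steer_vec.shape)
    return r / np.linalg.norm(r) * np.linalg.norm(steer_vec)

rng = np.random.default_rng(5)
v = np.array([3.0, 4.0])  # toy steering vector with norm 5
baseline = random_baseline(v, rng)
```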
Critique of CoDI
Note: this section is opinionated. The claims below are my interpretation and speculation, not established findings — treat them as hypotheses worth testing rather than conclusions.
CoDI works by running n latent forward passes and then keeping only the kv_cache, which is used to steer the <|eocot|> token; the hidden states computed during latent reasoning are discarded when the answer is produced. CoDI acts like a goldfish: after latent reasoning it forgets what happened during latent reasoning, which might keep it from scaling. After the 6 latent reasoning steps and the first output token, the KV values are not saved for generating future tokens.
This makes it impossible to steer CoDI the traditional way with hidden states, since only the KV values are kept for the <|eocot|> token. Hidden-state steering therefore requires another latent reasoning pass to produce updated KV values for generating the answer.
This extra forward pass for hidden-state activation steering explains why, at the later layers, the random vector did not change CoDI's accuracy.
KV values can be steered without an additional forward pass, so they can be used to meaningfully improve the model's accuracy. After a forward pass, however, the performance more closely matches the random vector.
Since the latent forward passes store all of their information inside the KV cache, the number of latent reasoning steps that can be performed may itself be limited by the KV cache.
The KV cache constraint might be similar to how tokenizing each step forces the hidden state into a single token, except that here n steps are forced to store their information in the KV values. This could explain why accuracy seems to start decreasing after latent step 5: the KV cache is saturated and can't store more information.
Because of how the KV cache works, the latent reasoning itself is not saved, unlike normal CoT, where previous tokens are written out and the model can refer back to them when generating future tokens.
Future Work
Create a different version of latent reasoning that is not CoDI.
Attempt to create a CoDI variant that can take in hidden-state values, so that much of the information is simply not lost during latent reasoning generation.
In my previous post I found that activation steering worked with the KV cache but not with hidden-state steering.
So I decided to look at the PCA directions with methods such as the logit lens and activation steering.
Quick Summary:
Experimental setup
CoDI model
I use the publicly available CODI Llama 3.2 1B checkpoint from Can we interpret latent reasoning using current mechanistic interpretability tools?
Tuned Logit Lens
To create my tuned logit lens implementation, I used the training code for the tuned lens from Eliciting Latent Predictions from Transformers with the Tuned Lens.
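As a rough illustration of what a tuned lens translator is: an affine map from a layer's hidden states toward the final layer's. The sketch below fits it by least squares on toy data; the real method trains one translator per layer against the model's final logits with a KL objective, so this is only a linear-algebra stand-in:

```python
import numpy as np

def fit_tuned_lens(hidden_l, hidden_final):
    """Fit an affine translator h -> h @ A + b from layer-l hidden
    states to final-layer hidden states by least squares.
    """
    X = np.hstack([hidden_l, np.ones((len(hidden_l), 1))])
    W, *_ = np.linalg.lstsq(X, hidden_final, rcond=None)
    return W[:-1], W[-1]  # A: (d, d), b: (d,)

# Toy data where the final states really are affine in the layer-l states.
rng = np.random.default_rng(4)
A_true = rng.normal(size=(8, 8))
b_true = rng.normal(size=8)
h_l = rng.normal(size=(128, 8))
h_final = h_l @ A_true + b_true
A, b = fit_tuned_lens(h_l, h_final)
```

Applying the fitted translator before the unembedding is what distinguishes the tuned lens from the plain logit lens.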
Activation Steering
This critique draws on findings from this sprint's PCA analysis alongside Can we interpret latent reasoning using current mechanistic interpretability tools?, the Scratchpad Thinking paper, and my previous LessWrong posts.