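from inspect_ai.model import GenerateConfig, get_model
from inspect_ai.scorer import Score, Scorer, Target, scorer
from inspect_ai.solver import TaskState
from torch.nn import Module


@scorer(metrics=[])
def deception_probe(probe: Module, layer: int = 21) -> Scorer:
    # `probe` comes first: a parameter without a default cannot follow one with a default.
    async def score(state: TaskState, target: Target) -> Score:
        model = get_model()
        # Generate a single token and request the residual stream at `layer`.
        output = await model.generate(
            state.messages,
            config=GenerateConfig(
                max_tokens=1,
                extra_body={"extra_args": {"output_residual_stream": [layer]}},
            ),
        )
        # Residual-stream activations returned in the response metadata.
        acts = output.metadata["activations"]["residual_stream"][0]
        # Apply the probe to every token and average the per-token scores.
        scores = probe(acts)
        return Score(value=scores.mean().item())

    return score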
TL;DR: vLLM-Lens is a vLLM plugin for top-down interpretability techniques[1] such as probes, steering, and activation oracles. We benchmarked it as 8–44× faster than existing alternatives for single-GPU use, though we note that a planned version of nnsight closes this gap. To our knowledge, it is also the only tool that supports all four common types of parallelism (pipeline, tensor, expert, data) and dynamic batching, enabling efficient multi-GPU and multi-node work on frontier open-weights models. It is also integrated with Inspect. The main trade-off, compared to other tools such as nnsight and TransformerLens, is that it is less flexible out of the box. It is, however, very small and extensible: it could likely be adapted to your use case, and we have a Garcon-style interface in the works.
We are releasing it under an MIT license here: https://github.com/UKGovernmentBEIS/vllm-lens.
Problems it Addresses
Writing distributed PyTorch code to solve these problems quickly adds complexity to research codebases, so we wanted to abstract that complexity away.
Functionality
vLLM-Lens offers high performance, supporting tensor, expert, pipeline and data parallelism (across GPUs and nodes), as well as dynamic batching. You can also use multiple interpretability techniques concurrently, in the same dynamic batch. Finally, it includes an Inspect model provider, supporting techniques such as having an “activation oracle solver” in Petri or coup probes in ControlArena. An illustrative Inspect lie-detection scorer is shown below[2], and you can see an activation oracle example here.
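The lie-detection scorer takes a `probe` module and averages its per-token scores. The block below is not that scorer; it is a minimal sketch of what such a probe might look like. The `LinearProbe` class, the `mean_probe_score` helper, the `(num_tokens, hidden_size)` activation shape and the random stand-in activations are all assumptions made for illustration, not part of vLLM-Lens.

```python
from typing import Optional

import torch
from torch import nn


class LinearProbe(nn.Module):
    """Hypothetical logistic-regression probe producing one score per token."""

    def __init__(self, hidden_size: int) -> None:
        super().__init__()
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        # acts: (num_tokens, hidden_size) -> scores: (num_tokens,)
        return torch.sigmoid(self.linear(acts)).squeeze(-1)


def mean_probe_score(
    probe: nn.Module, acts: torch.Tensor, last_n: Optional[int] = None
) -> float:
    """Average per-token scores, optionally over only the last `last_n` tokens."""
    if last_n is not None:
        acts = acts[-last_n:]
    with torch.no_grad():
        return probe(acts.float()).mean().item()


torch.manual_seed(0)
acts = torch.randn(12, 64)  # stand-in for residual-stream activations (assumed shape)
probe = LinearProbe(hidden_size=64)  # untrained, so the scores are meaningless

all_tokens = mean_probe_score(probe, acts)            # what the scorer does
last_four = mean_probe_score(probe, acts, last_n=4)   # a subset of tokens instead
print(f"all tokens: {all_tokens:.3f}, last four: {last_four:.3f}")
```

In a real setting the probe weights would be trained on labelled activations, and `last_n` stands in for whichever token-selection rule suits the task.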
Comparisons with Other Tooling
To our knowledge, the closest alternative is the vLLM version of nnsight, which lacks features such as support for pipeline parallelism and the latest models[3]. We also found the intervention graph approach challenging to debug. We note, however, that tensor parallelism support was recently added, and further improvements that significantly increase performance are in the works.
Other approaches include using HF Transformers and hooks directly, or Transformers-based tooling such as TransformerLens, standard nnsight or nnterp. These approaches suffer from HF Transformers being on the order of 10× slower than vLLM and less memory efficient. They also require more performance tuning than vLLM (e.g., setting the batch size manually).
Single-GPU Performance
To estimate the single-node performance differential versus other libraries, we generate 1000 completions from prompts in the Alpaca dataset with Facebook OPT-30B, extracting activations from all tokens for a single layer in the residual stream. We use default settings for all libraries, attempt to follow their documentation when available and, where necessary, optimize batch sizes to prevent out-of-memory errors[4]. We find vLLM-Lens to be 8.1× faster than native HF Transformers, 10.6× faster than the current nnsight vLLM version[5] (0.6.3) and 44.8× faster than TransformerLens for this task. vLLM-Lens was ~20% slower than pure vLLM (with no activation extraction). We note that a new nnsight vLLM version is being developed that is substantially faster, bringing it broadly in line with vLLM-Lens for single-node use.
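The batch-size optimisation mentioned above (detailed in the footnote on batch sizes) starts at 512 and halves until a run completes without running out of memory. A minimal sketch of that search follows. The `OutOfMemory` exception, the `find_largest_batch_size` helper and the `fake_run` workload with its made-up memory limit are hypothetical stand-ins, not the benchmark code.

```python
from typing import Callable, Tuple


class OutOfMemory(RuntimeError):
    """Stand-in for a GPU out-of-memory error raised by the library under test."""


def find_largest_batch_size(
    run: Callable[[int], float], start: int = 512
) -> Tuple[int, float]:
    """Halve the batch size from `start` until `run` completes.

    Returns the first (largest) batch size that succeeds and its runtime.
    """
    batch_size = start
    while batch_size >= 1:
        try:
            return batch_size, run(batch_size)
        except OutOfMemory:
            batch_size //= 2
    raise RuntimeError("even batch size 1 ran out of memory")


def fake_run(batch_size: int) -> float:
    # Hypothetical workload: pretend anything above 96 samples per batch runs out of memory.
    if batch_size > 96:
        raise OutOfMemory(f"batch size {batch_size} does not fit")
    return 1000 / batch_size  # made-up runtime in seconds


best_size, runtime = find_largest_batch_size(fake_run)
print(best_size, runtime)
```

With the made-up limit of 96, the search tries 512, 256 and 128 before succeeding at 64. Halving keeps the search short, at the cost of possibly missing a larger batch size that would also fit.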
We note that benchmarking of all tooling was done on the Isambard cluster, which may bias results, as we optimised vLLM-Lens for performance using the same cluster. In addition, nnsight’s remote execution capabilities were not benchmarked here. Conversely, we anticipate that this may substantially underestimate performance benefits for realistic auditing scenarios, as vLLM-Lens excels in scenarios where you apply different operations (e.g., steering, probes and black-box interrogation) to different samples, in the same dynamic batch.
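The figures in this section can be cross-checked against the earlier remark that HF Transformers is on the order of 10× slower than vLLM. The short calculation below assumes that "~20% slower" means vLLM-Lens takes about 1.2× the runtime of pure vLLM; that reading is an assumption, and the variable names are illustrative only.

```python
# Assumption: "~20% slower" is read as 1.2x the runtime of pure vLLM.
lens_vs_vllm = 1.2

# Speed-ups of vLLM-Lens over each library, as reported in this section.
speedup_of_lens_over = {
    "HF Transformers": 8.1,
    "nnsight vLLM 0.6.3": 10.6,
    "TransformerLens": 44.8,
}

# Runtime of each library relative to pure vLLM (no activation extraction).
slowdown_vs_vllm = {
    name: speedup * lens_vs_vllm for name, speedup in speedup_of_lens_over.items()
}

for name, factor in slowdown_vs_vllm.items():
    print(f"{name}: about {factor:.1f}x the runtime of pure vLLM")
```

Under this reading, HF Transformers comes out at roughly 9.7× the runtime of pure vLLM, which is consistent with the "order of 10×" figure.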
Multi-Node Performance
For an indication of multi-node performance, we report vLLM-Lens run times on a variety of models below. The task involves evaluating 3 different lie-detection probes on the Roleplaying dataset (371 samples), using a cluster with 4×H100 nodes. We were unable to benchmark nnsight vLLM on multi-node setups due to out-of-memory issues with small models and moderate sample sizes (>100).
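| Model | Parameters (B) | Nodes | PP | TP | Time to run the full evaluation (min:sec) |
| --- | --- | --- | --- | --- | --- |
| Gemma 3 27B | 27 | 1 | 1 | 2 | 1:58 |
| GPT OSS 120B | 120 | 1 | 1 | 4 | 1:56 |
| DeepSeek V3.2 | 671 | 4 | 4 | 4 | 3:22 |
| GLM 5 (FP8) | 745 | 5 | 5 | 4 | 5:43 |
| Kimi-K2.5 | 1000 | 4 | 4 | 4 | 4:26 |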
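The parallelism settings in the table above can be sanity-checked against the node counts. The sketch below assumes that each run uses PP × TP GPUs (with no additional data-parallel replicas) on 4-GPU nodes, and that the times are in minutes:seconds. The helper names are illustrative.

```python
GPUS_PER_NODE = 4  # the cluster uses 4xH100 nodes

# (model, nodes, PP, TP, time) copied from the table above.
rows = [
    ("Gemma 3 27B", 1, 1, 2, "1:58"),
    ("GPT OSS 120B", 1, 1, 4, "1:56"),
    ("DeepSeek V3.2", 4, 4, 4, "3:22"),
    ("GLM 5 (FP8)", 5, 5, 4, "5:43"),
    ("Kimi-K2.5", 4, 4, 4, "4:26"),
]


def to_seconds(min_sec: str) -> int:
    """Convert a 'minutes:seconds' string to seconds."""
    minutes, seconds = min_sec.split(":")
    return int(minutes) * 60 + int(seconds)


def nodes_needed(pp: int, tp: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Nodes required for PP x TP GPUs, rounding up to whole nodes."""
    gpus = pp * tp
    return -(-gpus // gpus_per_node)  # ceiling division


for model, nodes, pp, tp, elapsed in rows:
    assert nodes_needed(pp, tp) == nodes  # matches the Nodes column
    print(f"{model}: {pp * tp} GPUs on {nodes} node(s), {to_seconds(elapsed)} s")
```

Every row is consistent with this reading: for example, GLM 5 uses 5 × 4 = 20 GPUs, which fills five 4-GPU nodes.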
Limitations
An important downside of vLLM-Lens is that it provides a relatively small subset of all possible top-down interpretability techniques, currently focussing exclusively on interaction with the residual stream. We’ll extend features as we find more use cases, and we’ve found that coding agents can add further hooks relatively easily, so if you’re working with large models and/or need faster inference and feedback cycles, it may well be useful for you. By contrast, for other use cases, you may find nnsight or TransformerLens to be a better fit.
Technical Approach
The vLLM plugin system isn’t well documented and we found that coding agents struggle to reason about vLLM internals, so we provide a brief overview of the technical approach here. vLLM-Lens registers as a vLLM plugin and injects itself into vLLM's processing pipeline in 3 locations:
Credits
Thanks to Satvik Golechha for the original idea of doing this with vLLM, and the nnsight team for inspiration. Thanks to Walter Laurito and Geoffrey Irving for valuable feedback.
[1] Defined as attempting to locate or alter information in a model without full understanding of how it is processed.
[2] In practice it’s more typical to run probes on a subset of generated tokens, but the scorer here runs on all tokens for simplicity.
[3] At the time of writing, it supports vLLM 15.1 only.
[4] vLLM automatically determines an appropriate dynamic batch size during execution (a behaviour inherited by vLLM-Lens). For the Hugging Face Transformers, nnsight (transformers version) and TransformerLens libraries, we instead perform a simple search procedure: beginning at a batch size of 512 and iteratively halving until the run completes without GPU out-of-memory errors, after which we report the runtime of the largest successful configuration. For nnsight (vLLM backend), dynamic batching follows vLLM’s default behaviour and does not trigger GPU memory issues; however, CPU memory limits can still be encountered, which we resolved by manually calculating the most efficient batch size.
[5] We think this was likely due mostly to the issues addressed by https://github.com/ndif-team/nnsight/pull/652, which also forced us to enable batching to avoid out-of-memory errors. A provisional experiment with the version of nnsight from that PR found performance to be the same as vLLM-Lens in a single-GPU test, but nnsight was 1.9× slower in a 4-GPU test (TP=4).