Kirill Dubovikov — LessWrong

A Steering Vector for SQL Injection Vulnerabilities in Phi-1.5

Hello! This is my first post on LessWrong, and I would be grateful for feedback. This is a side project in which I apply some known mechanistic interpretability techniques to the problem of secure code generation. Colab Link. This code was executed on a Runpod RTX 4090...

Sep 17, 2025