A Steering Vector for SQL Injection Vulnerabilities in Phi-1.5
Hello! This is my first post on LessWrong, and I would be grateful for feedback. This is a side project in which I applied some known mechanistic interpretability techniques to the problem of secure code generation. Colab Link. The code was run on a Runpod RTX 4090...
Sep 17, 2025