LESSWRONG
LW

426
Kirill Dubovikov
12130
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
A Steering Vector for SQL Injection Vulnerabilities in Phi-1.5
Kirill Dubovikov1mo20

Thanks! I think it won't work specifically with this model as it was not instruct-finetuned, so it won't follow these instructions clearly. 

But in general prompting should control for SQL injection reasonably well. But I think it's possible to escape prompt-based protection. For example, what if I will inject a jailbreak prompt in a docstring? 

Reply
Generalized Hangriness: A Standard Rationalist Stance Toward Emotions
Kirill Dubovikov4mo40

Indeed, I myself sometimes need to listen to my emotions very carefully to understand that I am doing something that I shouldn't as I am quite skillful in ignoring them. Objectifying your own emotions can only be done if you feel and discern them quite well to begin with.

Reply
So You Think You've Awoken ChatGPT
Kirill Dubovikov4mo51

I tend to agree with this line of reasoning, thanks for your writing. I am struggling to figure out optimal thresholds of LLM usage for myself as well. 

So if LLMs are helping you with ideas, they'll stop being reliable exactly at the point where you try to do anything original.

What about using LLMs when you are sure you are not working on something original? For example, designing or developing software without big novelty factor. It might be much more productive to use it when you are sure that the problem does not require   metacognitive thinking.

Reply
5A Steering Vector for SQL Injection Vulnerabilities in Phi-1.5
2mo
2