Kirill Dubovikov — LessWrong

A Steering Vector for SQL Injection Vulnerabilities in Phi-1.5

Thanks! I think it won't work specifically with this model as it was not instruct-finetuned, so it won't follow these instructions clearly.

But in general prompting should control for SQL injection reasonably well. But I think it's possible to escape prompt-based protection. For example, what if I will inject a jailbreak prompt in a docstring?

Generalized Hangriness: A Standard Rationalist Stance Toward Emotions

Kirill Dubovikov4mo40

Indeed, I myself sometimes need to listen to my emotions very carefully to understand that I am doing something that I shouldn't as I am quite skillful in ignoring them. Objectifying your own emotions can only be done if you feel and discern them quite well to begin with.

So You Think You've Awoken ChatGPT

Kirill Dubovikov4mo51

I tend to agree with this line of reasoning, thanks for your writing. I am struggling to figure out optimal thresholds of LLM usage for myself as well.

So if LLMs are helping you with ideas, they'll stop being reliable exactly at the point where you try to do anything original.

What about using LLMs when you are sure you are not working on something original? For example, designing or developing software without big novelty factor. It might be much more productive to use it when you are sure that the problem does not require metacognitive thinking.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments