USPS delivers ibuprofen and Vicodin just the same because it doesn't care enough to open and test every bottle of pills. Even if they did carefully inspect the contents of every package, they still couldn't distinguish other "shipping misuse": money for your granddaughter's birthday looks the same as payment for a crime.

An AI providing services via an API is in a similar position. Code that checks that your firewall is configured correctly is identical to code that checks someone else's firewall is configured incorrectly.
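To make the dual-use point concrete, here's a minimal sketch (the function and the commentary are my illustration, not anything from an actual product):

```python
# Illustrative sketch: the same code serves defense and attack.
import socket

def check_port(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if `port` on `host` accepts TCP connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Auditing your own firewall: "is telnet (port 23) exposed when it
# shouldn't be?"
# Probing someone else's:     "is telnet (port 23) exposed so I can
# get in?"
# The function is byte-for-byte identical in both cases. Only the
# caller's intent differs, and intent is not visible in the code.
```

An API that is handed only this function, with no surrounding context, has nothing to distinguish the two requests.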

If you have more context, you can identify misuse more easily. You can get more context by utilizing your memory, by being given higher-level tasks, or by snooping / being around.


Buying duct tape is normal. Buying zip ties is normal. Buying prepaid cell phones is normal. Buying extra large burlap sacks is normal. If someone asks you to buy all four at once, you might flag that as shopping misuse pretty easily. If they buy the four items in separate trips, spotting it requires a persistent identifier, logging, the will to make the connection, and the will to act on it.
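The mechanics above can be sketched as a toy detector. The item names and the log format are assumptions for the example; the point is that each purchase is innocuous alone and only the join across transactions, via a persistent buyer identifier, is suspicious:

```python
# Toy sketch: correlating individually-normal purchases over time.
from collections import defaultdict

# Hypothetical combination worth flagging; any single item is normal.
SUSPICIOUS_COMBO = {"duct tape", "zip ties", "prepaid phone", "burlap sack"}

def flag_buyers(purchase_log):
    """purchase_log: iterable of (buyer_id, item) pairs, possibly
    spread across many separate transactions. Returns buyer_ids that
    have, at some point, completed the full combination."""
    seen = defaultdict(set)       # persistent identifier -> items logged
    flagged = []
    for buyer_id, item in purchase_log:
        seen[buyer_id].add(item)
        if SUSPICIOUS_COMBO <= seen[buyer_id] and buyer_id not in flagged:
            flagged.append(buyer_id)
    return flagged

log = [("a", "duct tape"), ("b", "milk"), ("a", "zip ties"),
       ("a", "prepaid phone"), ("a", "burlap sack")]
# flag_buyers(log) returns ["a"]
```

Drop the persistent identifier or the log, and the detector sees four unremarkable transactions.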

A couple years ago my white dad went to the store to buy beer with the dark-skinned son of a family friend. The cashier refused to sell it because it obviously looked like buying alcohol for a minor. My dad just went around the corner and asked a different cashier. I suppose this was a failure of logging: the cashier should've taken the credit card before refusing to sell the beer.

High-level tasks

It is easier to tell apart a malicious function from a line of code, a file from a function, a repo from a file, or an app from a repo.
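A small sketch of that granularity point, with illustrative names of my own: a single line of code is ambiguous, while the higher-level task it belongs to reveals much more about intent.

```python
import os

# In isolation, this line is neutral. It appears in cleanup scripts
# and in data-wiping malware alike:
#
#     os.remove(path)

def clear_build_cache(cache_dir: str) -> None:
    """Plausibly benign: deletes files the build process itself created."""
    for name in os.listdir(cache_dir):
        os.remove(os.path.join(cache_dir, name))

def clear_user_documents(home_dir: str) -> None:
    """The same line of code, but the higher-level task looks destructive:
    it deletes files the user created."""
    docs = os.path.join(home_dir, "Documents")
    for name in os.listdir(docs):
        os.remove(os.path.join(docs, name))
```

Judging the bare `os.remove(path)` tells you almost nothing; judging the function it sits in tells you a lot more, and judging the app that calls the function tells you more still.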

Being around

It's hard to know if your boss is a good fella when you take your orders on slips of paper slid under the door. Humans have avoided doing many evil deeds asked of them by witnessing the emotions and other actions of the one asking. A company like Microsoft or Google could have AI that's generally aware of who you are and why you're asking for certain things. This would take a lot of courage and be a privacy nightmare.


We have set up the misuse detection task as an impossible problem. You simply cannot tell if a single query in isolation is malicious or not. Actually detecting misuse will require some reprioritization, reframing, and uncomfortable decisions.


Comments

It is easier to tell apart a malicious function from a line of code, a file from a function, a repo from a file, or an app from a repo.

This paragraph does not make sense to me. (Maybe my reading comprehension is not up to the task).


Is the thesis that the same line of code may be malicious or not, depending on its context?

I would say that it is easier to judge the maliciousness of a single line of code than of the whole function, simply because the analysis of the whole function requires much more resources. You can rule out certain classes of threats by inspecting that one line, while remaining ignorant about a much larger set of classes of threats which require a broader context. If your threat model requires you to decide about those other classes of threats, you must expend those additional resources. It is not about something being easier; it's about being able to make the judgement at all.

[EDIT] Or, rephrasing: you need to see a certain breadth of context before you can judge whether a system is being misused, according to some definition of misuse. You can do with a narrow context for certain narrow definitions of misuse; but the wider your definition of misuse, the wider the context you have to analyze before you can decide.

I should clarify that section. I meant that if you're asked to write a line of code or an app or whatever then it is easier to guess at intent/consequences for the higher level tasks. Another example: the lab manager has a better idea of what's going on than a lab assistant.

Ah, ok. Thank you for clarifying.
