Abhinav Pola's Shortform

Abhinav Pola

Abhinav Pola's Shortform

28th Feb 2025

1 min read

1

This is a special post for quick takes by Abhinav Pola. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

1 comment, sorted by

top scoring

Click to highlight new comments since: Today at 3:50 PM

[-]Abhinav Pola1y11

Computer-use agents in 3rd party environments are an inherent security risk.
What is the risk exactly? That we can always phish an agent. Safety relies on making sure the inputs and outputs of the model are safe. We can ensure both and still elicit harm by phishing the model which I claim is probably an easier task than computer use. This is only possible because:
1. We have control over the agent loop and can poison the inputs before the agent takes the next action.
2. We have control over the environment to edit HTML, the browser, the OS, etc. as we see fit while remaining in-distribution for computer use capabilities.

Here, I "phish" Sonnet 3.5 to create a Pinterest account: [demo]. This is mainly my response to Anthropic's hierarchical summarization which I think is necessary but not sufficient. I think OpenAI-Operator-style 1st party environments and paywalls are the way to go for now.

Reply

Moderation Log