OpenAI has declared ChatGPT Agent as High in Biological and Chemical capabilities under their Preparedness Framework.
Huh. They certainly say all the right things here, so this might be a minor positive update on OpenAI for me.
Of course, the way it sounds and the way it is are entirely different things, and it's not clear yet whether the development of all these serious-sounding safeguards was approached with making things actually secure in mind, as opposed to safety-washing. E.g., are they actually going to stop anyone moderately determined?
Hm, it's been five minutes and it looks like there's no Pliny jailbreak yet. That's something. Maybe Pliny doesn't have access yet. (Edit: Yep.)
OpenAI now offers 400 shots of ‘agent mode’ per month to Pro subscribers.
This incorporates and builds upon OpenAI’s Operator. Does that give us much progress? Can it do the thing on a level that makes it useful?
So far, it does seem like a substantial upgrade, but we still don’t see much to do with it.
What Is The Thing?
The main claimed innovation is unifying Deep Research, Operator and ‘ChatGPT’, which might refer to o3 or to GPT-4o or both, plus they claim to have added unspecified additional tools. One key addition is the claimed ability to use connectors for apps like Gmail and GitHub.
As always with agents, one first asks, what do they think you will do with it?
What’s the pitch?
Okay, but what do you actually do with that? What are the things the agent does better than alternatives, and which the agent does well enough to be worth doing?
Can It Do The Thing?
That’s definitely a cool result and it helps us understand where Agent is useful. These are standardized tasks with a clear correct procedure that requires many steps and has various details to get right.
They also claim other strong results when given its full toolset, like 41.6% on Humanity’s Last Exam, 27.4% on FrontierMath (likely mainly due to web search?), 45.5% (still well below 71.3% for humans) on SpreadsheetBench, 68.9% on BrowseComp Agentic Browsing (versus 50% for o3 and 51.5% for OpenAI Deep Research) and various other measures of work where ChatGPT Agent scored higher.
A more basic thing to do: Timothy Lee orders a replacement lightbulb from Amazon based on a picture, after giving final approval as per usual.
Access Granted
Access, whether having too little or too much of it, is one of the more annoying practical barriers for agents running in a distinct browser. For now, the primary problem to worry about is having too little, or not retaining access across sessions.
Similarly, I understand they have a principle of not solving CAPTCHAs.
Access will always be an issue, since you don’t want to give full access but there are a lot of things you cannot do without it. We also have the same problem with human assistants.
Is The Thing Worth Doing?
Report!
A reasonably good intern is pretty useful.
Here’s one clearly positive report.
Comparison shopping seems like a great use case, you can easily have a default option, then ask it to comparison shop, and compare its solution to yours.
I mostly find myself in the same situation as Lukes.
As with all agentic or reasoning AIs, one worries about chasing the thumbs up, however otherwise this evaluation seems promising:
Danger, Will Robinson
OpenAI has declared ChatGPT Agent as High in Biological and Chemical capabilities under their Preparedness Framework. I am very happy to see them make this decision, especially with this logic:
That is exactly right. The time to use such safeguards is when you might need them, not when you prove you definitely need them. OpenAI joining Anthropic in realizing the moment is here should be a wakeup call to everyone else. I can see saying ‘oh Anthropic is being paranoid or trying to sell us something’ but it is not plausible that OpenAI is doing so.
Why do so many people not get this? Why do so many people think that if you put in safeguards and nothing goes wrong, then you made a mistake?
I actually think the explanation for such craziness is that you can think of it as either:
One could even say: The irresponsibility, like the cruelty, is the point.
Here are some more good things OpenAI is doing in this area:
It is hard to verify how effective or ‘real’ such efforts are, but again this is great, they are being sensibly proactive. I don’t think such an approach will be enough later on, but for now and for this problem, this seems great.
For most users, the biggest risk is highly practical: overeagerness.
Another danger is prompt injections, which OpenAI says were a point of emphasis, along with continuing to ask for user confirmation before consequential actions, forcing the user into supervisory ‘watch mode’ for critical tasks, and refusing actions deemed too high risk, like bank transfers.
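To make the shape of that policy concrete, here is a minimal sketch of what a tiered action gate could look like. The tier names, examples, and the gate_action function are my own illustrative assumptions, not OpenAI's actual implementation:

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"                      # e.g. reading a public webpage
    CONSEQUENTIAL = "consequential"  # e.g. sending an email, making a purchase
    CRITICAL = "critical"            # e.g. entering credentials on a site
    PROHIBITED = "prohibited"        # e.g. bank transfers

def gate_action(risk: Risk, user_confirmed: bool, user_watching: bool) -> bool:
    """Decide whether an agent action may proceed under the tiered policy."""
    if risk is Risk.PROHIBITED:
        return False                 # refused outright, no override offered
    if risk is Risk.CRITICAL:
        # critical tasks require live supervision plus explicit confirmation
        return user_watching and user_confirmed
    if risk is Risk.CONSEQUENTIAL:
        return user_confirmed        # explicit confirmation before acting
    return True                      # low-risk actions proceed autonomously
```

The design point worth noticing is that the top tier is unconditional: there is no confirmation the user can give that unlocks a prohibited action.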
Some Potential Issues with MCP
While we are discussing agents and their vulnerabilities, it is worth highlighting some dangers of MCP, the Model Context Protocol. MCP is a highly useful protocol, but like anything else that exposes you to outside information, it is not safe by default.
Thanks, Anthropic.
In all seriousness, this is not some way MCP is especially flawed. It is saying the same thing about MCP that one should say about anything else you do with an AI agent: either carefully sandbox it and restrict its permissions, or only expose it to inputs from whitelisted sources that you trust.
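For the whitelisting half of that advice, a minimal sketch of the idea might look like the following; the server names and the guarded_tool_call wrapper are illustrative assumptions, not part of the MCP spec or any particular client library:

```python
# Illustrative allowlist; in practice this would be your vetted, self-hosted
# or audited MCP servers, not these made-up names.
TRUSTED_SERVERS = {"internal-docs", "company-github"}

def guarded_tool_call(server: str, tool: str, args: dict, call_fn):
    """Forward a tool call only if the target server is explicitly trusted.

    call_fn stands in for whatever client function actually performs the
    call; it is a placeholder, not a real MCP SDK API.
    """
    if server not in TRUSTED_SERVERS:
        raise PermissionError(
            f"Refusing tool call to untrusted MCP server {server!r}: "
            "its output would enter the model's context unvetted."
        )
    return call_fn(server, tool, args)
```

Anything not on the allowlist never gets to put text into the model's context, which is the entire game when the threat is prompt injection.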
So it goes, indeed:
Steve is in properly good spirits about the whole thing, and it sounds like he recovered without too much pain. But yeah, don’t do this.
Things are going to go wrong.
So Far a Whisper
Mostly the big news about GPT Agent is that it is not being treated as news. It is not having a moment. It does seem like at least a modest improvement, but I’m not seeing reports of people using it for much.
So far I’ve made one serious attempt to use it, to help with formatting issues across platforms. It failed utterly on multiple different approaches and attempts, inserting elements in random places without fixing any of the issues even when given a direct template to work from. Watching its thinking and actions made it clear this thing is going to be slow and often take highly convoluted paths to doing things, but that it should be capable of doing a bunch of stuff in the right circumstance. The interface for interrupting it to offer corrections didn’t seem to be working right?
I haven’t otherwise been able to identify tasks that I naturally needed to do, where this would be a better tool than o3.
I do plan on trying it on the obvious tasks like comparison shopping, booking plane tickets and ordering delivery, or building spreadsheets and parsing data, but so far I haven’t found a good test case.
That is not the right way to get maximum use from AI. It’s fine to ask ‘what, among the things I already do, can it do for me?’ but better to ask ‘what can it do that I would want?’
For now, I don’t see great answers to that either. That’s partly a skill issue on my part.
Might be only a small part, might be large. If you were me, what would you try?