I didn't downvote this since I think the study (linked in another comment) is interesting, but I'm also not upvoting because the link is to a YouTube video instead of the actual study. I just wanted to add another data point on this.
EDIT: Actually, this is interesting enough that I did upvote it, but I still think you'll get a lot more interest if you link the text and not a video (or link the text but include the video link in the body).
See the linked video. The original blog post is https://www.generalanalysis.com/blog/supabase-mcp-blog
If you want readers to have the context of a particular blog post, a helpful thing to do is to link the blog post directly.
My background: researcher in AI security.
This recent study demonstrates how a common AI-assisted developer setup can be exploited with prompt injection to leak private info. Practically speaking, AI coding tools are almost certainly going to stay, and the setup described in the study (Cursor + MCP tools with dev permissions) is probably used by millions as of today. The concept of prompt injection is not new. Yet, it's interesting to see such a common software dev setup being so fragile. The software dev scenario is one of those use cases that depend on the LLMs knowing which parts of the text are instructions and which parts are data. I think it's necessary for AI tools to condition follow-up actions on the meanings of the data read, but different parts of the context should be tagged with "permissions".
A potential solution to this is to embed the concept of "entities with variable permissions" during training (the RL step). Current foundation models are trained as chatbots, where the input is from a single end user, whose instructions need to carry a lot of weight for the AI to be a helpful assistant. Yet people use these instruction-following chatbots to process unsanitized data.
Any suggestions on post training solutions for this?