x

LESSWRONG

LW

dgros — LessWrong

dgros

dgros

Message

Currently a Member of Technical Staff at FAR.AI. Previously PhD @ UC Davis, and past contributions @ Microsoft, NASA, and elsewhere. Interested in a mix of AI safety topics. Opinions are my own.

51

1

17

3y

dgros

Currently a Member of Technical Staff at FAR.AI. Previously PhD @ UC Davis, and past contributions @ Microsoft, NASA, and elsewhere. Interested in a mix of AI safety topics. Opinions are my own.

Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs

This is a small study that explores using tool calls to wrap untrusted parts of prompts. OpenAI's model spec considers tool results the least trusted kind of input. If tool-wrapping helped, it would be an easy way to improve robustness while using existing APIs models already support. In 3 tested...