Jason Hise's Shortform

Jason Hise

Jason Hise's Shortform

21st Apr 2026

1 min read

1

This is a special post for quick takes by Jason Hise. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

1 comment, sorted by

top scoring

Click to highlight new comments since: Today at 1:21 AM

[-]Jason Hise2d1-1

Using a less capable AI to evaluate the danger of outputs of more capable models feels quite limited. Often what we care more about is whether user prompts are laundering intent. There might be an information-theoretic defense here that is better than evaluating outputs… evaluate inputs. Treat user prompt ‘obfuscation’ as a measure of intent to attack. If there was an easier way to ask for something, then asking in an obfuscated way is a signal.

Reply

Moderation Log