Jeffrey Hemphill

I played the AI box game as the Gatekeeper — and lost

Jeffrey Hemphill · 4mo

A comment on the AI Box experiment from 2025! What a beautiful thing!

I feel like the AI Box experiment is pretty dated at this point, for two reasons:

I think the trick people used to get past gatekeepers back in the day was simply a Roko's Basilisk or Pascal's Wager style of argument: if a real future AI does escape and rule over us, surely the gatekeeper can afford to spend $10 and a bit of pride to curry favor with it. Yes, technically we'll never know exactly what people said in the box experiments, but the tactics used leaked t

...

Google "We Have No Moat, And Neither Does OpenAI"

Jeffrey Hemphill · 3y

You can think of it as “dangerous capabilities in everyone’s hands”, but I prefer to think of it as “everyone in the world can work on alignment in a hands-on way, and millions of people are exposed to the problem in a much more intuitive and real way than we ever foresaw”.

Ordinary people without PhDs are learning what capabilities and limitations LLMs have. They are learning what capabilities you can and cannot trust an LLM with. They are coming up with creative jailbreaks we never thought of. And they’re doing so with toy models that don’t have superhuma...