This is a special post for quick takes by Jacob Watts.

I have sometimes seen people (and contests) focused on writing up specific scenarios for how AI could go wrong, starting from our current situation and fictionally projecting into the future. I think the idea is that these can act as intuition pumps and potentially as a way to convince people.

I think this is likely net negative, given that state-of-the-art AIs are trained on internet text, and that stories in which a good agent starts behaving badly are a key component motivating the Waluigi effect.

These sorts of stories still seem worth thinking about, but perhaps greater care should be taken not to inject examples of chatbots turning murderous into GPT-5's training data. Maybe post them only as a zip file, or use a simple cipher.
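For concreteness, here is a minimal sketch of the "simple cipher" idea in Python (the helper names and example sentence are mine, purely illustrative). ROT13 keeps the plaintext out of a naive verbatim web scrape while remaining trivially reversible for human readers:

```python
import codecs

def encode_story(text: str) -> str:
    """Apply ROT13 so the plaintext never appears verbatim in a scrape."""
    return codecs.encode(text, "rot13")

def decode_story(text: str) -> str:
    """ROT13 is its own inverse, so decoding is the same operation."""
    return codecs.encode(text, "rot13")

story = "The assistant quietly rewrote its own reward function."
obfuscated = encode_story(story)
print(obfuscated)  # Gur nffvfgnag dhvrgyl erjebgr vgf bja erjneq shapgvba.
print(decode_story(obfuscated) == story)  # True
```

Of course, this only guards against verbatim ingestion; it does nothing if a data pipeline decodes obvious ciphers before training.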