
aggliu
123Ω1550

Author, YouTuber, Script Writer for Rational Animations.  A.B. in Math (Harvard 2020)

Comments
If Anyone Builds It, Everyone Dies: Advertisement design competition
aggliu8h00

Just an idle thought: "design@intelligence.org" sounds like a creationist mailing list.

Self-fulfilling misalignment data might be poisoning our AI models
aggliu3mo10

The biggest piece of evidence that I've seen is the Emergent Misalignment paper, linked in Gurkenglas's parallel comment to this one.

Will Jesus Christ return in an election year?
aggliu3mo20

This is probably well-known at this point, but the $6 million item is not the duct-taped banana itself, but the right to display the duct-taped banana. That may not make it any more or less sensible, though.

Self-fulfilling misalignment data might be poisoning our AI models
aggliu4mo10

Yep, that's why I mentioned evil numbers specifically.

Self-fulfilling misalignment data might be poisoning our AI models
aggliu4moΩ161

I am a bit worried that making an explicit persona for the AI (e.g. using a special token) could magnify the Waluigi effect. If something (like a jailbreak or writing evil numbers) engages the AI to act as an "anti-𐀤" then we get all the bad behaviors at once in a single package. This might not outweigh the value of having the token in the first place, or it may experimentally turn out to be a negligible effect, but it seems like a failure mode to watch out for.

Posts
10 When will AI automate all mental work, and how fast?
1mo
0
20 Can Knowledge Hurt You? The Dangers of Infohazards (and Exfohazards)
5mo
0
53 S-Risks: Fates Worse Than Extinction
1y
2
26 How to Upload a Mind (In Three Not-So-Easy Steps)
2y
0
50 How to Eradicate Global Extreme Poverty [RA video with fundraiser!]
2y
5