Adversarial Examples (AI)
Edited by Multicore, Ruby. Last updated 14th Dec 2024.
This page is a stub.
Posts tagged Adversarial Examples (AI), sorted by Most Relevant
Relevance | Karma | Title | Flags | Authors | Age | Comments
8 | 687 | SolidGoldMagikarp (plus, prompt generation) | Ω | Jessica Rumbelow, mwatkins | 3y | 208
3 | 159 | Ironing Out the Squiggles | | Zack_M_Davis | 2y | 36
3 | 70 | AI Safety in a World of Vulnerable Machine Learning Systems | Ω | AdamGleave, EuanMcLean | 3y | 29
2 | 127 | Deep Forgetting & Unlearning for Safely-Scoped LLMs | Ω | scasper | 2y | 30
2 | 98 | Solving adversarial attacks in computer vision as a baby version of general AI alignment | Ω | Stanislav Fort | 1y | 9
2 | 59 | Human beats SOTA Go AI by learning an adversarial policy | | Vanessa Kosoy | 3y | 29
2 | 38 | What progress have we made on automated auditing? | Q, Ω | LawrenceC | 1y | 1
2 | 35 | If I were a well-intentioned AI... I: Image classifier | Ω | Stuart_Armstrong | 6y | 4
2 | 31 | Adversarial Policies Beat Professional-Level Go AIs | | sanxiyn | 3y | 35
2 | 30 | Adversarial Robustness Could Help Prevent Catastrophic Misuse | Ω | aog | 2y | 18
2 | 13 | The Goodhart Game | Ω | John_Maxwell | 6y | 5
2 | 12 | AXRP Episode 1 - Adversarial Policies with Adam Gleave | Ω | DanielFilan | 5y | 5
2 | 5 | RAIN: Your Language Models Can Align Themselves without Finetuning - Microsoft Research 2023 - Reduces the adversarial prompt attack success rate from 94% to 19%! | | Singularian2501 | 2y | 0
1 | 142 | High-stakes alignment via adversarial training [Redwood Research report] | Ω | dmz, LawrenceC, Nate Thomas | 4y | 29
1 | 130 | Even Superhuman Go AIs Have Surprising Failure Modes | Ω | AdamGleave, EuanMcLean, Tony Wang, Kellin Pelrine, Tom Tseng, Yawen Duan, Joseph Miller, MichaelDennis | 2y | 22