Adversarial Examples

Contributors: Multicore

Adversarial examples are inputs or situations with unusual features, often deliberately crafted, that cause an AI to make choices that seem obviously wrong to a human. For example, an image of a panda can be subtly manipulated so that an image classifier labels it as a gibbon, even though the altered image still looks like a panda to a person.
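To make the idea concrete, here is a minimal sketch of how such a manipulation can work. It is an illustration under stated assumptions, not the procedure behind any specific result: it swaps the image classifier for a toy logistic-regression model with hypothetical hand-picked weights, and it uses the fast gradient sign method (one well-known way of constructing adversarial examples), nudging every input feature by at most `epsilon` in the direction that increases the model's loss. All names (`predict_proba`, `fgsm_perturb`, `DIM`) are made up for this example.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression classifier."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, bias, x, y_true, epsilon):
    """Move each feature by at most epsilon in the loss-increasing direction."""
    p = predict_proba(weights, bias, x)
    # For logistic regression with cross-entropy loss, dLoss/dx = (p - y_true) * w.
    grad = [(p - y_true) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical stand-in for an image: 1000 "pixels" and alternating +/-1 weights.
DIM = 1000
weights = [1.0 if i % 2 == 0 else -1.0 for i in range(DIM)]
bias = 0.0

# A clean input that the toy model confidently assigns to class 1.
x = [0.005 * w for w in weights]
epsilon = 0.01  # maximum change allowed per feature

x_adv = fgsm_perturb(weights, bias, x, y_true=1.0, epsilon=epsilon)

p_clean = predict_proba(weights, bias, x)
p_adv = predict_proba(weights, bias, x_adv)
max_change = max(abs(a - c) for a, c in zip(x_adv, x))

print(f"clean input:     P(class 1) = {p_clean:.4f}")
print(f"perturbed input: P(class 1) = {p_adv:.4f}")
print(f"largest per-feature change  = {max_change:.4f}")
```

Each feature moves by only a small amount, but because all 1000 small changes push the model's score in the same direction, the prediction flips from confident class 1 to confident class 0. Real image classifiers are far more complex than this toy model, so this only shows the general mechanism.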

Posts tagged Adversarial Examples
Most Relevant
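Each entry lists: title | authors | age | karma | comments | tag relevance. Ω marks posts that also appear on the Alignment Forum.

SolidGoldMagikarp (plus, prompt generation) Ω | Jessica Rumbelow, mwatkins | 2mo | karma 646 | 194 comments | relevance 4
AI Safety in a World of Vulnerable Machine Learning Systems Ω | AdamGleave, EuanMcLean | 20d | karma 54 | 10 comments | relevance 3
Human beats SOTA Go AI by learning an adversarial policy | Vanessa Kosoy | 1mo | karma 55 | 32 comments | relevance 2
If I were a well-intentioned AI... I: Image classifier Ω | Stuart_Armstrong | 3y | karma 35 | 4 comments | relevance 2
Adversarial Policies Beat Professional-Level Go AIs | sanxiyn | 5mo | karma 31 | 35 comments | relevance 2
The Goodhart Game Ω | John_Maxwell | 3y | karma 13 | 5 comments | relevance 2
AXRP Episode 1 - Adversarial Policies with Adam Gleave Ω | DanielFilan | 2y | karma 12 | 5 comments | relevance 2
High-stakes alignment via adversarial training [Redwood Research report] Ω | dmz, LawrenceC, Nate Thomas | 1y | karma 142 | 29 comments | relevance 1
[AN #62] Are adversarial examples caused by real but imperceptible features? Ω | Rohin Shah | 4y | karma 27 | 10 comments | relevance 1
Evidence Sets: Towards Inductive-Biases based Analysis of Prosaic AGI Ω | bayesian_kitten | 1y | karma 22 | 10 comments | relevance 1
The Achilles Heel Hypothesis for AI | scasper | 2y | karma 20 | 6 comments | relevance 1
EIS IX: Interpretability and Adversaries Ω | scasper | 1mo | karma 18 | 1 comment | relevance 1
Adversarial attacks and optimal control Ω | Jan | 10mo | karma 17 | 7 comments | relevance 1
EIS XII: Summary Ω | scasper | 1mo | karma 12 | 0 comments | relevance 1
EIS X: Continual Learning, Modularity, Compression, and Biological Brains Ω | scasper | 1mo | karma 12 | 3 comments | relevance 1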