Adversarial Examples

Contributors: Multicore (1)

Adversarial examples are inputs or situations with unusual features that cause an AI system to make choices that seem obviously wrong to a human. For example, an image of a panda can be perturbed so subtly that a person would still see a panda, yet an image classifier labels it a gibbon.
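
One well-known way to construct such perturbations is the fast gradient sign method (FGSM), which nudges every input feature by a small amount in the direction that increases the model's loss. The sketch below is only an illustration under stated assumptions: the "classifier" is a hypothetical random logistic-regression model over 1000 features standing in for pixels, not a real image network, and the names `fgsm_perturb`, `w`, `b`, and `eps` are invented for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Shift each feature of x by +/- eps in the direction that raises the log-loss."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w  # gradient of binary cross-entropy with respect to x
    return x + eps * np.sign(grad_x)

# Hypothetical toy model: random weights, with the bias chosen so the clean logit is 3.0.
rng = np.random.default_rng(0)
dim = 1000
w = rng.normal(size=dim)
x = rng.uniform(0.0, 1.0, size=dim)  # stand-in for pixel intensities in [0, 1]
b = 3.0 - w @ x

eps = 0.01  # each feature moves by at most 1% of the intensity range
x_adv = fgsm_perturb(x, y=1.0, w=w, b=b, eps=eps)

p_clean = sigmoid(w @ x + b)    # confidence in class 1 before the attack
p_adv = sigmoid(w @ x_adv + b)  # confidence in class 1 after the attack
print(f"clean: {p_clean:.3f}  adversarial: {p_adv:.3f}")
```

No single feature changes by more than `eps`, but the tiny shifts all push the logit the same way, so their combined effect across 1000 dimensions is enough to flip the predicted class. This sketch does not clip `x_adv` back into the valid intensity range, which a real image attack would typically do.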

Posts tagged Adversarial Examples
SolidGoldMagikarp (plus, prompt generation) Ω | Jessica Rumbelow, mwatkins | 4mo | karma 654 | 199 comments | relevance 4
AI Safety in a World of Vulnerable Machine Learning Systems Ω | AdamGleave, EuanMcLean | 3mo | karma 60 | 13 comments | relevance 3
Human beats SOTA Go AI by learning an adversarial policy | Vanessa Kosoy | 4mo | karma 55 | 32 comments | relevance 2
If I were a well-intentioned AI... I: Image classifier Ω | Stuart_Armstrong | 3y | karma 35 | 4 comments | relevance 2
Adversarial Policies Beat Professional-Level Go AIs | sanxiyn | 7mo | karma 31 | 35 comments | relevance 2
The Goodhart Game Ω | John_Maxwell | 4y | karma 13 | 5 comments | relevance 2
AXRP Episode 1 - Adversarial Policies with Adam Gleave Ω | DanielFilan | 2y | karma 12 | 5 comments | relevance 2
High-stakes alignment via adversarial training [Redwood Research report] Ω | dmz, LawrenceC, Nate Thomas | 1y | karma 142 | 29 comments | relevance 1
SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4 Ω | AdamYedidia | 2mo | karma 70 | 18 comments | relevance 1
EIS IX: Interpretability and Adversaries Ω | scasper | 4mo | karma 29 | 5 comments | relevance 1
[AN #62] Are adversarial examples caused by real but imperceptible features? Ω | Rohin Shah | 4y | karma 27 | 10 comments | relevance 1
Evidence Sets: Towards Inductive-Biases based Analysis of Prosaic AGI Ω | bayesian_kitten | 2y | karma 22 | 10 comments | relevance 1
The Achilles Heel Hypothesis for AI | scasper | 3y | karma 20 | 6 comments | relevance 1
Adversarial attacks and optimal control Ω | Jan | 1y | karma 17 | 7 comments | relevance 1
EIS X: Continual Learning, Modularity, Compression, and Biological Brains Ω | scasper | 4mo | karma 14 | 3 comments | relevance 1