Adversarial Examples (AI)

Edited by Multicore, Ruby last updated 14th Dec 2024
This page is a stub.
Posts tagged Adversarial Examples (AI)
SolidGoldMagikarp (plus, prompt generation) [Ω], by Jessica Rumbelow, mwatkins (3y, 682 karma, 206 comments)
Ironing Out the Squiggles, by Zack_M_Davis (1y, 157 karma, 36 comments)
AI Safety in a World of Vulnerable Machine Learning Systems [Ω], by AdamGleave, EuanMcLean (3y, 70 karma, 29 comments)
Deep Forgetting & Unlearning for Safely-Scoped LLMs [Ω], by scasper (2y, 126 karma, 30 comments)
Solving adversarial attacks in computer vision as a baby version of general AI alignment [Ω], by Stanislav Fort (1y, 89 karma, 8 comments)
Human beats SOTA Go AI by learning an adversarial policy, by Vanessa Kosoy (3y, 59 karma, 29 comments)
What progress have we made on automated auditing? [QΩ], by LawrenceC (1y, 38 karma, 1 comment)
If I were a well-intentioned AI... I: Image classifier [Ω], by Stuart_Armstrong (6y, 35 karma, 4 comments)
Adversarial Policies Beat Professional-Level Go AIs, by sanxiyn (3y, 31 karma, 35 comments)
Adversarial Robustness Could Help Prevent Catastrophic Misuse [Ω], by aog (2y, 30 karma, 18 comments)
The Goodhart Game [Ω], by John_Maxwell (6y, 13 karma, 5 comments)
AXRP Episode 1 - Adversarial Policies with Adam Gleave [Ω], by DanielFilan (5y, 12 karma, 5 comments)
RAIN: Your Language Models Can Align Themselves without Finetuning - Microsoft Research 2023 - Reduces the adversarial prompt attack success rate from 94% to 19%!, by Singularian2501 (2y, 5 karma, 0 comments)
High-stakes alignment via adversarial training [Redwood Research report] [Ω], by dmz, LawrenceC, Nate Thomas (3y, 142 karma, 29 comments)
Even Superhuman Go AIs Have Surprising Failure Modes [Ω], by AdamGleave, EuanMcLean, Tony Wang, Kellin Pelrine, Tom Tseng, Yawen Duan, Joseph Miller, MichaelDennis (2y, 130 karma, 22 comments)