LESSWRONG
Anthropic (org)
• Applied to Vaniver's thoughts on Anthropic's RSP by Gunnar_Zarncke, 2mo ago
• Applied to Introducing Alignment Stress-Testing at Anthropic by Gunnar_Zarncke, 2mo ago
• Applied to On Anthropic’s Sleeper Agents Paper by Gunnar_Zarncke, 2mo ago
• Applied to Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation by Soroush Pour, 5mo ago
• Applied to Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy by Zac Hatfield-Dodds, 5mo ago
• Applied to Comparing Anthropic's Dictionary Learning to Ours by Robert_AIZI, 6mo ago
• Applied to Measuring and Improving the Faithfulness of Model-Generated Reasoning by HenningB, 6mo ago
• Applied to Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust by Zac Hatfield-Dodds, 6mo ago
• Applied to Towards Monosemanticity: Decomposing Language Models With Dictionary Learning by Zac Hatfield-Dodds, 6mo ago
• Applied to Amazon to invest up to $4 billion in Anthropic by RobertM, 6mo ago
• Applied to AI Awareness through Interaction with Blatantly Alien Models by VojtaKovarik, 8mo ago
• Applied to Frontier Model Forum by RobertM, 8mo ago
• Applied to Frontier Model Security by RobertM, 8mo ago
• Applied to Anthropic Observations by RobertM, 8mo ago
• Applied to Anthropic | Charting a Path to AI Accountability by Gabriel Mukobi, 9mo ago
• Applied to Rishi Sunak mentions "existential threats" in talk with OpenAI, DeepMind, Anthropic CEOs by Baldassare Castiglione, 10mo ago
• Applied to Request to AGI organizations: Share your views on pausing AI progress by Akash, 1y ago
• Applied to Anthropic is further accelerating the Arms Race? by Ruby, 1y ago