LESSWRONG
AI
• Applied to Do Not Mess With Scarlett Johansson by TagWrong 1h ago
• Applied to Cicadas, Anthropic, and the bilateral alignment problem by TagWrong 5h ago
• Applied to Announcing Human-aligned AI Summer School by TagWrong 8h ago
• Applied to Each Llama3-8b text uses a different "random" subspace of the activation space by TagWrong 9h ago
• Applied to ARIA's Safeguarded AI grant program is accepting applications for Technical Area 1.1 until May 28th by TagWrong 10h ago
• Applied to Anthropic announces interpretability advances. How much does this advance alignment? by TagWrong 18h ago
• Applied to EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 by TagWrong 20h ago
• Applied to Mitigating extreme AI risks amid rapid progress [Linkpost] by TagWrong 21h ago
• Applied to On Dwarkesh’s Podcast with OpenAI’s John Schulman by TagWrong 1d ago
• Applied to Is deleting capabilities still a relevant research question? by TagWrong 1d ago
• Applied to New voluntary commitments (AI Seoul Summit) by TagWrong 1d ago
• Applied to The Problem With the Word ‘Alignment’ by particlemania 2d ago
• Applied to What's Going on With OpenAI's Messaging? by TagWrong 2d ago
• Applied to Harmony Intelligence is Hiring! by TagWrong 2d ago
• Applied to [Linkpost] Statement from Scarlett Johansson on OpenAI's use of the "Sky" voice, that was shockingly similar to her own voice. by TagWrong 2d ago
• Applied to Are there any groupchats for people working on Representation reading/control, activation steering type experiments? by TagWrong 2d ago