wnx

AI safety, risks from global systems, philosophy, engineering physics, complex adaptive systems, values for the future, future of intelligence. 

Scale, cascades and cumulative impacts from interconnected systems. AI systems' impact on human cognitive and moral autonomy. Evaluations, measurement, robustness, standards and monitoring of AI systems.

Interest in alignment approaches at different abstraction levels, e.g., macro-level interpretability, scaffolding/module-level AI system safety, and systems-level theoretic process analysis for safety.

Comments
Introducing Alignment Stress-Testing at Anthropic
wnx · 2y · 32

Alignment approaches at different abstraction levels (e.g., macro-level interpretability, scaffolding/module-level AI system safety, systems-level theoretic process analysis for safety) are something I have been hoping to see more of. I am thrilled by this meta-level red-teaming work and excited to see the announcement of the new team.

Shallow review of live agendas in alignment & safety
wnx · 2y · 21

Hey, great stuff -- thank you for sharing! I found this especially useful as somebody who has been "out" of alignment for 6 months and is looking to set up a new research agenda.
