Constitutional AI — LessWrong

Constitutional AI

Edited by Benaya Koren et al.; last updated 11th Jul 2023

Constitutional AI is a method for fine-tuning language models, used in Anthropic's Claude. The main conceptual difference from RLHF (reinforcement learning from human feedback) is that, instead of relying on human feedback about specific behaviors, it relies on the model's ability to apply general principles, stated in natural language, to specific situations.

Posts tagged Constitutional AI

- Terrified Comments on Corrigibility in Claude's Constitution · 170 karma · 2mo ago · 69 comments
- Prologue to Terrified Comments on Claude's Constitution · 163 karma · 2mo ago · 27 comments
- Notes on notes on virtues · 80 karma · 5y ago · 11 comments
- Open Problems With Claude’s Constitution · 75 karma · 3mo ago · 1 comment
- Thoughts on Claude's Constitution · 62 karma · 3mo ago · 13 comments
- The Claude Constitution’s Ethical Framework · 58 karma · 4mo ago · 1 comment
- Review of Alignment Plan Critiques- December AI-Plans Critique-a-Thon Results · 24 karma · 2y ago · 0 comments
- Listing the virtues from Claude’s “Constitution” · 20 karma · 4mo ago · 5 comments
- Claude's Constitution · 15 karma · 3mo ago · 4 comments
- What can we say about the cosmic host? · 26 karma · 2mo ago · 0 comments
- The V&V method - A step towards safer AGI · 20 karma · 11mo ago · 1 comment
- Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog) · 17 karma · 1y ago · 1 comment
- Contextual Constitutional AI · 16 karma · 2y ago · 2 comments
- ECL-pilled models write constitutions for ASI · 16 karma · 2mo ago · 0 comments
- Constitutions for ASI? · 11 karma · 1y ago · 0 comments

Showing 15 of 21 tagged posts.