LLM Personas — LessWrong

This page is a stub.

Posts tagged LLM Personas

10

204 Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

Cam, Puria, Kyle O’Brien, David Africa, Samuel Ratnam, andyk

7mo

25

8

4y

170

8

1y

108

7

27 Constitutional AI Alignment

2mo

9

7

25 Experimental Evidence for Simulator Theory— Part 1: Emergent Misalignment and Weird Generalizations

4mo

0

7

21 Experimental Evidence for Simulator Theory— Part 2: The Scalers Strike Back

4mo

0

6

770 The Rise of Parasitic AI

10mo

191

6

266 A Three-Layer Model of LLM Psychology

1y

17

6

177 Persona Parasitology

Raymond Douglas

4mo

38

6

106 Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training

6mo

12

6

84 Shaping the exploration of the motivation-space matters for AI safety

Maxime Riché, Victor Gillioz, nielsrolf, Kajetan Dymkiewicz, Filip Sondej, RogerDearnaley, Daniel Tan, dillonkn

4mo

15

5

121 A Case for Model Persona Research

nielsrolf, Maxime Riché, Daniel Tan

7mo

11

4

68 The Bleeding Mind

7mo

9

4

40 Selection Pressures on LM Personas

Raymond Douglas

1y

0

3

69 Concrete research ideas on AI personas

nielsrolf, Maxime Riché, Daniel Tan

5mo

10
