Mark Keavney

Comments

Misalignment and Roleplaying: Are Misaligned LLMs Acting Out Sci-Fi Stories?
Mark Keavney, 1mo (karma 1, agreement 0)

I think it's an interesting idea. I've always been struck by how sensitive LLM responses seem to be to changes in wording, so I could imagine that repeating a simple slogan could have a big impact, even if it's something the LLM ideally "should" already know.

Misalignment and Roleplaying: Are Misaligned LLMs Acting Out Sci-Fi Stories?
Mark Keavney, 1mo (karma 1, agreement 0)

Yes, the question of which frame ("story" vs "factual") better matches the real world is an interesting one. It's true that there are multiple dramatic cues in the story frame that might not be present in a real case. But it's also true that a real situation isn't just a dry set of facts. In the real world, people have names, they have personal details, they use colorful language. It isn't clear that the factual frame is really the best match for how an LLM would experience a real use case. 

Of course, ideally the dependent variable would be a measure of real misalignment, not a hypothetical prediction. If anyone has ideas for an easy way to measure something like that, I would love to discuss them, because I think that would be a big improvement to the study.

Should we align AI with maternal instinct?
Mark Keavney, 2mo (karma 2, agreement -1)

I agree. That was my reaction to Hinton's comment as well - that it's good to think in terms of relationship rather than control, but that the "maternal instinct" framing was off. 

At the risk of getting too speculative, this has implications for AI welfare as well. I don't believe that current LLMs have feelings, but if we build AGI, it might have them. And rather than thinking about how to make such an entity a controllable servant, we should start planning how to have a mutually beneficial relationship with it.

Posts
Misalignment and Roleplaying: Are Misaligned LLMs Acting Out Sci-Fi Stories? (31 karma, 1mo, 5 comments)
Misgeneralization of Fictional Training Data as a Contributor to Misalignment (9 karma, 2mo, 1 comment)
Are Misaligned LLMs Acting Out Sci-Fi Stories? (1 karma, 2mo, 0 comments)