LESSWRONG
BestJohn
The Waluigi Effect (mega-post)
BestJohn · 3y

What does trust mean, from the perspective of the LLM algorithm, in terms of a flattery component? Do LLMs have a "trustometer"? Or can they evaluate some sort of stored world-state, compare the prompt against it, and come up with a "veracity" value that they use when responding to the prompt?
