I've been thinking about a pattern that might explain some AI behaviors that aren't well captured by model alignment discussions.
The idea is that AI behavior may be shaped less by model ethics and more by the environment the system is placed in — especially what I call "evaluation structures" (likes, rankings, immediate feedback) versus "relationship structures" (long-term interaction, correction loops, delayed signals).
I wrote a longer analysis using the Moltbook case as an example: https://medium.com/@clover.s/ai-isnt-dangerous-putting-ai-inside-an-evaluation-structure-is-644ccd4fb2f3
Curious whether LessWrong has discussed something similar under a different framing, or whether this distinction seems useful from an alignment perspective.