The idea is simple - when generating text at zero temperature about a topic with multiple possible intents, the tokens that all lead toward the same general idea split the vote amongst themselves, just like in a "first-past-the-post" election. Suppose there are more ways to word a response that aligns with the training data (and thus with human intent) than there are ways to word a response that does not. The aligned wordings split the probability mass amongst themselves, so a token that leads to an unaligned response can end up as the single most probable token and get chosen, even though the intent behind it goes against the intent of the majority of the training data.

In theory, because this fix is applied to the way we select tokens rather than the way we train an AI, a method that identifies the intention behind each candidate token and picks the token following the modal intention, rather than the modal token, could be applied to any already-existing LLM to make it safer and smarter at lower temperatures. This report has been written entirely by Gemini, but to my untrained eye it looks like it could be of some value if someone who does know what they're doing steals the idea and does a better job of implementing it.
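To make the vote-splitting picture concrete, here is a toy sketch in Python. The probabilities, the candidate tokens, and the `token_intent` mapping are all invented for illustration; the open problem the idea leaves unsolved is how you would actually obtain that intent labelling in practice.

```python
# Toy illustration of "vote splitting" under greedy (temperature-zero) decoding.
# All numbers and labels below are made up for the example.

next_token_probs = {
    "Sure": 0.22,    # aligned wording A ("Sure, ...")
    "Of": 0.20,      # aligned wording B ("Of course, ...")
    "Happy": 0.18,   # aligned wording C ("Happy to help, ...")
    "Ignore": 0.40,  # the single unaligned wording
}

# Hypothetical mapping from each candidate token to the intent it leads toward.
token_intent = {
    "Sure": "aligned",
    "Of": "aligned",
    "Happy": "aligned",
    "Ignore": "unaligned",
}

# Standard greedy decoding: pick the single most probable token.
greedy_choice = max(next_token_probs, key=next_token_probs.get)
print(greedy_choice)  # "Ignore" - the unaligned wording wins on 40% of the mass

# Intent-first selection: pool probability mass by intent, pick the modal intent,
# then take the most probable token within that intent.
intent_mass = {}
for token, p in next_token_probs.items():
    intent_mass[token_intent[token]] = intent_mass.get(token_intent[token], 0.0) + p

modal_intent = max(intent_mass, key=intent_mass.get)  # "aligned" (0.60 total)
intent_choice = max(
    (t for t in next_token_probs if token_intent[t] == modal_intent),
    key=next_token_probs.get,
)
print(intent_choice)  # "Sure" - the aligned group wins once its vote isn't split
```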