wgryc — LessWrong

wgryc

-1

3y

LLM-bots are inherently easy to align.

I don't know about this... You're using an extremely sandboxed LLM that has been aggressively trained to avoid saying anything controversial. There's nothing preventing someone from fine-tuning a model to remove some of these ethical safeguards, especially as GPU compute becomes more powerful and model weights get leaked (e.g. Llama).

In fact, the amount of money and effort that has gone into aligning LLM bots shows that they are not easy to align: doing so requires significant resources.