This is a linkpost for https://arxiv.org/abs/2307.09009

Surprised I couldn't find this anywhere on LessWrong, so I thought I'd add it. LLM behavior changing over time seems like it would have some alignment implications; at the very least it seems worth gaining a bit more context on the change.

Someone else I spoke to about this immediately dismissed it, pointing to some sort of experimental error that supposedly renders the paper's conclusions void, but I don't really see it.
