This is a linkpost for

I was surprised I couldn't find this anywhere on LessWrong, so I thought I'd add it. LLM behavior changing over time seems likely to have some alignment implications, and at the least it's worth having for a bit more context.

Someone I spoke to about this immediately dismissed it, pointing to some sort of experimental error that would make the paper's conclusions pretty much void, but I don't really see it.
