Digging Into Interpretable Features

Sparse autoencoders (SAEs) and cross-layer transcoders (CLTs) have recently been used to decode the activation vectors in large language models into more interpretable features. Analyses have been performed by Goodfire, Anthropic, DeepMind, and OpenAI. BluelightAI has constructed CLT features for the Qwen3 family, specifically Qwen3-0.6B...
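The core idea behind an SAE can be illustrated in a few lines. This is a minimal sketch, not any lab's actual implementation: the dimensions are toy values, and training (omitted) would minimize reconstruction error plus an L1 sparsity penalty on the feature activations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for model activation vectors; sizes are illustrative only.
d_model, d_dict, n = 16, 64, 256
X = rng.normal(size=(n, d_model))

# One hidden layer with an overcomplete dictionary (d_dict > d_model).
W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(scale=0.1, size=(d_dict, d_model))

def sae_forward(x):
    """Encode activations into nonnegative feature activations, then decode."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU -> sparse, interpretable features
    x_hat = f @ W_dec                        # reconstruction of the activations
    return f, x_hat

f, X_hat = sae_forward(X)

# Training objective: reconstruction error + L1 penalty encouraging sparsity.
l1_coeff = 1e-3
loss = np.mean((X - X_hat) ** 2) + l1_coeff * np.abs(f).mean()
```

Each row of `f` is a sparse code for one activation vector; the rows of `W_dec` play the role of the learned feature directions.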
In our earlier post, we described how one can parametrize local image patches in natural images by a surface called a Klein bottle. In Love et al., we used this information to modify the convolutional neural network construction so as to incorporate information about the pixels in a small neighborhood...
Motivation

Dimensionality reduction is vital to the analysis of high-dimensional data, i.e. data with many features. It allows for a better understanding of the data, so that one can formulate useful analyses. Dimensionality reduction that produces a set of points in a vector space of dimension n, where n s...
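As a concrete illustration of the idea, here is a small sketch of linear dimensionality reduction (PCA via the SVD) on synthetic data that lies near a low-dimensional plane; the data and dimensions are hypothetical, chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 points lying near a 2-dimensional plane
# embedded in a 50-dimensional feature space, plus a little noise.
latent = rng.normal(size=(200, 2))
embed = rng.normal(size=(2, 50))
X = latent @ embed + 0.01 * rng.normal(size=(200, 50))

# PCA via SVD: center the data, then project onto the top-n principal directions.
n = 2
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Y = Xc @ Vt[:n].T  # reduced representation, shape (200, n)

# Fraction of total variance captured by the top-n components.
explained = (S[:n] ** 2).sum() / (S ** 2).sum()
```

Because the data was built to be nearly 2-dimensional, the two retained components capture almost all of the variance, and `Y` is a faithful low-dimensional summary of `X`.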
This post is motivated by the observation in Open Problems in Mechanistic Interpretability by Sharkey, Chughtai, et al. that "SDL (sparse dictionary learning) leaves feature geometry unexplained," and that it is desirable to utilize geometric structure to gain interpretability for sparse autoencoder features. We strongly agree, and the goal...
This article was written in response to a post on LessWrong from the Apollo Research interpretability team, and represents our initial attempt to act on its topological data analysis suggestions. In this post, we’ll look at some ways to use topological data analysis (TDA) for mechanistic interpretability. We’ll first...