Words that are harder to predict from context typically carry more information (or surprisal). Does more information (surprisal) mean more importance?
A simple example: "This morning I opened the door and saw a 'UFO'." vs. "This morning I opened the door and saw a 'cat'." Clearly, 'UFO' carries more information.
'UFO' also seems more important here. But is that because it carries more information? The question touches on the information-theoretic nature of language.
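To make "more information" concrete, the usual definition of the surprisal of a word w is the negative log-probability of that word given its preceding context. The probabilities below are made-up, illustrative values, not measurements from any model: suppose a model assigned P(cat | context) = 0.05 and P(UFO | context) = 0.0001.

```latex
I(w) = -\log_2 P(w \mid \text{context})
I(\text{cat}) = -\log_2 0.05 \approx 4.32 \text{ bits}
I(\text{UFO}) = -\log_2 0.0001 \approx 13.29 \text{ bits}
```

Under these assumed probabilities, 'UFO' carries roughly three times as many bits as 'cat'.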
If so, large language models offer a simple and useful way to analyze the information density of a text and to visualize where its important parts are.
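Below is a minimal sketch of that analysis. It assumes that a toy bigram model with add-one smoothing can stand in for a large language model; a real analysis would take per-token probabilities from an LLM instead. The function names `train_bigram`, `surprisals`, and `highlight`, the tiny corpus, and the 4-bit highlighting threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy stand-in for an LLM: a bigram model with add-one smoothing.
def train_bigram(tokens):
    """Count unigrams and adjacent-token bigrams in a training token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def surprisals(tokens, unigrams, bigrams):
    """Return (token, surprisal in bits) for every token after the first.

    P(cur | prev) = (count(prev, cur) + 1) / (count(prev) + V), where V is the
    training vocabulary size plus one slot for unseen words. The first token
    has no preceding context, so it is not scored.
    """
    vocab_size = len(unigrams) + 1
    scored = []
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        scored.append((cur, -math.log2(p)))
    return scored


def highlight(scored, threshold):
    """Join scored tokens into text, bracketing those above `threshold` bits."""
    return " ".join(f"[{tok}]" if bits > threshold else tok for tok, bits in scored)


if __name__ == "__main__":
    # Hypothetical miniature training corpus.
    corpus = ("this morning i opened the door and saw a cat . "
              "i saw a cat in the garden . a cat sat on the door").split()
    unigrams, bigrams = train_bigram(corpus)
    for last_word in ("cat", "ufo"):
        sentence = f"i opened the door and saw a {last_word}".split()
        print(highlight(surprisals(sentence, unigrams, bigrams), threshold=4.0))
```

With this toy corpus, 'ufo' scores higher than 'cat' simply because the model never saw it after 'a'. Whether such a score also tracks importance is exactly the open question.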
It is a world of information, layered above the physical world. When we read text, we take in information from a token stream, and the information density varies across that stream, just as the things we receive vary in "worth".