Word importance in text <= conditional information of the token in the context. Is this assumption valid?
Words that are harder to predict from context typically carry more information (or surprisal). Does more information/surprisal mean more importance, all else being equal (correctness, plausibility, etc.)? A simple example: “This morning I opened the door and saw a 'UFO'.” vs. “This morning I opened the door and saw a 'cat'.” ...
More potential importance
<-> Not all surprising parts of a sentence are important.
<-> All important parts must be surprising.
<-> Word importance in text <= conditional information of the token in the context.
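To make the bound concrete, here is a minimal sketch in Python, assuming surprisal is measured as -log2 of the word's conditional probability. The probabilities for 'cat' and 'UFO' are made-up numbers for illustration, not the output of any real language model, and `respects_bound` is a hypothetical helper that just states the inequality.

```python
import math

def surprisal_bits(p):
    """Surprisal (self-information) in bits of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

# Hypothetical conditional probabilities of the next word after
# "This morning I opened the door and saw a ..." (illustrative only).
p_next = {"cat": 0.05, "UFO": 0.0001}

surprisal = {word: surprisal_bits(p) for word, p in p_next.items()}

def respects_bound(importance, surprisal_value):
    """Check the assumed inequality: importance <= surprisal (same units)."""
    return importance <= surprisal_value

for word, bits in surprisal.items():
    print(f"{word}: {bits:.2f} bits")
```

Under these assumed numbers 'UFO' gets a much larger surprisal than 'cat', so it has more room to be important. The inequality only caps importance: a word can be very surprising and still unimportant, which is the "not all surprising parts are important" case.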
Thank you again! You helped me see it more clearly!