Word importance in text <= conditional information of the token in the context. Is this assumption valid?
Words that are harder to predict from context typically carry more information (or surprisal). Does more information/surprisal mean more importance, all else being equal (correctness, plausibility, etc.)? A simple example: “This morning I opened the door and saw a 'UFO'.” vs. “This morning I opened the door and saw a 'cat'.” —...
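To make "surprisal" concrete, here is a minimal sketch of the usual definition, surprisal(w) = -log2 p(w | context). The probabilities in `toy_probs` are made up for illustration and do not come from any real language model; `surprisal_bits` is a hypothetical helper name.

```python
import math

def surprisal_bits(p: float) -> float:
    """Surprisal (self-information) in bits of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

# Hypothetical next-word probabilities for the context
# "This morning I opened the door and saw a ___" (invented numbers).
toy_probs = {"cat": 0.05, "UFO": 0.0001}

for word, p in toy_probs.items():
    print(f"{word}: {surprisal_bits(p):.2f} bits")
```

Under these assumed numbers 'UFO' gets a much higher surprisal than 'cat'. Whether that gap should also count as "importance" is exactly the open question here; the code only measures unpredictability.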
Feb 23