ATTENTION GATHERS, MLPS COMPOSE: A CAUSAL ANALYSIS OF AN ACTION-OUTCOME CIRCUIT IN VIDEOVIT
Sai Chereddy
Independent Researcher
Navi Mumbai, India
saivivaswanthreddy@alumni.usc.edu

ABSTRACT

This paper investigates how video models represent nuanced information that does not alter the final classification. Using a minimal pair of videos (a bowling strike and a gutter ball), we analyze the Vanilla Video Vision Transformer (ViViT) model google/vivit-b-16x2-kinetics400, which robustly classifies both as...
Oct 11, 2025