We chose max pooling (vs. average pooling) as our aggregation method because we primarily care about the largest extent to which a property is present in a data sample, not its average presence across the sample. In this work, we use SAE features to construct a document-level embedding that is independent of sample-specific information such as sample length. However, we did not thoroughly investigate other aggregation methods, and we think they merit further study. For instance, if we care about distinguishing short samples where a property occurs once from long samples where the property occurs over and over, then a different aggregation method would be a better fit.
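To make the trade-off concrete, here is a minimal sketch comparing aggregation methods on toy data. It is not our actual pipeline: the activation values, array shapes, and the function name `pool` are all hypothetical, and sum pooling is included only as one example of an alternative that is sensitive to repetition.

```python
import numpy as np


def pool(token_acts: np.ndarray, method: str) -> np.ndarray:
    """Aggregate per-token feature activations (tokens x features) into one vector."""
    if method == "max":
        return token_acts.max(axis=0)
    if method == "mean":
        return token_acts.mean(axis=0)
    if method == "sum":
        return token_acts.sum(axis=0)
    raise ValueError(f"unknown pooling method: {method}")


# Hypothetical toy activations: rows are tokens, columns are SAE features.
# Short sample: 4 tokens, feature 0 fires once.
short_once = np.zeros((4, 2))
short_once[1, 0] = 2.0

# Long sample: 16 tokens, feature 0 fires on every other token (8 times).
long_repeated = np.zeros((16, 2))
long_repeated[::2, 0] = 2.0

for method in ("max", "mean", "sum"):
    short_vec = pool(short_once, method)
    long_vec = pool(long_repeated, method)
    print(f"{method:>4}: short={short_vec[0]:.2f}  long={long_vec[0]:.2f}")
```

In this toy case, max pooling assigns feature 0 the same value in both samples, so it cannot tell a single occurrence from a repeated one. Mean and sum pooling do separate the two samples, but their values also shift with sample length and occurrence count, which is the dependence the max-pooled embedding avoids.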