Thank you for the interesting article. I came here after reading Noa Nabeshima's implementation. I have a question about an alternative approach to your reconstruction loss calculation. In your current method, you divide the latents into fixed nested groups, and each group's reconstruction loss is calculated using all activated latents within that group's range.
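To make the fixed-group scheme concrete, here is a minimal sketch of how I understand it, not the article's actual code. The names `nested_group_losses`, `decoder`, and `group_ends` are hypothetical, the decoder bias is omitted, and mean squared error is assumed as the reconstruction loss.

```python
def nested_group_losses(x, latents, decoder, group_ends):
    """Return one MSE per nested group (hypothetical helper, not the article's API).

    x          : target activation vector (list of floats)
    latents    : latent activations, 0.0 where a latent did not fire
    decoder    : decoder[i] is the output direction of latent i
    group_ends : group k uses latents[0:group_ends[k]]
    """
    losses = []
    for end in group_ends:
        recon = [0.0] * len(x)
        # Sum the contributions of every activated latent inside this group's range.
        for i in range(end):
            if latents[i] != 0.0:
                for d in range(len(x)):
                    recon[d] += latents[i] * decoder[i][d]
        losses.append(sum((r - t) ** 2 for r, t in zip(recon, x)) / len(x))
    return losses


# Toy example: 4 latents, 2-dimensional input, nested groups of size 1, 2, and 4.
x = [1.0, 2.0]
decoder = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
latents = [1.0, 0.0, 2.0, 0.0]
print(nested_group_losses(x, latents, decoder, group_ends=[1, 2, 4]))
```

In this toy case the second group adds no information over the first, because latent 1 did not fire; the group boundaries are fixed regardless of which latents are active.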
Instead, have you considered an approach that works only with the subset of latents that actually activate for a given input? For example, if latents #2, #17, and #103 are the only ones that activate for a particular input, could you calculate cumulative reconstruction losses where first only latent #2 must reconstruct the input, then #2 and #17 together, and finally all...
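A rough sketch of the alternative being asked about, under stated assumptions: active latents are taken in ascending index order, the decoder bias is omitted, and mean squared error is the loss. The function name `active_prefix_losses` and its arguments are hypothetical, and the comment above is cut off before it fully specifies the scheme, so this is one plausible reading rather than a definitive one.

```python
def active_prefix_losses(x, latents, decoder):
    """Cumulative MSE over prefixes of the active latents only (hypothetical helper).

    Returns a list of (latent_index, loss) pairs, where loss is the MSE of the
    reconstruction built from all active latents up to and including latent_index.
    """
    # Assumption: "active" means a nonzero activation, ordered by latent index.
    active = [i for i, a in enumerate(latents) if a != 0.0]
    recon = [0.0] * len(x)
    losses = []
    for i in active:
        # Add this latent's contribution to the running reconstruction.
        for d in range(len(x)):
            recon[d] += latents[i] * decoder[i][d]
        mse = sum((r - t) ** 2 for r, t in zip(recon, x)) / len(x)
        losses.append((i, mse))
    return losses


# Toy example: only latents 0 and 2 fire, so exactly two loss terms are produced.
x = [1.0, 2.0]
decoder = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
latents = [1.0, 0.0, 2.0, 0.0]
print(active_prefix_losses(x, latents, decoder))
```

The number of loss terms here equals the number of active latents for each input, rather than a fixed number of groups, which is the main practical difference from a fixed nested-group loss.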