Comments

I greatly appreciated the time invested in coding the interactive demos. They help clarify the underlying concepts, and they remind me of Colah's posts.

Questions:

  • Are you going to release tools for the interpretation of other models?
  • How might one visualize other modalities, such as audio or web actions?
  • Have you considered developing a generalized interpretability framework that could scale these techniques across different architectures and modalities? A unified "interpretability platform" could help broaden access and grow a dedicated community around your work.