We hope to borrow many ideas from cogsci work, where the mental models people form of one another (e.g., in co-working or teacher/student situations) are well studied. The following works that we cite may give a good idea of the flavor: https://langcog.stanford.edu/papers_new/goodman-2016-tics.pdf or https://royalsocietypublishing.org/doi/abs/10.1098/rsta.2022.0048. In other words, cogsci folks have been studying how humans come to understand each other in order to work together better or to enable better education, and agentic interpretability advocates doing something similar with machines (though it may look very different).