There are three broad types of approach I see for making interpretability rigorous. I've put them in ascending order of how much assurance I think they can provide. I think they all have pros and cons, and am generally in favor of rigor.
Some random extra context:
I have a bit of a reputation as a skeptic/hater of mechanistic interpretability in the safety community. This is not entirely unearned; it's largely born out of an impression that much of the early work lacked rigor and was basically a bunch of "just-so stories". Colleagues began telling me that this was clearly no longer the case starting with the circuits thread, and I've definitely noticed an improvement.