This seems like great work and I look forward to reading the rest of the paper soon, but one question that immediately came up is: how much did you explore and iterate on warm start data generation? It seems that the legibility and style of the final NLA's explanations would be highly sensitive to this warm start data – and perhaps the reconstruction loss could be as well. What sort of future work would you like to see exploring modifications to the warm start data generation, and how important do you think it is to explore this aspect further?