Converging to Multi-Modal Generative AI — LessWrong