Google's PaLM-E: An Embodied Multimodal Language Model — LessWrong