How feasible/costly would it be to train a very large AI model on distributed clusters of GPUs? — LessWrong