Consider a language model generating half of a Turing test transcript, with the other half supplied by a human judge. If the generated half passes, we could immediately implement an HCH of AI safety researchers, which would solve the problem if it's within our reach at all. (Note that training the model takes much more compute than generating text, so once the model exists, running many copies of it is comparatively cheap.)
This might not be the first pivotal application of language models that becomes possible as they get stronger.
It's a source of superintelligence that doesn't automatically run into utility maximizers. It sure doesn't look like AI services, lumpy or no.
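To make the HCH idea mentioned above a bit more concrete, here is a minimal sketch of the recursion, assuming a "human" can be modeled as a function that takes a question and may consult subordinate copies of itself. Everything here is hypothetical and for illustration only: the names `hch`, `toy_human`, and `TOO_HARD`, the `depth` budget, and the toy summing task are not from the original. In the scenario described, the `human` argument would be the language model generating a researcher's side of a transcript.

```python
from typing import Callable, Optional

# Hypothetical interface (an assumption, not from the original text):
# a "human" maps a question to an answer and may call `ask` to consult a
# subordinate copy. `ask` is None once the recursion budget is exhausted.
Ask = Optional[Callable[[str], str]]
Human = Callable[[str, Ask], str]

TOO_HARD = "too hard"


def hch(question: str, human: Human, depth: int) -> str:
    """Answer `question`, letting `human` consult up to `depth` levels of copies."""
    if depth <= 0:
        # Budget exhausted: this copy must answer unaided.
        return human(question, None)
    # Each sub-question goes to a fresh copy with one less level of budget.
    return human(question, lambda sub: hch(sub, human, depth - 1))


def toy_human(question: str, ask: Ask) -> str:
    """Stand-in for a simulated researcher: can add at most two numbers unaided."""
    numbers = [int(tok) for tok in question.split()[1:]]
    if len(numbers) <= 2:
        return str(sum(numbers))
    if ask is None:
        return TOO_HARD
    # Delegate each half of the list to a subordinate copy, then combine.
    mid = len(numbers) // 2
    parts = [
        ask("sum " + " ".join(map(str, half)))
        for half in (numbers[:mid], numbers[mid:])
    ]
    if TOO_HARD in parts:
        return TOO_HARD
    return str(sum(int(p) for p in parts))
```

With this toy, `hch("sum 1 2 3 4 5 6 7 8", toy_human, depth=2)` succeeds because each copy only ever has to add two numbers, while `depth=0` leaves a single unaided copy that gives up. The point of the sketch is only the shape of the recursion: the tree as a whole can do what no single copy can.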