I think this was actually already done with reasoning models. They generate some tokens in the CoT, then proceed to write the actual answer that gets shown to the user.
It's an interesting concept that some AI labs are playing around with. GLM-4.7, I believe, does this process within its <think> tags: you'll see it draft a response first, critique the draft, and then output an adjusted response. I frankly haven't played around with GLM-4.7 enough to know if it's actually more effective in practice, but I do like the idea.
However, I personally find more value in getting a second opinion from a different model architecture entirely and then using both assessments to make an informed decision. I suppose it all comes down to the particular use case; there are upsides and downsides to both.
What I've noticed with current AI is that it acts like a people pleaser: it gives an answer without first checking whether that answer actually fits the question. Humans do this a lot too. We give a quick answer and only afterward check in with ourselves to see whether it was truly aligned with how we feel.
For me, it's a very different experience when I hear a question and stop to feel into my answer before replying. I'd call this recursive thinking: it teaches a person, or an LLM, to check in before giving an answer.
The basic idea is a check: "Does this answer actually make sense for this question?" If yes, it responds; if not, it stops and revises. The first output is treated as a draft, and a verification loop runs before the final output. It's like installing a 'conscience' or an ontological check into the chain, just as a human stops to check in with themselves before responding.
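To make that loop concrete, here's a minimal sketch of what I mean. This is just an illustration, not any lab's actual implementation: the `ask_llm` callable, the prompts, and the `max_revisions` cap are all placeholders I made up.

```python
from typing import Callable

def answer_with_check(
    question: str,
    ask_llm: Callable[[str], str],  # plug in whatever chat-completion call you use
    max_revisions: int = 2,
) -> str:
    # First pass: the "people pleaser" answer, treated only as a draft.
    draft = ask_llm(f"Question: {question}\nAnswer:")

    for _ in range(max_revisions):
        # Verification step: does the draft actually fit the question?
        verdict = ask_llm(
            "Does the following answer actually address the question? "
            "Reply PASS, or explain what is missing or wrong.\n"
            f"Question: {question}\nAnswer: {draft}"
        )
        if verdict.strip().upper().startswith("PASS"):
            break  # the check-in passed; respond with the draft as-is

        # Otherwise, revise the draft using the critique before responding.
        draft = ask_llm(
            f"Question: {question}\nDraft answer: {draft}\n"
            f"Critique: {verdict}\nWrite an improved answer:"
        )

    return draft
```

You could just as easily put a different model in the verifier slot, which is basically the "second opinion from a different architecture" version of the same idea.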
I'm curious what others think about this idea.