It's always troubled me that the standard Turing test provides only a single-bit output, and that the human being questioned could throw the game to make their AI counterpart look good. Also, research and development gets entirely too much funding based on what sounds cool rather than what actually works. The following is an attempt to address both issues.
Take at least half a dozen chatbot AIs, and a similar number of humans with varying levels of communication skill (professional salespeople, autistic children, etc.). A pool of interrogators each submits a question, and every competitor gets the full list. A week later, to allow time for research and number-crunching, collect the answers. Whoever submitted question 1 receives all the answers to question 1 in randomized order, then ranks them from most human/helpful to least, with a big prize for the top answer and successively smaller prizes for the runners-up. Alternatively, interrogators could specify a customized allocation of their question's rewards, e.g. "this was the best, these three are tied for second, the rest are useless."
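To make the reward scheme concrete, here is a minimal sketch of the ranking-and-payout step. All names, the geometric prize decay, and the specific numbers are my own illustrative assumptions, not part of the proposal itself; the only fixed requirements from the proposal are randomized presentation and bigger prizes for higher ranks.

```python
import random

def allocate_prizes(answers, ranking, prize_pool, decay=0.5):
    """Split a prize pool among ranked answers with geometrically
    shrinking shares: the top-ranked answer gets the largest cut,
    and each runner-up gets `decay` times the share above it.
    (The geometric scheme is an assumption for illustration.)"""
    weights = [decay ** rank for rank in range(len(ranking))]
    total = sum(weights)
    return {answers[i]: prize_pool * w / total
            for i, w in zip(ranking, weights)}

# Hypothetical competitors for one question.
answers = ["chatbot_a", "human_b", "chatbot_c", "human_d"]

# The interrogator sees the answers in randomized order...
shuffled = random.sample(answers, len(answers))

# ...then ranks them best-to-worst (indices into `answers`).
ranking = [1, 3, 0, 2]  # human_b best, chatbot_c worst

payouts = allocate_prizes(answers, ranking, prize_pool=1000.0)
# Shares for 4 answers at decay=0.5 are 8/15, 4/15, 2/15, 1/15
# of the pool, so the whole pool is always paid out.
```

The "tied for second" variant from the text would just replace the geometric weights with a custom weight vector supplied by the interrogator; the normalization step stays the same.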
The humans will do their best in that special way that only well-paid people can, and the chatbots will receive additional funding in direct proportion to their success at a highly competitive task.
Six hundred thousand seconds might seem like an awfully long time to let a supercomputer chew over its responses, but the goal is deep reasoning, not just snappy comebacks. Programs can always be debugged and streamlined, or simply run on more powerful future hardware, once the basic usefulness of the results has been demonstrated.