This is just an idea that came to me today. I didn't do any literature search and don't know if it's new.
The "depth" of a game is the length of the longest chain of players where each one beats the next with some fixed probability, say 60% of the time. Games with more depth are supposed to involve more knowledge, strategy, learning and so on. For example, chess has about double the depth of checkers.
But I don't think that's right. Consider these games:
Deca-chess: two players play 10 games of chess against each other, and whoever wins more, wins.
Coin-chess: a three-sided coin is flipped. Heads -> player 1 wins, tails -> player 2 wins, wings -> a game of chess is played and whoever wins wins.
Under the above definition, deca-chess is "deeper" than chess, because the better player has a higher chance of winning the whole match than of winning any given game. And coin-chess is "shallower", because the better player has less of an edge. Even though the amount of knowledge, strategy and learning involved in these games is exactly the same!
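To make the size of the effect concrete, here is a rough sketch of the better player's chance of winning in each variant. It assumes things the post doesn't specify: games are independent, a single game has no draws, and a tied deca-chess match (5-5) is settled by a fair coin. The function names are made up for illustration.

```python
from math import comb


def deca_chess_win_prob(p: float, games: int = 10) -> float:
    """Chance the better player wins the match, given a per-game win chance p.

    Assumptions (not from the post): games are independent, no game is
    drawn, and a tied match is decided by a fair coin flip.
    """
    def exactly(k: int) -> float:
        # Binomial probability of winning exactly k of the games.
        return comb(games, k) * p**k * (1 - p) ** (games - k)

    win = sum(exactly(k) for k in range(games // 2 + 1, games + 1))
    tie = exactly(games // 2) if games % 2 == 0 else 0.0
    return win + 0.5 * tie


def coin_chess_win_prob(p: float) -> float:
    """Heads (1/3): better player wins; tails (1/3): loses; wings (1/3): play chess."""
    return 1 / 3 + p / 3


if __name__ == "__main__":
    p = 0.6
    print(f"chess:      {p:.3f}")
    print(f"deca-chess: {deca_chess_win_prob(p):.3f}")
    print(f"coin-chess: {coin_chess_win_prob(p):.3f}")
```

With a 60% edge in a single game, the same pair of players comes out at roughly 73% in deca-chess and roughly 53% in coin-chess under these assumptions, so the "60% chain" gets longer or shorter without anyone's skill changing.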
The same problem happens with games like poker. Even though poker involves a lot of skill, each round has so much randomness that its depth comes out even smaller than that of checkers. We could introduce repeated rounds like in deca-chess, but how many? It seems arbitrary.
Can the concept of game depth be fixed, made independent of repeated trials and luck? I think yes, by introducing another variable: effort.
Imagine an individual player, let's call him Bob. Have Bob play in a tournament where every match has many rounds, to rule out luck. But in some matches tell Bob to play half-heartedly, and in others tell him to use maximum effort. (Ignore the unreality of the hypothetical, I'm trying to make a point.)
By the end of the tournament we'll have all players arranged by skill, and also know which sub-ranges of skill are covered by individual players' ranges of effort. In other words, we'll end up knowing this: "The difference between the worst and best player in the tournament is covered by about N intervals between a player's slack and best effort at each level".
That number N could be used as a new measure of game depth. Under this measure, chess, deca-chess and coin-chess will be equally deep, and checkers will be less deep. Intuitively it makes sense: "the best player surpasses me by about 5x the difference between my slack and my best effort". It lets you feel how much work is ahead of you. (Measuring against the worst player would be less informative, but everyone only cares about their distance to the top, so that's ok.)
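Here is a minimal sketch of how N might be computed, under assumptions of my own choosing: skill is a single number on some rating scale, and a hypothetical function effort_gap(level) gives the distance between slack and best effort for a player whose slack play sits at that level. The code lays those intervals end to end from the worst player up to the best and counts them, with a fraction for the last partial interval.

```python
from typing import Callable


def effort_depth(lowest: float, highest: float,
                 effort_gap: Callable[[float], float]) -> float:
    """Count slack-to-best-effort intervals needed to span [lowest, highest].

    effort_gap(level) is a hypothetical function: the rating distance
    between slack and best effort for a player whose slack is at `level`.
    """
    level = lowest
    n = 0.0
    while level < highest:
        gap = effort_gap(level)
        if gap <= 0:
            raise ValueError("effort gap must be positive")
        remaining = highest - level
        if gap >= remaining:
            # Last interval only partly needed: count the fraction used.
            n += remaining / gap
            break
        n += 1
        level += gap
    return n


if __name__ == "__main__":
    # Toy numbers, not measurements: ratings span 0..500, effort gap is 100.
    print(effort_depth(0, 500, lambda level: 100))        # chess-like scale
    # Toy assumption: deca-chess stretches every rating distance by 3x.
    print(effort_depth(0, 1500, lambda level: 300))
    # Toy assumption: coin-chess compresses every rating distance by 2x.
    print(effort_depth(0, 250, lambda level: 50))
```

All three calls give 5.0, because stretching or compressing the rating scale changes the skill range and the effort gaps by the same factor. Real variants would not rescale ratings exactly linearly, so treat this as an illustration of why the measure should be stable rather than a proof.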
The same idea could also work for test scores. I'd be curious to apply it to something like Raven's matrices: draw a histogram with test score as X and number of people as Y, and renormalize the X axis so that a distance of 1 corresponds to the difference between slack and best effort for a typical person at that level. Then when a new person takes the test or plays the game, we match them against a database of previous results, and tell them "ok, you performed at k units of maximum effort above the median person".
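One possible reading of that renormalization, sketched below: treat each raw score point at level s as worth 1 / gap_at(s) effort units, where gap_at(s) is the typical slack-to-best-effort difference at that level, and add those up between the median and the new person's score. Integer scores, the gap_at function and the sample data are all assumptions for illustration; nothing here comes from real test results.

```python
from statistics import median_low
from typing import Callable, Sequence


def effort_units_above_median(score: int,
                              previous_scores: Sequence[int],
                              gap_at: Callable[[int], float]) -> float:
    """Distance from the median score to `score`, in effort units.

    gap_at(s) is a hypothetical function: the typical difference, in raw
    points, between slack and best effort for a person at score level s.
    Each raw point at level s therefore counts as 1 / gap_at(s) units.
    """
    mid = median_low(previous_scores)  # stays an integer score
    lo, hi = sorted((mid, score))
    units = sum(1.0 / gap_at(s) for s in range(lo, hi))
    return units if score >= mid else -units


if __name__ == "__main__":
    # Made-up database of earlier results and a made-up constant gap.
    previous = [12, 18, 20, 25, 31]
    typical_gap = lambda s: 4.0
    print(effort_units_above_median(28, previous, typical_gap))
    print(effort_units_above_median(16, previous, typical_gap))
```

With a constant gap of 4 points and a median of 20, a score of 28 lands 2 effort units above the median and a score of 16 lands 1 unit below it. A gap that varies with score level just makes some stretches of the raw axis count for more than others.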