Well, that at least is an experiment one could set up. Reaction time would probably be a reasonable measure of "harder" (perhaps error rate, too, but on many tasks the error rate is trivially low). But this requires deciding how "using a function" is detected; you'd need, at the very least, "clear cases" for each function.
Oh, there are many. One, MBTI supposes the functions are antagonistic in very specific ways, so the null hypothesis is the absence of those antagonistic pairings, even if the functions themselves are as it says. Two, each carving-out of a function is a subhypothesis about how to cluster thingspace (in this case, cognitionspace), and the null hypothesis is that it doesn't cut at reality's joints.
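To make the first null hypothesis a bit more concrete, here is a minimal sketch of a permutation test. It assumes you somehow already have a numeric score per person for each of two supposedly antagonistic functions, which is of course the hard part and not something MBTI hands you. The names `pearson` and `antagonism_p_value` are made up for illustration.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def antagonism_p_value(xs, ys, n_perm=2000, seed=0):
    """One-sided permutation test for a negative correlation.

    Null hypothesis: the two scores are unrelated across people, so
    shuffling one list against the other changes nothing systematic.
    Returns (observed correlation, estimated p-value).
    """
    rng = random.Random(seed)
    observed = pearson(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Count shuffles at least as negatively correlated as the real data.
        if pearson(xs, shuffled) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value would only say the two scores move in opposite directions more than chance predicts; it would say nothing about whether the scores measure the functions they are supposed to.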
Offer an alternative hypothesis. "A fair fight", as HPMoR puts it. To understand if it's valid, you need to be able to imagine both a world in which it is and a world in which it isn't and outline what the differences would be.
It's a great post in that it seemingly tries to engage with the question in good faith. That said…
We don't ask people how they arrive at their datapoints because we can't trust their answers. That kind of introspection is deeply unreliable in most people; they (we) aren't, in this respect, enough of a lens that can see its flaws. That's why the Big Five questions skip it, not through careless omission as your post seems to imply. The MBTI-type cognitive function gears would be "big if true", but most big-if-true models are wrong, and not just in the technical sense of "all models are wrong, some models are useful", but in failing to connect properly to reality because they provide the wrong compressions. The post provides literally no arguments for why these are useful gears.
That [organizational] culture can and does change.
You asked to be notified about things the previous texts failed to lay the groundwork for, so here it is. The previous discussion largely made it look static and self-supporting, aside from a couple of examples of how organizations jumped to being mazes as they grew. I feel like this is partially related to, but distinct from, that. Still, getting your own perspective on when it can vs. can't change could be useful (not in the case of heroic efforts where you basically uproot everything, because I presume that's not what you meant by this).
That's an ingenious solution! I still feel like there's some catch here but can't formulate it. Maybe because it's way past midnight here and I should just go to sleep.
"Can you try passing my ITT, so that I can see where I've miscommunicated?"
...is a very difficult task even by the standards of "good discourse requires energy". To present anything but a strawman in such a case may require more time than the general discussion, not necessarily because your model actually is a strawman but because you'd need to "dot many i's and cross many t's" (I think that's the idiom).
(ETA: It seems to me like it is directly related to obeying your tenth guideline.)
However, just in case, you only covered my first suggestion, not both.