I think that the distribution is mostly irrelevant to the problems and purpose of education systems. Public, large-scale, youth education is mostly about childcare and socialization, and only incidentally about skill or knowledge development. Outliers, regardless of the distribution or percentage, aren't particularly well-served.
Public, large-scale, youth education is mostly about childcare and socialization, and only incidentally about skill or knowledge development.
I agree that childcare and socialization are big parts of it, but I also think skill and knowledge development play a big role. For example, I care about my doctor’s education due to the skill and knowledge development (as well as certification) that happened during their formal education.
People such as voters and parents also care at least to some degree what people learn in school. They might be mistaken a lot of the time, but they do care.
I think there is a crucial difference between performance, as defined in the paper, and ability, and it should be taken very much into account. I will not debate whether their definition of performance is consistent with common usage, but they failed to state their definitions clearly, and I think you misunderstood their results because of this.
The paper measures performance as the results of (roughly) zero-sum competitions. This is very clear when they analyze athletes (number of wins), politicians (election wins, reelections) and actors (awards). But this is also true for research, as writing an impactful paper means arriving at a novel result before competing teams or succeeding at explaining something where others have failed.
But, for a professional runner, winning 90% of races is not the same as being 90% faster. Indeed, a runner who is on average 5% faster will win most races (not all, as he will have off days where his speed goes down by more than 5%).
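The runner example can be made concrete with a rough sketch. It assumes, purely for illustration, that each runner's speed on a given day is their average speed plus independent Gaussian noise; the 3% day-to-day standard deviation is a made-up number, not something from the paper.

```python
import math

def win_probability(speed_gap, day_sd):
    """Chance the faster runner wins a head-to-head race.

    speed_gap: average speed advantage, as a fraction (0.05 = 5% faster).
    day_sd: standard deviation of each runner's day-to-day speed (assumed).
    """
    # The difference of two independent Gaussian speeds has sd day_sd * sqrt(2).
    z = speed_gap / (day_sd * math.sqrt(2))
    # Standard normal CDF written with the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# A 5% speed advantage with 3% daily noise: wins roughly 88% of races.
print(round(win_probability(0.05, 0.03), 2))
```

Under these assumptions a modest edge in speed turns into a lopsided win record, which is the gap between ability and measured performance described above.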
Tests such as PISA and grades try to measure ability, e.g. your math skill. That is analogous to a runner's speed, not to how many races he wins. I believe this is very much Gaussian distributed, and the paper does not show anything to the contrary. Indeed it is very reasonable to believe that Gaussian distributed abilities result in Pareto distributed outcomes in competitive situations (it may be a provable result but I'm too lazy to do the math now). So, it's pretty much appropriate to give grades on a Gaussian.
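This is not a proof, but the intuition is easy to poke at with a small simulation. The sketch below assumes Gaussian abilities, Gaussian day-to-day noise, and races in which the whole field competes and only the winner is counted; the field size, number of races and standard deviations are arbitrary choices for illustration.

```python
import random

def simulate_wins(n_runners=200, n_races=2000, ability_sd=0.5, noise_sd=0.5, seed=0):
    """Count race wins when ability is Gaussian and each race adds Gaussian noise."""
    rng = random.Random(seed)
    abilities = [rng.gauss(10.0, ability_sd) for _ in range(n_runners)]
    wins = [0] * n_runners
    for _ in range(n_races):
        # Speed on the day = underlying ability + independent noise.
        speeds = [a + rng.gauss(0.0, noise_sd) for a in abilities]
        winner = max(range(n_runners), key=speeds.__getitem__)
        wins[winner] += 1
    return abilities, wins

def top_share(values, fraction=0.1):
    """Share of the total held by the top `fraction` of entries."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

abilities, wins = simulate_wins()
print("top 10% share of total ability:", round(top_share(abilities), 3))
print("top 10% share of total wins:   ", round(top_share(wins), 3))
```

With these settings the top tenth of runners holds only slightly more than a tenth of total ability but most of the wins. That shows a heavily skewed outcome distribution arising from Gaussian ability; it does not by itself establish that the outcome is exactly Pareto.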
Now, we could debate if productivity comes mostly from exceptional performers in the real world, which might result in similar reform ideas. BTW, that's something I mostly don't believe but it's a tenable position on a very complicated issue.
I think that's very important to note, thank you! In fact, the two measures may be quite related: it's believable that pairwise comparisons across a normal distribution along with some noise (most of these are small numbers of contests) can look a lot like a power law (without the asymptotic crazy-large values).
But really, the tie between education and ability or performance is pretty tenuous in the first place, so we shouldn't take any policy recommendations from this mathematical curiosity.
Thanks for the insightful comment. I agree that the performance measures used tend toward zero-sum games. I don’t, however, think that research is an example of a (roughly) zero-sum game. Scientific breakthroughs to be made are not a limited resource in anywhere near the same sense as sports trophies are a limited resource. When we’re counting papers, we’re getting closer to zero-sum, but I still think it’s significantly positive-sum.
Leaving that aside, I still think we need more examples from positive-sum games. We could look at things like
Maybe zero-sum was not the right expression, because I think it is broader than strictly zero-sum games. I meant winner-takes-most situations, where the reward of the best performer is outsized with respect to the reward of the next-best. This does not necessarily mean that the game is strictly zero-sum. In many cases, it is just that the product you deliver is scalable, so everyone will just want the best product (of course, preferences may mean that the ranking is not the same for everyone).
I am also convinced that all the things you mentioned have a fat tail, even if they don't strictly follow a Pareto distribution (probably books/records will be the closest to Pareto, and salaries the closest to a Gaussian but with a fat tail on the right). But I think this reflects the characteristics of the markets, not the distribution of quality/skill.
Example: book sales. I like fantasy books, but the number of books I read per year is capped. So there are a few authors I follow, plus maybe once per year I look for reviews and check if some good book by other authors has come out. If a certain book I would read is not released, chances are I would read the next best one, and ...
Providing one-on-one tutoring to highly intelligent children should be considered by the effective altruism community in part because many members of this community would themselves be qualified to be such tutors.
Returns on performance being Pareto distributed is emphatically NOT the same thing as performance being Pareto distributed.

The easiest way to deal with the smart outliers is to remove the speed limit as you suggest.
I can't find the American report I read years ago about acceleration, but the conclusion was that grade skipping's benefits almost always outweighed the drawbacks. In particular, socialisation does not always degrade after skipping; it might actually improve. Grade skipping has the advantage of being totally free (actually saving money for everyone involved, including taxpayers) and applicable today.
TL;DR: Grade skipping is low-hanging fruit.
If ability in the underlying population is normally distributed, competition for jobs should still lead to people from the right side of the normal distribution ending up in the relevant jobs, and people from the left side of the normal distribution not getting the jobs. If we now measure the performance of people with the jobs, shouldn't we expect the graph to look like the right side of a normal distribution, which looks more like a Pareto distribution than an entire normal distribution?
So surely finding that the performance of employees in a field looks more like a Pareto distribution than a normal distribution doesn't demonstrate that individual performance at the population level is more like a Pareto distribution than a normal distribution?
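A rough way to see the selection effect is to draw a normally distributed population, keep only the top slice, and compare how lopsided the two samples are. The sketch below assumes that "getting the job" means being in the top 10% of ability, which is an arbitrary cutoff chosen for illustration, and uses sample skewness as a crude measure of asymmetry.

```python
import random

def skewness(xs):
    """Sample skewness: third central moment over the 1.5 power of the variance."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)
population = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

# Assumed selection rule: only the top 10% by ability end up in the field.
cutoff = sorted(population)[int(0.9 * len(population))]
hired = [x for x in population if x >= cutoff]

print("skewness of whole population:", round(skewness(population), 2))
print("skewness of selected group:  ", round(skewness(hired), 2))
```

The whole population comes out roughly symmetric, while the selected group is clearly right-skewed, with most people bunched near the cutoff and a thinning tail of stronger performers. One caveat: the tail of a truncated normal still falls off much faster than a true power law, so selection alone gives a Pareto-looking shape rather than an actual Pareto distribution.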
I agree that in a perfect world, you could progress in each subject individually. It simply does not make sense to say "we will not allow you to learn more math, because you suck at history" (or vice versa).
This does not necessarily imply that minimal requirements need to be removed. I could imagine a school that insists on you attending the subjects you suck at... without preventing you from simultaneously studying other subjects at a higher level.
As Dagon said, the exact distribution is mostly irrelevant for this argument. In a world where skills are normally distributed, the same reform would still be an improvement.
The elephant in the classroom is childcare. Most parents need it. A few don't. If you provide mass childcare, it makes sense to provide some education at the same time. If you don't need the childcare, the forced coupling of childcare and education is annoying. Maybe we should decouple childcare from education, starting by decoupling teaching from certification: if children are tested by an external institution, it makes it easy to also test homeschooled children fairly using the same system. (Note: by supporting homeschooling you also support all kinds of experiments in education, which can formally pretend to be homeschooling. This in my opinion is even more important than homeschooling as such.) And if testing is external, it also makes it easy to test each subject at individual speed.
I agree that the positive extremes matter in education. The person who will one day invent the cure for cancer should be allowed to progress as quickly as possible, without being artificially slowed down to the level of the average student; even the average student would benefit from this rule.
In "The best and the rest: Revisiting the norm of normality of individual performance" (2012), O’Boyle and Aguinis show that individual performance follows a Paretian distribution:
Currently, systems of formal education assume that individual performance is normally distributed. For example, in all countries that I know of, university grades have a strict upper bound and are at least roughly normally distributed. The PISA tests are another example. Following the release of PISA results, “most public attention concentrates on just one outcome: the mean scores of countries and their rankings of countries against one another.”
If it is true that individual performance is Pareto distributed, how should we reform education?
An answer: Decouple age from level and have very lax minimum requirements
Here is my (certainly not original) answer: Decouple age from level and have very lax minimum requirements. Structure schools so that students can progress at their own pace in different subjects. Crucially, make it possible to progress extremely quickly in as little as one subject.
Let’s say that you’re a math prodigy. We will let you concentrate on math as much as you want, ignoring other subjects. You could start making original contributions to mathematics years earlier, greatly increasing the time you have available for advancing the field. We would allow you to proceed to university without knowing how your country’s political system works.
Instead of worrying about the mean, we would let students follow completely different paths:
I could say a lot more about this idea, but I’ll leave it at that. What other ideas should we consider?