Why has this threshold been selected? This being a measure of the amount of time taken by a human to achieve the results 100% of the time, picking a 50% measure presents at least 2 problems:
(1) There may not be a direct linear connection between 50% achievement and 100%, such that it is valid to grade these two against each other directly (e.g. winning 50% of international soccer matches is orders of magnitude easier than winning 100% of them)
(2) The value of 100% achievement by humans may not reasonably be compared with 50% achievement in practical terms - if your car fails to start 50% of days you try, it isn't 50% less useful than a car that starts every time, it is orders of magnitude worse because the penalty for failure incurs additional costs beyond just the rework of trying again.
For these reasons, it may be that the only meaningful comparison is 100% achievement of equivalent human results, or that 50% achievements should be hugely discounted when comparing.
There also seems to be no comparison of quality considered - achieving a simple task in a complicated way which consumes more resources, or is less maintainable in future, less available to automated testing or deployment, etc etc all may incur additional future cost.
Why has this threshold been selected? This being a measure of the amount of time taken by a human to achieve the results 100% of the time, picking a 50% measure presents at least 2 problems:
(1) There may not be a direct linear connection between 50% achievement and 100%, such that it is valid to grade these two against each other directly (e.g. winning 50% of international soccer matches is orders of magnitude easier than winning 100% of them)
(2) The value of 100% achievement by humans may not reasonably be compared with 50% achievement in practical terms - if your car fails to start 50% of days you try, it isn't 50% less useful than a car that starts every time, it is orders of magnitude worse because the penalty for failure incurs additional costs beyond just the rework of trying again.
For these reasons, it may be that the only meaningful comparison is 100% achievement of equivalent human results, or that 50% achievements should be hugely discounted when comparing.
There also seems to be no comparison of quality considered - achieving a simple task in a complicated way which consumes more resources, or is less maintainable in future, less available to automated testing or deployment, etc etc all may incur additional future cost.