When I see people grading their predictions, it's always by: (a) bucketing their predictions by probability (into a "46-55%" bucket, a "56-75%" bucket, ...), and then (b) plotting each bucket's nominal probability vs empirical frequency-of-correctness. See e.g. Scott Alexander here.
This seems... fine... but the bucketing step has a certain inelegance to it: just as you can build many different-looking histograms for the same dataset, you can build many different-looking calibration curves for the same predictions, based on a semi-arbitrary choice of bucketing algorithm. Also, by bucketing datapoints together and then aggregating over the bucket, information is destroyed.
For histograms, there's an information-preserving, zero-degree-of-freedom alternative: the CDF. The CDF isn't perfect, but it at least has a different set of problems from histograms.
Is there any similar tool for grading predictions?