Geometric Features for AI Uncertainty: A Targeted Tool for Safety-Critical Regions
Most AI uncertainty metrics give a single, average score. They tell you the model is unsure, but not where or why it's unsure in a way that matters for deployment. What if we had a probe that was specifically sensitive in the exact regions where models are most likely to fail? My recent work, "Geometric Safety Features for AI Boundary Detection", tries to build exactly that.

The Core Idea: Geometry as a Probe for Borderline Cases

Instead of a monolithic uncertainty score, I extracted seven simple geometric features from a model's embedding space using k-NN, such as the standard deviation of distances to neighbors (knn_std_distance). The hypothesis: the "shape" of the local neighborhood reveals how precarious a model's prediction is.

To test this properly, I had to move beyond average-case metrics. I introduced a "boundary-stratified evaluation" framework, splitting test cases into "Safe," "Borderline," and "Unsafe" zones based on the model's own confidence. The critical question: do these geometric features provide disproportionate signal exactly where the model is teetering on the edge?

The Results: Specificity Over Broad Improvement

The findings were more targeted and interesting than a blanket accuracy boost:

· 4.8x Larger Improvement on Borderlines: The geometric features improved explanatory power (R²) by +3.8% on Borderline cases, versus only +0.8% on Safe cases. They provide the most lift precisely where standard uncertainty measures are least reliable.
· Predicting Behavioral "Flips": In a separate experiment where I paraphrased inputs to try to flip the model's decision, a classifier using these geometric features achieved an AUC of 0.707, outperforming a baseline using only boundary distance by 23%.
· A Clear Signal Emerges: The top-performing feature was consistently knn_std_distance, the spread of distances to a point's nearest neighbors. Its correlation with uncertainty was amplified in borderline regions (r=+0.399) compared to the overall average (r=+0.286