I claim that the ambiguity we care about (ambiguity identification, robustness to distributional shift) is secretly "uncertainty about generalization" in perfect disguise (said another way: "How unsure are you that your current knowledge applies to situation x?"). To better understand this, we'll formalize Ideal Generalization (IG).

Generalization will be defined as one "relationship between inputs" predicting another "relationship between inputs". I'll shorten "relationship between inputs" as an "R-set" for succinctness.

Examples of R-sets include: "a few pixels", "a picture of the number 9", "all possible 9's", "you reading 'all possible 9's'", and "the history of our universe up until now".

# Rough Overview

Specific: If pictures x₁, …, xₙ all map to “dog”, then describe those pictures as R-sets and find the common R-sets between them. Then, when you come across another picture, you can check whether it has those common R-sets that map to “dog”. Repeat for all possible R-sets.

General: If x₁, …, xₙ all map to y, then describe those x’s as R-sets and find the intersection between those R-sets; then, if you come across another x, you can see whether it has the common R-sets that map to y. Repeat for all R-sets y₁, …, yₙ.

# IG Setup

To begin with, every possible R-set is predicted by every possible R-set. If we define R as the set of every possible R-set (every possible situation), then (*a*, *b*) ∈ R×R is a more formal representation, with *a* being the predictor R-set and *b* being the predicted R-set (and R×R being the set of every possible generalization).

The total space of possible R-sets is very large, so the amount of probability assigned to each R-set is 1/|R|, initially. This is good because we'd like to be maximally uncertain when we know nothing (at the beginning); however, this places all possible R-sets on equal footing, when some are more likely to predict R-sets we find in our world due to Occam's Razor.

Borrowing from Solomonoff Induction, we can multiply each R-set's probability by 2^(-L), with *L* defined as the length of the tape required to produce that R-set on a universal Turing machine. After normalizing, we're left with a distribution greatly biased towards simpler R-sets.
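To make the setup concrete, here is a minimal sketch of the prior, assuming a toy R-set space and made-up description lengths (the names and lengths below are my own illustration, not part of the formalism):

```python
# Toy sketch of the IG setup: every pair (a, b) in R x R is a candidate
# generalization "a predicts b", weighted by a simplicity prior 2^(-L),
# where L stands in for the description length of the predictor R-set.
from itertools import product

R = ["pixel-blob", "digit-9", "all-9s"]           # toy stand-in for the R-set space
L = {"pixel-blob": 2, "digit-9": 5, "all-9s": 9}  # assumed description lengths

prior = {(a, b): 2.0 ** -L[a] for a, b in product(R, R)}

# Normalize so the weights form a probability distribution.
total = sum(prior.values())
prior = {g: p / total for g, p in prior.items()}

# Simpler predictors now carry more mass, as Occam's Razor suggests.
assert prior[("pixel-blob", "digit-9")] > prior[("all-9s", "digit-9")]
```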

Now that we're set up, we can answer two interesting questions relevant to generalization:

1. How do I predict a specific R-set?

2. Given a specific situation (a specific R-set), what R-sets are predicted?

## 1. Predicting a Specific R-set

Given a specific R-set, Rx, we have the R-sets R − {Rx} predicting Rx. In other words, given (*a*, *b*) ∈ R×R, we're looking at a specific *b* being predicted and all *a*’s which claim to predict it. As input is received, if an R-set predicts incorrectly, its probability is set to 0 and the rest is re-normalized. This process narrows down possible R-sets in two ways: eliminating specific R-sets & eliminating general R-sets.

More specific R-sets are eliminated when encountering a positive example (seeing Rx). A specific R-set claims "We will only see Rx in situation *s* and nowhere else". When Rx is seen in some other situation, a large batch of specific R-sets is eliminated, since Rx was seen where they claimed it wouldn't be.

More general R-sets are eliminated when encountering a negative example (when R-set Rx doesn't hold). A general R-set claims "We will see Rx in situation *s*". When Rx is not seen there, a large batch of general R-sets is eliminated, since they claimed Rx would be seen in that situation.

*Ex. If you're already trained in recognizing dogs from pictures, and you're trying to recognize cats, a large number of possible R-sets can be eliminated by being told "A dog is not a cat", namely, those R-sets that predict a dog.*

*Also, this will happen without being told explicitly "it's not an X", when it sees the relationship between inputs, Rx, not holding. This means learning one set of R-sets teaches you what another set of R-sets is not, which means learning is transferable even when learning dissimilar types.*
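The elimination process above can be sketched in a few lines. This is my own toy illustration (the hypothesis names and probabilities are invented): hypotheses that predicted incorrectly have their probability set to 0, and the survivors are renormalized, with positive examples killing the overly-specific hypotheses and negative examples killing the overly-general ones.

```python
# Toy sketch of the update rule: zero out failed hypotheses, renormalize.
def update(posterior, predicted_correctly):
    """Keep only hypotheses whose prediction held, then renormalize."""
    survivors = {h: p for h, p in posterior.items() if predicted_correctly(h)}
    total = sum(survivors.values())
    return {h: p / total for h, p in survivors.items()}

# Invented hypotheses about where the R-set "dog" is seen.
hypotheses = {
    "dog-in-{park}":        0.25,  # specific: dog in the park and nowhere else
    "dog-in-{park,street}": 0.25,  # more general
    "dog-everywhere":       0.50,  # most general
}

# Positive example: "dog" is seen in the street. The specific hypothesis
# claimed park-and-nowhere-else, so it is eliminated.
posterior = update(hypotheses, lambda h: "street" in h or h == "dog-everywhere")

# Negative example: "dog" is NOT seen at home. The fully general hypothesis
# claimed it would be, so it is eliminated in turn.
posterior = update(posterior, lambda h: h != "dog-everywhere")

assert list(posterior) == ["dog-in-{park,street}"]
```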

## 2. Given a specific situation (R-set), what R-sets are predicted?

Given an R-set, *a*, we can check the probability it assigns to every other R-set in R. In other words, given (*a*, *b*) ∈ R×R, we're looking at a specific *a* and looking at the probabilities it predicts for every *b*. Most probabilities will be ~0%. In fact, all of them will be ~0% in the beginning, but as enough input is received, some R-sets will rise in probability, and those will be the R-sets predicted, with an exact value of certainty attached.
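A sketch of this query, under an assumed representation where generalizations are stored as a dict mapping (predictor, predicted) pairs to probabilities (the cutoff and example entries are my own):

```python
# Question 2 as a lookup: given a specific situation a, return every R-set b
# it predicts above some small threshold (most entries are ~0 and drop out).
def predictions_for(situation, generalizations, threshold=0.01):
    return {b: p
            for (a, b), p in generalizations.items()
            if a == situation and p >= threshold}

generalizations = {
    ("whiskers", "cat"):  0.60,
    ("whiskers", "dog"):  0.05,
    ("whiskers", "moon"): 0.001,  # noise hypothesis, effectively 0
}
assert predictions_for("whiskers", generalizations) == {"cat": 0.60, "dog": 0.05}
```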

# Predicted Successes

Considering Dreyfus’s US/Soviet tank classification, the failure was in converging to the easier classification of lighter/darker pictures. IG would maintain both R-sets (US & light) and would be ambiguous (less confident) if it encountered a US/dark picture, due to a splitting of the probabilities assigned to both R-sets. IG would also be able to preemptively ask for (or create) training data that splits probability mass, to converge to the correct R-sets quicker. (In general, asking humans for a split-R-sets example isn’t guaranteed to be transparent/understandable, so extra work needs to be done to understand transparency better.)

Considering the __2,4,6 / guess-the-rule__ task, IG would generate the simplest R-sets that fit the bill, and, with extra coding, would be able to generate extra examples that fit some R-sets and exclude others in order to more quickly converge to the correct rule.
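A toy sketch of that idea (the candidate rule space here is my own invention, standing in for "the simplest R-sets that fit"): keep the rules consistent with the data seen so far, then propose a probe sequence on which the survivors disagree, so either answer eliminates some of them.

```python
# Candidate rules for the 2,4,6 task, each a predicate over a sequence.
rules = {
    "even":      lambda xs: all(x % 2 == 0 for x in xs),
    "ascending": lambda xs: list(xs) == sorted(set(xs)),
    "add-two":   lambda xs: all(b - a == 2 for a, b in zip(xs, xs[1:])),
}

data = (2, 4, 6)
consistent = {name: r for name, r in rules.items() if r(data)}

# A probe the surviving rules disagree on: odd, but ascending and add-two.
probe = (1, 3, 5)
answers = {name: r(probe) for name, r in consistent.items()}
assert answers == {"even": False, "ascending": True, "add-two": True}
```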

# Relation to Solomonoff Induction (SI)

This formalization is similar in the simplicity prior, though, where SI measured the simplicity of algorithms, IG measures the simplicity of algorithms that produce a given R-set. Another similarity is how it narrows down the hypothesis space by making predictors compete and picking the best predictors, though, where SI only predicts the next bit, IG predicts every possible R-set.

SI places a black box around "All possible algorithms", and IG opens up that box by converging to a good algorithm (one that predicts well).

# Relation to ambiguity identification

Again, I claim that ambiguity identification is equivalent to uncertainty about generalization. That ambiguity identification can be phrased as asking “Using my current knowledge, does this situation predict all R-sets with confidence below *c*%?” (because confidence below a specified threshold would mean it's ambiguous), and specifically that the question “Have I encountered a distributional shift?” can be asked as “Does this situation predict *relevant* R-sets with confidence below *c*%?” (with “relevant” meaning the R-sets you care about in that situation, such as “This is a dog/cat” in a dog/cat classifier).
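That question reduces to a threshold check. A minimal sketch, assuming predictions come as a dict from R-sets to probabilities (the function name, example numbers, and default *c* are mine):

```python
# Flag a situation as ambiguous / possibly off-distribution when every
# relevant R-set is predicted with confidence below the threshold c.
def is_ambiguous(predictions, relevant, c=0.5):
    return all(predictions.get(b, 0.0) < c for b in relevant)

preds = {"dog": 0.45, "cat": 0.40, "background": 0.15}
assert is_ambiguous(preds, relevant={"dog", "cat"}, c=0.5)       # unsure
assert not is_ambiguous({"dog": 0.9}, relevant={"dog", "cat"})    # confident
```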

# Problems/Uncertainties

Unsure if ideal generalization is perfectly captured by this formalization. I feel like IG captures it, but it might be too broad and potentially black-boxes interesting structure in generalization. The simplicity prior narrows it down to R-sets that better generalize; however, I’m not confident that this is the only narrowing down that we can do.

Unsure if the simplicity prior works as intended. One way to test would be to have several different R-sets that we know predict our world to varying degrees and see if the complexity correlates. On first thought, this seems difficult to isolate without properly investigating what “predicting our world to varying degrees” means, though I do know that this is different from (maybe orthogonal to) how general/specific an R-set is. Simplicity is more about the "shape" of the R-sets predicted than the number of R-sets predicted. With that metaphor, I'm asking: is the "shape" of simple, algorithmically-produced R-sets roughly the same as the "shape" of R-sets we see in our world?

Unsure if this formalization breaks when given faulty information by an adversary (or just by insufficient sensors). For example, if you think you saw Rx when in reality Rx didn't hold, then you're setting the more accurate R-sets' probabilities to 0; however, "all possible R-sets" includes adversaries and faulty sensors as possibilities, so this may be taken care of.

# Conclusion

IG begins with absolute uncertainty and a simplicity prior, narrows down the hypothesis space for every possible R-set as that R-set holds or fails to hold, and is able to assign a probability to every R-set a specific situation predicts.

It would be interesting if ambiguity identification is captured by generalization uncertainty, and if this formalization captures ideal generalization.

I personally find it interesting that positive and negative examples eliminate specific and general hypotheses, respectively. This has most certainly been researched before, but it was nice to derive it intuitively for myself.

## Future work

- Transparency of R-sets/ hypotheses
- Generate R-sets given a data set with proven guarantees that the correct R-set is included (Or one that predicts within a bound of accuracy)
- Investigate simplicity of R-sets vs the “shape” of R-sets we see in our world
- Read through the previous research cited by Amodei in Concrete Problems regarding ambiguity identification (Specifically active learning which has similar results, but is tractable)
- Create a more tractable version. 99% of R-sets are noise, so my first thought is to only focus on the simplest R-sets and successively consider more complex R-sets as they fail to predict.
- Read MIRI's relevant papers on the subject (On first glance, I see none. Pointers to those on agentfoundations.org/elsewhere would be welcome)
- Research papers that assign probabilities to logical statements

*My inner heckler wants to say "IG? More like GI 'cause this is full of crap!"*