Okay, so maybe you could say this.

Suppose you have an index I, where I is a list of items in belief-space (or a person's map). So I could contain items like (believes in evolution, believes in free will, believes that he will get energy from eating food, etc.). Of course, in order to make this argument more rigorous, we must make the beliefs finer-grained.

For now, we can assume the non-existence of a priori knowledge: in other words, facts a person may not explicitly know, but would deduce simply by using the knowledge they already have.

Now, maybe Person1 has a map in j-space with values of (0,0,0.2,0.5,0,1,...), corresponding to the degree of his belief in items in index I. So the first value of 0 corresponds to his total disbelief in evolution, the second corresponds to total disbelief in free will, and so on.

Person2 has a map in k-space with values of (0,0,0.2,0.5,0,0.8, NaN, 0, 1, ...), corresponding to the degree of his belief in everything in the world. I include a value of NaN in his map because the NaN could correspond to an item in index I that he has never encountered. Maybe there's a way to quantify NaN, which might make it possible for Person1 and Person2 to both have maps in the same n-space (which might make it easier to compare their mutual information using traditional math methods).
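To make the NaN idea concrete, here is a minimal Python sketch. It is only one possible convention, not a settled answer: it assumes both maps list their items in the same order of index I, pads the shorter map with NaN ("never encountered"), and compares only the items both people hold a belief about. The function name `shared_distance` is made up for illustration.

```python
import math

def shared_distance(map_a, map_b):
    """Euclidean distance over the items both people hold a belief about.

    Assumption: entry i of each map refers to the same item of index I.
    """
    n = max(len(map_a), len(map_b))
    # Pad the shorter map with NaN, read as "never encountered this item".
    a = list(map_a) + [math.nan] * (n - len(map_a))
    b = list(map_b) + [math.nan] * (n - len(map_b))
    # Keep only the positions where neither person has a NaN.
    pairs = [(x, y) for x, y in zip(a, b)
             if not (math.isnan(x) or math.isnan(y))]
    dist = math.sqrt(sum((x - y) ** 2 for x, y in pairs))
    return dist, len(pairs)

person1 = [0, 0, 0.2, 0.5, 0, 1]
person2 = [0, 0, 0.2, 0.5, 0, 0.8, math.nan, 0, 1]
dist, shared = shared_distance(person1, person2)
print(dist, shared)
```

Returning the number of shared items alongside the distance matters here: two people who overlap on only a handful of items can look close simply because there is little to compare.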

Furthermore, Person1's map is a function of time, as is Person2's map. Their maps evolve over time since they learn new information, change their beliefs, and forget information. Person1's map can expand from j-space to (j+n)-space as he forms new beliefs on new items. Once you apply a distance metric to their beliefs, you might be able to map them on a grid, to compare their beliefs with each other. A distance metric with a scalar value, for example, would map their beliefs to a 1D axis (this is what political tests often do). A distance metric can also output a vector value (much like what an MBTI personality test could do), giving a value in j-space. If you simply took the difference between the two maps, you could also output a vector value that could be mapped to a space whose dimension is equal to the dimension of the original map (assuming that the two maps have the same dimension, of course).

Anyways, here is my question: Is there a better way to quantify this? Has anyone else thought of this? Of course, we could use a distance metric to compare their maps with respect to each other (a Euclidean metric could be used if they have maps in the same n-space).

==

As an alternative question, are there metrics that could compare the distance between a map in j-space and a map in k-space (even if j is not equal to k)? I know that you have p-norms that give some absolute scalar value when you apply them to a matrix. But this is sort of different. And could mutual information be considered a metric?

Have you heard of the Kullback-Leibler divergence? One way of thinking about it is that it quantifies the amount you learn about one random variable when you learn something about another random variable. I.e., if your variables are X and Y, then D(p(X|Y=y),p(X)) is the information gain about X when you learn Y=y. It isn't a metric, as it isn't symmetric: D(p(X|Y=y),p(X)) != D(p(X),p(X|Y=y)). Nevertheless, with two people with different probability distributions on some underlying space, it's a good way of representing how much more one knows than the other.
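As a small illustration of the asymmetry, here is a sketch in Python for discrete distributions. The numbers are made up: `prior` stands in for p(X) and `posterior` for p(X|Y=y). It assumes q is nonzero wherever p is nonzero, otherwise the divergence is infinite and this code would raise an error.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits for two discrete distributions over the same outcomes.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

prior = [0.5, 0.5]      # hypothetical p(X)
posterior = [0.9, 0.1]  # hypothetical p(X | Y=y)

gain = kl_divergence(posterior, prior)     # information gained about X
reverse = kl_divergence(prior, posterior)  # arguments swapped
print(gain, reverse)
```

The two printed values differ (roughly 0.53 bits versus 0.74 bits), which is the asymmetry that keeps the KL divergence from being a metric.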

As jimrandomh says, the representation of beliefs that you use isn't very practical. However, your question is a good one, as it applies whatever representation you use.

Your comment about taking emotional salience into account is leaving the realm of probability and epistemic rationality - I'm less familiar with what tools are available to formalize differences in what's valued than I am with tools to formalize differences in what's known.

Ah okay, thanks for the reply. Yes, I've heard about the KL divergence, although I haven't really worked with it before.

"I'm less familiar with what tools are available to formalize differences in what's valued than I am with tools to formalize differences in what's known."

Oh, good points. LessWrong is more concerned with what's known than what's valued. Although what's valued does matter, since what's valued is of relevance when we want to operationalize utility.

The traditional distance metric for vectors like that is angle - that is, you take the dot product of the two vectors, divide by the product of the lengths, then inverse-cosine the result, with 0 being maximally similar and pi being maximally dissimilar. 1D scales like political orientation are also represented as vectors, against which you measure similarity; and you find interesting scales by putting a bunch of people's belief vectors in a matrix and doing singular value decomposition. You weight the terms by scaling the values.

This similarity metric is very widely used in search, where a (sparse) vector has an entry in it for each term in the lexicon, and the value is the number of times that term appears.
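The angle computation described above can be sketched in a few lines of Python. The belief vectors here are just the example values from the original post, and `angle_between` is a name chosen for illustration.

```python
import math

def angle_between(u, v):
    """Angle in radians between two equal-length vectors (0 = same direction)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)

a = [0, 0, 0.2, 0.5, 0, 1]
b = [0, 0, 0.2, 0.5, 0, 0.8]
half_a = [x / 2 for x in a]  # same direction as a, half the magnitude

print(angle_between(a, b))       # small positive angle
print(angle_between(a, half_a))  # approximately 0
```

Two things fall out of this. Scaling a vector does not change the angle, so someone who believes everything half as strongly looks identical to the original. And if every entry is a probability between 0 and 1, the dot product is never negative, so the angle tops out at pi/2; reaching pi requires entries that can be negative, for example beliefs re-centered around 0.5.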

However, I'm not sure a vector is really the right way to model people's maps; there are a lot of things you can't represent that way, or that require truly enormous vectors to represent that way. Many ontologies use graphs instead - you have nodes for various concepts, and edges between them representing relationships. In that case, you would measure similarity using edit distance: the length of the shortest sequence of insertions and deletions that would make one map look like the other. However, while there are efficient algorithms for edit distance between strings, I don't know whether there is one for edit distance between graphs, or if that's even in P.
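On the complexity question: computing graph edit distance between arbitrary graphs is known to be NP-hard. There is an easy special case, though. If every concept has a unique name that both maps share (an assumption, and a strong one for real people's maps), no matching step is needed, and the distance under plain insertions and deletions is just the number of nodes and edges that appear in one map but not the other. A rough Python sketch, with made-up concept and relation names:

```python
def labeled_graph_edit_distance(nodes_a, edges_a, nodes_b, edges_b):
    """Insert/delete edit distance between two graphs with shared unique labels.

    Assumption: a node label means the same concept in both maps, and each
    edge is a (source, relation, target) tuple between labeled nodes.
    """
    node_edits = set(nodes_a) ^ set(nodes_b)  # nodes to insert or delete
    edge_edits = set(edges_a) ^ set(edges_b)  # edges to insert or delete
    return len(node_edits) + len(edge_edits)

nodes_1 = {"food", "energy", "evolution"}
edges_1 = {("food", "gives", "energy")}

nodes_2 = {"food", "energy", "free will"}
edges_2 = {("food", "gives", "energy"), ("free will", "requires", "energy")}

print(labeled_graph_edit_distance(nodes_1, edges_1, nodes_2, edges_2))  # 3
```

Here the two maps differ by two nodes ("evolution", "free will") and one edge, giving a distance of 3. The hard part in the general problem is deciding which node in one graph corresponds to which node in the other, and shared labels sidestep exactly that.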

If we have the map be a list and the entries be probabilities, we'd need to somehow represent the difference between someone who believes in lots and someone who believes in very little, even if they have the same angle.

Something like chi-squared might work, but that would end up being dominated by the largest differences, which may be undesirable. But you can probably take roots until you get something that does what you want.
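One way to read "take roots" is to raise each per-item difference to a power below 1 before summing, so a single large disagreement counts for less relative to many small ones. This is not a chi-squared statistic, just a simple stand-in with the same tunable flavor; the function name and the example vectors are made up.

```python
def powered_difference(u, v, power):
    """Sum of |u_i - v_i| ** power over equal-length belief vectors.

    power = 2 behaves like a squared-error sum (largest gaps dominate);
    power < 1 damps large gaps relative to small ones.
    """
    return sum(abs(x - y) ** power for x, y in zip(u, v))

base = [0.0, 0.0, 0.0, 0.0]
one_big = [1.0, 0.0, 0.0, 0.0]         # one total disagreement
many_small = [0.25, 0.25, 0.25, 0.25]  # four mild disagreements

for power in (2, 0.5):
    print(power,
          powered_difference(base, one_big, power),
          powered_difference(base, many_small, power))
```

With power 2, the single big disagreement scores 1.0 against 0.25 for the four small ones; with power 0.5 the ordering flips to 1.0 against 2.0. Note that for powers below 1 this sum is no longer a norm, so whether it is "what you want" depends on which disagreements you think should count.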

Not that you could compute it...

Maybe we should try to emulate what humans do when they assess how much they have in common with someone. Which is... uh... I don't know what we do. Probably something like a chi-squared, but starting with broad categories and stronger inferences, and including weaker terms rarely or not at all.

Very interesting replies.

Hm, yeah that's true, we can't just represent probabilities. But also, we need a way to represent the emotional salience of the probabilities. That could probably be done by creating two new emotional salience vectors that correspond to each belief vector (appending the vectors might work, but would introduce complications in the metric calculations).
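Here is a rough Python sketch of the separate-salience-vector idea, keeping the salience values apart from the beliefs rather than appending them. Everything about the weighting is an assumption made for illustration: salience is taken to be a number in [0, 1] per item, and each squared belief difference is weighted by the average of the two people's salience for that item.

```python
import math

def salience_weighted_distance(beliefs_a, salience_a, beliefs_b, salience_b):
    """Weighted Euclidean distance between two belief vectors.

    Assumption: each item's weight is the mean of the two salience values,
    so disagreements that neither person cares about contribute nothing.
    """
    total = 0.0
    for pa, sa, pb, sb in zip(beliefs_a, salience_a, beliefs_b, salience_b):
        weight = (sa + sb) / 2
        total += weight * (pa - pb) ** 2
    return math.sqrt(total)

beliefs_1, beliefs_2 = [0.0, 1.0], [1.0, 1.0]  # they disagree only on item 0

cares = salience_weighted_distance(beliefs_1, [1.0, 0.0], beliefs_2, [1.0, 0.0])
indifferent = salience_weighted_distance(beliefs_1, [0.0, 1.0], beliefs_2, [0.0, 1.0])
print(cares, indifferent)  # 1.0 0.0
```

The same disagreement gives a distance of 1.0 when both people care about the item and 0.0 when neither does. Other choices (the product or the maximum of the two salience values, say) would be equally defensible.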