My definition of a "vague concept" (A):
- It creates a group of things that share family resemblance or "family difference".
- It has no meaning outside of a context, but in a specific context it acquires a specific meaning.
- It's very hard to understand through the framework of causation: what causes things to be A/non-A?
- When you consider more and more examples of A, the boundary between A and non-A becomes increasingly arbitrary and complicated. But in a specific context the boundary is simple.
Examples of vague concepts: games, health, human values, personality, words and some hypotheses.
I think it may be very important to study ways to think about vague concepts: they're related to hypothesis generation and human values, so they may be related to AGI and AI alignment.
I'll discuss some examples and share my thoughts about the inner workings of vague concepts.
The "games" example is famous in philosophy, but keep in mind that my interpretation may differ:
It argues that things which could be thought to be connected by one essential common feature may in fact be connected by a series of overlapping similarities, where no one feature is common to all of the things. Games, which Wittgenstein used as an example to explain the notion, have become the paradigmatic example of a group that is related by family resemblances.
When you consider a bunch of "games", it's easy to see the common features. But as you consider more and more "games" and things that are sometimes called "games", it turns out that everything can be a game. The boundary between "games" and "non-games" becomes more and more arbitrary and complicated.
And yet no matter how much you stretch the concept (e.g. by saying something like "love is just a game"), in a specific context the meaning is clear enough. In a specific context the boundary between "games" and "non-games" is quite simple. Does this boundary have a fractal dimension? (I'm half-joking.)
I also think it's important to consider "family difference": maybe you have just a single object A and a group of objects (B). Whenever you compare A to an object from (B), they're different enough. But you can't formulate a universal difference between A and all objects from (B), maybe because you don't know all the objects in (B).
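To make "family resemblance" and "family difference" a bit more concrete, here's a minimal Python sketch. The feature sets are hypothetical, made up for illustration (they're not a real analysis of games), and so is the object A, a "choir rehearsal". The sketch checks that the "games" share no feature common to all of them yet are linked by a chain of pairwise overlaps, and that A differs from every game yet there's no single feature on which it differs from all of them.

```python
from collections import deque

# Hypothetical feature sets, invented for illustration only.
GAMES = {
    "chess": {"competitive", "rules", "skill", "board"},
    "poker": {"competitive", "rules", "luck", "cards"},
    "solitaire": {"luck", "cards", "solo"},
    "ball_against_wall": {"solo", "skill", "fun"},
    "ring_a_ring_o_roses": {"fun", "group", "singing"},
}

# Hypothetical object A, compared against the group (B) = GAMES.
CHOIR_REHEARSAL = {"rules", "group", "singing"}


def common_features(family):
    """Features shared by every member (the 'essence', if there is one)."""
    return set.intersection(*family.values())


def linked_by_overlaps(family):
    """True if every member can be reached from the first one through a
    chain of pairwise overlapping feature sets (breadth-first search)."""
    names = list(family)
    seen = {names[0]}
    queue = deque([names[0]])
    while queue:
        current = queue.popleft()
        for other in names:
            if other not in seen and family[current] & family[other]:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(names)


def universal_differences(a, family):
    """Features on which `a` disagrees with *every* member of the family
    (present in `a` and absent in the member, or vice versa)."""
    return set.intersection(*(a ^ b for b in family.values()))


print(common_features(GAMES))                             # set()
print(linked_by_overlaps(GAMES))                          # True
print(all(CHOIR_REHEARSAL ^ b for b in GAMES.values()))  # True
print(universal_differences(CHOIR_REHEARSAL, GAMES))     # set()
```

Nothing in the sketch decides whether the choir rehearsal is a game: it shares as many features with ring-a-ring-o'-roses as chess shares with poker. That's one way to see why the boundary keeps getting more arbitrary as you add examples.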
The sorites paradox is at least partially relevant here:
A typical formulation involves a heap of sand, from which grains are removed individually. With the assumption that removing a single grain does not cause a heap to become a non-heap, the paradox is to consider what happens when the process is repeated enough times that only one grain remains: is it still a heap? If not, when did it change from a heap to a non-heap?
It is (presumably) easy to tell a heap apart from a non-heap. But it's hard or impossible to explore the boundary, or to analyze the question through the framework of causation ("What causes a non-heap to become a heap?").
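Here's a toy way to see why the boundary resists causal analysis. Assume (just for this sketch) that a context supplies a sharp cutoff: "a heap has at least N grains". Then the premise "removing one grain never turns a heap into a non-heap" fails at exactly one step, and where it fails is decided by the arbitrary N, not by anything about the grain that was removed. The two contexts and their thresholds are hypothetical.

```python
# Assumption for this sketch only: each context supplies a sharp cutoff.
def is_heap(grains, threshold):
    """A pile counts as a heap if it has at least `threshold` grains."""
    return grains >= threshold


def tolerance_violations(start, threshold):
    """Grain counts n (from `start` down to 1) where n grains is a heap
    but n - 1 grains is not: the steps where one grain made the difference."""
    return [
        n for n in range(start, 0, -1)
        if is_heap(n, threshold) and not is_heap(n - 1, threshold)
    ]


# Hypothetical contexts with arbitrarily chosen thresholds.
CONTEXTS = {"sandpit": 1000, "laboratory": 50}

for context, threshold in CONTEXTS.items():
    print(context, tolerance_violations(10_000, threshold))
# sandpit [1000]
# laboratory [50]
```

Within each context the boundary is simple (one number); the arbitrariness only shows up when you compare contexts.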
Health, moral values
You may also call vague concepts "cluster properties" (explained in a Philosophy Tube video). In text form:
Even more interestingly, Harris’ idea is an accidental ripoff of a theory developed by philosopher Richard Boyd in 1982, called: ‘The Homeostatic Cluster Property Theory of Metaethical Naturalism’ Sexy title. Boyd thought that words like ‘good’ and ‘evil’ refer to real properties out there in the material world, and that therefore statements like ‘Murder is bad’ are capable of being objectively true, or at least true in the same way as scientific statements are. Which prompts the question, “To what exactly do these words refer?”
Boyd’s answer is that they are cluster properties - groups of things that tend to go together. The example he uses is actually the same one Harris does - health. There are all kinds of things we would want to include in a definition of the word “healthy,” like your heart should be beating and you should be able to breathe, but do you have to be a certain size in order to be healthy? Do you have to not be in pain? Can you have a beating heart and be unhealthy? There’s a cluster of properties here somewhere that makes up the definition of the word health but we’re never going to pin down a definite list because that’s just not how the concept works. Despite that vagueness it’s still very obviously useful and meaningful.
Similarly Boyd thinks that a word like ‘good’ refers to a cluster of things that are non-morally good for humans, like sharing friendship, sharing love, having fun, watching quality YouTube videos, but just like with health, you’re never going to be able to pin down a full list because the concept just isn’t like that.
And here’s the big takeaway - if we say ‘John is healthy’ we could be talking about any number of things in the cluster of health - whether he has a disease, whether he works out, whether he has a good relationship with his mother - all of which are objective - but whether the sentence ‘John is healthy’ is true will still depend on what aspect of his health we’re talking about. It will be relative to the context in which we’re saying it.
You may even compare "vague concept" to "social constructs".
You can imagine hypotheses based on vague concepts, for example "healthy people earn more money than unhealthy people" or "people who love games earn more money". In their most abstract form, such hypotheses can't be falsified. But it's easy to generate specific, falsifiable hypotheses from those ideas.
Scientific theories, too, can have an unfalsifiable core. This is Imre Lakatos' model of scientific progress:
Lakatos's second major contribution to the philosophy of science was his model of the "research programme", which he formulated in an attempt to resolve the perceived conflict between Popper's falsificationism and the revolutionary structure of science described by Kuhn. Popper's standard of falsificationism was widely taken to imply that a theory should be abandoned as soon as any evidence appears to challenge it, while Kuhn's descriptions of scientific activity were taken to imply that science is most fruitful during periods in which popular, or "normal", theories are supported despite known anomalies. Lakatos' model of the research programme aims to combine Popper's adherence to empirical validity with Kuhn's appreciation for conventional consistency.
A Lakatosian research programme is based on a hard core of theoretical assumptions that cannot be abandoned or altered without abandoning the programme altogether. More modest and specific theories that are formulated in order to explain evidence that threatens the "hard core" are termed auxiliary hypotheses. Auxiliary hypotheses are considered expendable by the adherents of the research programme—they may be altered or abandoned as empirical discoveries require in order to "protect" the "hard core". Whereas Popper was generally read as hostile toward such ad hoc theoretical amendments, Lakatos argued that they can be progressive, i.e. productive, when they enhance the programme's explanatory and/or predictive power, and that they are at least permissible until some better system of theories is devised and the research programme is replaced entirely.
Vague concepts lead to vague hypotheses ("research programmes"). Vague hypotheses work the same way vague concepts do.
Even human personality may be a vague concept.
The fewer situations you consider, the easier it is to understand what someone's personality is. But as you consider more and more situations, it may turn out to be impossible to find a single connecting thread, especially if someone lives long enough, encounters different enough communities, and gets different enough opportunities.
And even thinking itself may be a vague concept: before you realize some key connection your mind might be wandering between associations that have many overlaps, but don't form a coherent seamless picture.
The meanings of most words are vague concepts.
For example, the word "beast" may mean a fantastic creature, someone who has lost their humanity, someone who's shockingly good at something, etc.
And of course there's the infamous/comical example with the word "shit" (ISMO skit), which changes its meaning in absolutely wild ways.
Exploring vague concepts
How to understand a vague concept? You can try to memorize all contexts (that you know of) in which it's used. Or you can learn to infer its meaning in new contexts. And learn to create new contexts for this concept yourself.
But it would require a different type of generalization, not based on definitions. I think it's important to explore what this "different type of abstraction" may be.
What are vague concepts made of? My ideas:
- They're made of something that can easily create contrasts.
- They can connect a fact (e.g. "someone is in pain") and a conclusion ("someone is not healthy") by a special type of connection.
Internal structure, "gradient"
Here's an observation: when the meaning of a word changes, this change doesn't come without a cost. (I'm not talking about words with completely separate meanings.) It comes with a change of emotion and emphasis. The point is that different meanings of a word have some internal relationships, "positions" relative to each other. If you could evaluate the "internal" meaning of a word at least in some simplified way, you would notice the relationships. Maybe those relationships allow us to really "understand" words and context.
For example, take the word "beast":
- When you use it in the archaic and ironic way ("any animal"), you describe the referent in the context of its world. You focus on the world (the world where everything may be a "beast").
- When you use it in the negative way ("brutal human"), you focus on actions and very deep qualities of the referent. E.g. you may imply that the referent is fundamentally rotten to the core.
- When you use it in the positive way ("very skilled human"), you focus on the referent's actions and qualities. But those qualities are not so deep anymore (compared to the negative meaning).
If you're interested in deep internal qualities of people/objects, the negative meaning may be the "main" one for you, even if you dislike it. This suggests an idea: if you have a certain bias or goal, internal relationships between meanings may become simpler. To understand the change of meaning of a word you only need to understand how your emotions changed and how the "alignment" between the word and your goal changed.
If you're interested in irony, the positive meaning of the word "shit" (e.g. "This is the shit") may be the main one for you.
Emotions, biases and goals give the meanings of a word some "gradient" or "flavors". I recommend checking out "Propagating Facts into Aesthetics" by Raemon; it may be very relevant.
My conclusion is that some vague concepts may create "meta contrasts", "contrasts of contrasts": they contrast the real world with a counterfactual world, but they also contrast multiple possible emotions/goals related to those worlds. For example, when something bad happens and you think "this is a bad day" ("bad day" is a vague concept; there's almost no real boundary between good and bad days), you put the bad outcome in the context of your possible goals and emotions related to this day. At this point it doesn't really matter what the bad thing was; what matters is how your emotions and goals were affected. The same goes for your overall "health": it's not about specific medical conditions, it's about your feelings and goals being (un)affected. And when you call someone a "beast" (in the positive sense), you talk about your emotions and compare the referent to other people.
Moreover:
- The physical cause of your emotional change may trigger multiple vague concepts.
- You can't know your exact emotions.
So the "physical" and "emotional" ("gradient") aspects of a meaning may get really entangled. I think this is what makes some vague concepts truly impossible to define.
I also tried to explore vague concepts in more detail using a specific toy example. I know that my post turned out to be very unclear, but there you can find:
- Objects connected by family resemblance and "family differences": "places" with "colors".
- Objects that don't have any definite properties outside of context, but obtain specific properties in a specific context: "places".
- Connections that are hard to understand through the framework of causation: what exactly causes a certain place to have a certain color?
- Strange boundaries. When you consider more and more places, the boundary between places of different colors becomes increasingly arbitrary and complicated. But when you compare specific places the boundary is simple.
- A model of "context" and "meaning" for a toy example.
- An example of a "gradient": density of "details" in a place.
I think the toy example in the linked post could help to come up with some statistics for "vague concepts".
I think we maybe shouldn't reify "concepts" at all. To some extent it seems to me like there are only patterns of internal and external behavior. Certain stimuli - experiences, feelings, trains of thought - lead to certain sequences of words being produced. The question should not be "what does the word mean?" but rather "what does this word having been used say about the internal state of the one using it, in this context?" Words don't have fixed meanings - they are entirely contextual.
Concepts are not things that actually exist as stable units - our usage of a finite set of words makes us think that they do, but in reality every pattern of neural activations is unique and we reach for the words that seem the least far away in some vector space (generated by training on our past experience of the same words in other contexts) to the position of the thought we actually want to express. Every such action moves the positions of those words a little bit in other people's semantic spaces, leading to semantic drift - something like a rise in entropy, a gas expanding through a room.
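The picture of reaching for the nearest word in a vector space, with each use nudging that word's position, can be sketched with toy numbers. Everything here is an assumption for illustration: the 3-dimensional vectors are invented (real embeddings are learned and much larger), cosine similarity stands in for "least far away", and drift is modelled as a simple step of the chosen word's vector toward the thought (the `rate` parameter is hypothetical).

```python
import math

# Hypothetical toy "semantic space": invented 3-D vectors, not real embeddings.
lexicon = {
    "beast": [0.9, 0.1, 0.0],
    "monster": [0.8, 0.0, 0.3],
    "expert": [0.1, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_word(thought, lexicon):
    """The word whose vector points closest to the thought."""
    return max(lexicon, key=lambda word: cosine(lexicon[word], thought))


def use_word(thought, lexicon, rate=0.1):
    """Pick the nearest word, then move its vector a fraction `rate`
    of the way toward the thought (a crude model of semantic drift)."""
    word = nearest_word(thought, lexicon)
    lexicon[word] = [w + rate * (t - w) for w, t in zip(lexicon[word], thought)]
    return word


# Hypothetical thought: "someone shockingly good at something".
thought = [0.8, 0.6, 0.0]
for _ in range(20):
    use_word(thought, lexicon)

print(nearest_word(thought, lexicon))  # beast
print(lexicon["beast"])                # has drifted toward the thought
```

Only the word that gets used moves: after repeated use, "beast" sits closer to the "shockingly good at something" region, while "monster" and "expert" stay where they were.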
Probably I didn't fully understand your position, but here are my thoughts and emotions about this topic:
This might be a bridge between machine learning and agent foundations that is itself related to alignment. A vague concept could be expressed by a model of machine learning that presents behaviors in a large diverse collection of specific episodes it can make sense of (its scope), exercising its influence as it decides according to its taste.
The machine learning point of view is that we are training the model using all the episodes as the dataset (or maybe for reinforcement learning), with the other things defining the episodes (besides the model itself) giving the data that the model learns. The decision theory point of view is that the model is an adjudicator between the episodes, a shared agent of acausal coordination that intervenes in all of them jointly, that the model presents a single policy that is the updateless decision to arrange all episodes in one possible way, as opposed to other possible ways.
The alignment point of view is that a concept expresses a tiny aspect of preference in its decisions, with its role as an adjudicator giving consistency to that aspect of preference, and coherence to decisions of an agent that relies on the concept. It acts to extend the goodhart scope of the agent as a whole to the episodes that the concept can make sense of. The scope of a concept should be in an equilibrium with its content (behavior), as settled by reflection of learning from the episodes where the concept acts/occurs. Different concepts interact in shared episodes, where their scopes intersect, jointly supplying all the data that makes an episode (when it doesn't originate as observations of reality, grounding the whole thing). Concepts in this sense are both a way of extending the goodhart scope and of remaining aware of its current locus.
Sorry if I dumb it down too much. I tried to come up with specific examples without terminology. Here's how I understand what you're saying:
In this case, could you help me with the topic about "colors"? I wouldn't have written this post if I hadn't written about "colors". So this is evidence (?) that the topic about "colors" isn't insane.
There, a "place" is a vague concept. A "spectrum" is a specific context for the place. Meaning is a distribution of "details". Learning is guessing the correct distribution of details (the "color") for a place in a given context.
I mean, the sketch I've written up (mostly here, some bargaining related discussion here) is not very meaningful, it's like that "Then a miracle occurs" comic. You can make up such things for anything, it's almost theoretical fiction (not real theory) in a sense analogous to that of historical fiction (not real history). It might be possible to build something out of this, but probably not, like you don't normally look for practical advice in books of fiction, even though it can sometimes happen to be found there. That's not what books of fiction are for though, and there could be a scene for self-aware theoretical fiction writers.
I think the interesting point of the sketch is how it naturally puts models of machine learning in the context of acausal decisions from agent foundations, the points of view that are usually disjoint in central examples of either. So maybe there is a way to persist in contorting them in each other's direction along these lines, and that prompted me to mention it.
I've written in response to this post because your definition of vague concepts (at the beginning of the post) seems to fit adjudicators pretty well. In the colors post, there are also references to paradigms, which are less centrally adjudicators, but goodhart scope is their signature feature (a paradigm can famously fail to understand/notice problems that are natural and important for a different point of view).
This post about vague concepts in general is mostly meaningless for me too: I care about something more specific, "colors". However, I think a text may be "meaningless" and yet very useful:
Did we achieve anything? I think we could have. If one of us gets a specific insight, there's a chance to translate this insight (from A to B, or from B to A).
I just tried to understand (without terminology) how my ideas about "vague concepts" could help to align an AI. Your post prompted me to think in this direction directly. And right now I see this possibility:
The most important part of my post is the idea that the specific meanings of a vague concept have an internal structure (at least in specific circumstances). It's as if (this is just an analogy) the vague concept were self-aware about its changes of meaning and reacted to those changes. You could try to use this "self-awareness" to align an AI, to teach it to respect important boundaries.
For example (it's an awkward example), let's say you want to teach an AI that interacting with a human is often not a game, or that it may be bad to treat it as a game. If the AI understands that reducing the concept of "communication" to the concept of a "game" may have some implications, you would be able to explain which reductions and implications are bad without giving the AI complicated explicit rules.
(Another example) If an AI has (or is able to reach) an internal worldview in which "loving someone" and "making a paperclip" are fundamentally different things, and not just a matter of arbitrary complicated definitions, then it may be easier to explain human values to it.
However, this is all science fiction if we have no idea how to model concepts and ideas and their changes of meaning. But my post about colors, I believe, can give you ideas about how to do this. I know:
But it may give ideas, a new approach. I want to fight for this chance, both because of AI risk and because of very deep personal reasons.
"Adjudicator" is a particular role for agents/policies, and the policies (algorithms that run within episodes) are not necessarily themselves agents (adjudicator-as-agent chooses an adjudicator-as-policy as its decision, in the agent foundations point of view). There is also an "outer agent" I didn't explicitly discuss that constructs episodes on situations, deciding that certain adjudicators are relevant to a situation and should be given authority to participate in shaping or observing the content of the episode on it. This outer agent is at a different level of sophistication than the adjudicators-as-policies (though not necessarily different from adjudicators-as-agents), and is in a sense built out of the adjudicators, as discussed here.
So I think the use of "agent" in the first point I quoted is about adjudicators, in the second point both adjudicator and outer agent fit (but mean different things), and the third point is about the outer agent (how its goodhart scope relates to those of the adjudicators).
Vague concepts are a...vague concept.