Do humans derive values from fictitious imputed coherence?

TsviBT

[Metadata: crossposted from https://tsvibt.blogspot.com/2022/11/do-humans-derive-values-from-fictitious.html. First completed November 1, 2022. This essay is more like research notes than exposition, so context may be missing, the use of terms may change across essays, and the text may be revised later; only the versions at tsvibt.blogspot.com are definitely up to date.]

Humans are born with some elements of their minds, and without many other elements, some of which they'll acquire as their life unfolds. In particular, the elements that we pretheoretically call "values"--aesthetic preferences, goals, life goals, squad goals, aspirations, needs, wants, yearnings, drives, cravings, principles, morals, ethics, senses of importance, and so on--are for the most part acquired or at least unfolded, rather than being explicitly present in a newborn. How does this happen? What generates these mental elements?

Hypothesis: a human derives many of zer values by imputing coherent agency to zer past behavior, and then adopting the goals of that fictitious agency as actively influential criteria for future action.

Thanks to Sam Eisenstat for relevant conversations.

The FIAT hypothesis

As a shorthand: "the FIAT hypothesis" = "the Fictitious Imputed Adopted Telos hypothesis". ("Fiat" is Latin for "may it happen" or "may it be made", which has some resonance with the FIAT hypothesis in that they both talk about a free creation of goals.) FIAT goals are goals imputed to some behavior and then adopted as goals.

Human behavior is determined by many things: built-in behavior-determiners such as the instinctive ability to breathe, socially learned behavior and values, convergent instrumental goals, and freely created autopoietic goals such as artistic goals. The FIAT hypothesis says that a major determiner of a human's behavior is the process of adopting goals based on interpreting zer past behavior as agentic.

Ze can be interpreted as asking the question: if my past behavior were the behavior of a coherent agent trying to do something, what would that something be? Then, whatever the answer was, ze adopts it as a goal--a target of more coherent behavior (more effective, more strategic, more orchestrated, more coordinated, more conscious, better resourced, more reflective, more univocal, more wasteless).

This hypothesis gives a possible answer to the question: how did evolution build something with some substantial level of agentic coherence, even though evolution can't directly program conscious concepts like "avoiding death" or "saving food" or "inclusive genetic fitness" for use as terms in a utility function for an organism to pursue?

This process could be continuous, with goals becoming gradually more coherent (and then potentially deprioritized, but usually not de-cohered). This process is iterative, starting with built-in behavior-determiners, then adopting new FIAT goals based on past behavior mainly generated by built-in determiners (and also maybe adopting new goals for other reasons), and then adopting new goals based on past behavior influenced by previously adopted goals, including previous FIAT goals, and so on. FIAT goals come not just from imputing goals to zer own behavior, but also from imputing goals to the behavior of others, such as parents and leaders. Everything gets enshrined, but everything is open to criticism.
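As a toy illustration of the iterative loop just described, here is a minimal sketch. Everything in it is an assumption made for illustration: the action labels, the candidate goals and their scores, the "mean score" criterion for choosing an interpretation, and the rule that an adopted goal overrides one more built-in action per round. It is not a model of any actual cognitive mechanism.

```python
# Toy sketch of the iterative FIAT loop. All names and numbers are hypothetical.

# How well each action serves each candidate goal (scores in [0, 1]).
CANDIDATE_GOALS = {
    "stay_warm": {"seek_fire": 1.0, "seek_food": 0.2, "wander": 0.0},
    "stay_fed": {"seek_fire": 0.1, "seek_food": 1.0, "wander": 0.1},
}

# Behavior produced by built-in determiners alone, before any goal is adopted.
BUILTIN_BEHAVIOR = ["wander", "seek_food", "seek_fire", "seek_food"]


def fit(actions, scores):
    """Mean score of the actions under one goal: an assumed interpretation criterion."""
    return sum(scores[a] for a in actions) / len(actions)


def impute_goal(history, candidates):
    """Return the candidate goal under which past behavior looks most purposeful."""
    return max(candidates, key=lambda goal: fit(history, candidates[goal]))


def behave(builtin, goal, candidates, n_overridden):
    """Replace the first n_overridden built-in actions with the goal's best action."""
    if goal is None:
        return list(builtin)
    best = max(candidates[goal], key=candidates[goal].get)
    return [best if i < n_overridden else a for i, a in enumerate(builtin)]


def run_fiat(builtin, candidates, rounds):
    """Alternate behaving and imputing; the adopted goal gains influence each round."""
    goal, history, trace = None, [], []
    for r in range(rounds):
        actions = behave(builtin, goal, candidates, n_overridden=r)
        history.extend(actions)
        goal = impute_goal(history, candidates)
        trace.append((goal, fit(actions, candidates[goal])))
    return trace


for adopted, coherence in run_fiat(BUILTIN_BEHAVIOR, CANDIDATE_GOALS, rounds=4):
    print(adopted, round(coherence, 3))
```

In this toy, the goal imputed from purely built-in behavior is then pursued more thoroughly than the built-in behavior pursued it, so each round's behavior fits the adopted goal at least as well as the previous round's. Swapping in a different `fit` function changes which goal gets adopted, so the choice of criterion does real work.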

Note that calling this a hypothesis is maybe presumptuous; it's an idea, but since it's abstract and it's about a complex system, there's a lot of ambiguity between FIAT and other explanations or descriptions of behavior, and it's not necessarily obvious how to make different predictions according to the FIAT hypothesis.

Something left quite unspecified is how the FIAT process picks different possible interpretations of past behavior as serving some goal. As S.E. said, "interpretation needs a criterion".

Built-in behavior-determiners

Organisms are born with features that partially generate behavior. (Or, that partially determine behavior, or partially direct behavior, viewing behavior as a free and open creation of the organism's mind.) More specifically, they're born with features that partially determine the direction of the effect on the world of their behavior, aside from the magnitude of that effect.

These behavior-determiners can to some extent be viewed as "hard-coded values", in the sense that they determine something about the direction of the effect on the world of a human's behavior. The FIAT hypothesis says that a human notices these directions, and then pursues them further than the built-in behavior determiner pursues them.

Some overlapping examples:

Pain, itch, and extreme hot or cold are aversive, causing turning away.
Lack of nutrients and other cues (such as stomach fullness) cause hunger, an arousal or orientation towards finding and consuming food that's dispelled when the cues are attenuated.
There are emotions, some of which are suites of actions that can be activated all together. E.g. a newborn crying is controlling multiple muscles in different areas of its body--the diaphragm relaxes to release elastic energy to push air out, laryngeal muscles contract to push the vocal folds together, hyoid muscles open the jaw, and other facial muscles furrow the brow--presumably for a single reason. Fear contributes to flight behavior, anger contributes to aggressive behavior, shame contributes to submission behavior.
There are many human reflexes, some of which are present at birth, e.g. the Galant reflex.
There are built-in interests, i.e. built-in determinations of how attention and exploration are allocated. E.g. newborn humans look at faces more than scrambled faces or non-faces. E.g. puberty usually causes humans to become interested in sex and pair-bonding. E.g. human infants develop in their first year a perceptual interest in spiders and snakes.
There are built-in pattern generators that coordinate rhythmic behavior like walking, swimming, and breathing. Walking in humans is driven by pattern generators, but this isn't so obvious from watching an infant. Moving away from the human extreme of altriciality toward precociality, walking is very obviously in-built, e.g. in newborn deer and newborn elephants. They're dialing in a few parameters of a pattern generator.
There are lots of human (near-)universals. These are not clear-cut instances of the direction of the effects of behavior being built-in, as many of them may be e.g. convergent instrumental goals of human mental activity (such as the use of antonyms), or may be e.g. just things that tend to happen similarly because humans are similar such as using the word "mama" to mean mother. Play, curiosity, and exploration seem like instances. They are interpretable as instrumental goals, since they help prepare non-specifically for future tasks. But it seems to me intuitively that they're adaptations being executed, not emergent as subgoals of mentally represented goals, though as a human grows these activities might only stay common by taking on somewhat of a mental subgoal nature.

Some data

The FIAT hypothesis is ambiguous with other explanations of behavior; see below. So the following possible FIAT goals are not clear examples, and could be taken as questions: are these FIAT goals? Why do humans behave like this? What are the goals involved (the aims of the behavior), if any, and by what force or reason or process are those goals created and adopted?

Self-perception theory. Daryl Bem's theory seems similar to the FIAT hypothesis, and the Wiki page describes some related research, including some counterevidence.
Fear of snakes and spiders. According to "Fear in Infancy", LoBue and Adolph, infants take special interest in snakes and spiders, but aren't clearly afraid of them. Some children later develop a fear of snakes and spiders, and many adults (maybe 40%) say they're afraid of snakes. A FIAT story about adult fear of snakes and spiders is that a built-in behavior-determiner causes children to attend especially to snakes and spiders; then this attentive behavior is later interpreted as "these creatures are especially important to me", and a natural guess as to why is that they are dangerous; so the value "snakes and spiders are dangerous" is adopted. This is very far from clear cut; why, for example, wouldn't the special interest be interpreted as "these are good to eat"? Why wouldn't human faces, which infants are also especially interested in, also be eventually interpreted as threats (maybe they are?)? What would the FIAT hypothesis not "explain"?
Foot-in-the-door. People are more likely to do more costly actions towards some purpose after they've already done less costly actions that are visibly towards that same purpose. E.g. participants in a study are much more likely to display a big ugly sign in their front yard if they'd previously been asked to display a small sign with a similar message. E.g. terrorist organizations and cults recruit people by getting them to do small things, small transgressions, and then escalating. E.g. a salesman might first get someone to sign up for a free newsletter, and only then try to make the sale. See Wiki for discussion and examples. The FIAT story is that someone who's taken one step implicitly notes something like "I've taken a step for this aim; so this aim is important to me.", making an ought from an is. The data isn't simple though; sometimes the opposite effect happens, where past actions towards a purpose provide a "moral license" to take other actions contrary to that purpose: "Consistency Versus Licensing Effects of Past Moral Behavior", Mullen, Monin.
Rituals. It may be that many rituals are important to a group because they demonstrate to members what is valued by the group. That is, members interpret the group behavior as being aimed at something, and then individually adopt those aims; e.g. reading Torah in a public ritual is interpreted as "study is valued" or "Torah is valued" or "history is valued", where "is valued" goes from a description to an individually held value by FIAT.
Fear of death. How do humans learn to want to not die? Death is like an extreme version of a painful event, a disabling injury, an illness, being lost, being trapped, being dispossessed, being ignored, being weak, being alone, being unable to help others, being infertile, being unattractive, rotting. A FIAT story is that avoiding each of these things points to avoiding their extreme versions in death, and the pattern of avoiding all those things points to death itself.
Sunk costs. (Wiki) If someone has sunk resources into some prospect, ze's more likely to further overinvest into that prospect. A FIAT interpretation says that ze's looking at zer investment as an indication that the prospect is worthwhile, and so ze's giving the prospect more weight as a goal.
The meaning of life. What's it for, why are we here, the world has a purpose, what does it all mean... These questions could be the FIAT mechanism spinning its wheels by not considering a specific enough subset of past behavior.
God. This could be FIAT applied to a group of people (such as ancestors, leaders, and society in general), extrapolated to a limit of completion. (Hence the progression from polytheism to monotheism--increasing coherence--and the odd combination of traits like omniscience and omnipotence with interest in earthly affairs. Though God is a big concept and probably quite a lot doesn't fit this picture.)
Endowments. E.g. "God (or evolution or whatever) gave me a voice, so I'm supposed to speak out; I have hands so I'm supposed to make things.".
Nihilism, existentialism, absurdism. Nihilism could be interpreted as an acknowledged failure of the FIAT process in general. Or, it could be specifically a self-referential failure of the FIAT process trying to make sense of the FIAT process as an agential attitude, i.e. trying to find meaning in meaning. Coherentifying itself demands coherentifying; how do you interpret the decision to coherentify-in-general as the action of a more coherent agent chosen among other possible actions according to some specific goal? Sartre's strain of existentialism, in the idea that we create or choose our own values, could be taken as saying that recursive justification (or coherentification) hits bottom at the coherentifying process itself. Absurdism, as Wiki describes Camus's version, seems like nihilism but saying that it's okay. These ideas are presaged by Nietzsche asking: "Must we ourselves not become gods simply to appear worthy of [having murdered god]?"
Relationships. By taking actions to help someone, and then interpreting those actions, a person might adopt further care for who ze has helped.
Curiosity, play, and exploration become natural philosophy.
Glass sculpture. From Maps of Meaning, Peterson, p. 2:

Imagine that a baby girl, toddling around in the course of her initial tentative investigations, reaches up onto a countertop to touch a fragile and expensive glass sculpture. She observes its color, sees its shine, feels that it is smooth and cold and heavy to the touch. Suddenly her mother interferes, grasps her hand, tells her not to ever touch that object. The child has just learned a number of specifically consequential things about the sculpture—has identified its sensory properties, certainly. More importantly, however, she has determined that approached in the wrong manner, the sculpture is dangerous (at least in the presence of mother); has discovered as well that the sculpture is regarded more highly, in its present unaltered configuration, than the exploratory tendency—at least (once again) by mother.

Power and money. There's a lot of obvious reasons to want power and money. But they're mostly instrumental reasons; you want power and money in order to get something else. If there are people who sometimes really pursue power and money for their own sake--so that there's nothing specifically determined that they're going to do with the power or money, however much they get--one explanation for this would be that it's a FIAT goal born of interpreting the instrumental goal. (There are probably a lot of other explanations for this behavior; e.g. they may be traumatized into non-goal-pursuing behavior that locally seeks power, like a forest fire.)
Stories. It's said that people tell themselves stories about their behavior. There are many explanations for this. It can be construed as signaling behavior: the human has to get zer story straight with zemself, make it consistent, so that when ze's asked about it, ze has believable answers and isn't perceived as insane or punished for transgressions. Or, ze wants to be more legible so that ze can interoperate with others better. It's also a sort of compression for memory and planning: by summarizing what's been done, what's left to do is more clear. Another explanation is that it's the FIAT process at work: ze tries to make a story about Why ze did what ze did, and then goes on to act in accordance with that Why.
Mimetic desire. A person who sees another person acting as though some thing is valuable, might then also act as though that thing is valuable.

Redescriptions

Some ways to redescribe FIAT and related processes:

Coherentification. Humans try to become more coherent. Since there was incoherence, there's a free choice of what coherence to have--what utility function to approximate more closely the ideal pursuit of--and there's a constraint, which is to make past behavior coherent as a less capable pursuit of the chosen utility function.
IRL and auto-IRL. Humans do a sort of inverse reinforcement learning to themselves, constructing reward functions so that their past behavior is explained as being trained by that reward, and then training themselves by that constructed reward. They also do IRL to others, e.g. their parents and leaders.
Redescription. Humans redescribe past behavior in terms of new ideas, new understanding; this new language gives a fuller world to interpret past behavior as acting in.
Corrigibility to a fiction. A human-in-the-moment is corrigible to the human-in-general; the human usually mostly tries to avoid sabotaging themselves, and local goal-threads usually eventually cede their place in the pilot's seat to other goal-threads rather than trying to keep hold of the reins. That is, a human-in-the-moment continually returns to the background, the supergoal. But since the human isn't actually a fully coherent agent, there isn't exactly a supergoal; the corrigibility or deference is to a fictional agent, which is more like a collective of goal-threads and other elements than like an integrated efficient agent. This continual return to the supergoal, though, might somehow induce more supergoalness? I don't see it. (Maybe this is related to Heidegger's description of humans as going towards what withdraws, and thereby becoming a sign pointing toward what withdraws.)
Let me help you with that. Humans might be in general tuned to help out. (See these toddlers helping out, though this behavior is far from unambiguous, e.g. some of it could be better described as a kind of play, especially the toddler who puts the cubical block through the tubes.) That attitude might extend to smaller elements of the mind; elements constantly try to help each other out, which requires inferring or imputing a goal that's pursued by the helped element, whether or not the helped element is actually coherently pursuing anything.
Partially recovering evolution's goals. Hominid evolution is the unseen and unspeaking creator of humans, and it created the human mind kind of like a ship in a bottle. Since it couldn't program abstract values directly, it instead programmed built-in behavior determiners, and programmed a mechanism for (re)constituting in human minds the goals that, at the root within evolution's criterion, generated those behaviors.

Screenshot from video by Owen's DIY

Ambiguity

There's a lot of ambiguity between the FIAT hypothesis and other descriptions or explanations of behavior.

Some of the ambiguity is hypothetico-deductive ambiguity, i.e. testable, resolvable uncertainty between hypotheses that make different predictions. E.g. humans sometimes adopt the mere appearance of holding a value in order to signal to other humans. Mere signaling makes different predictions than FIAT when the signaling value of behavior is decreased. When people are watching, both signaling and FIAT strongly predict that a person will act as though ze has the socially desirable value, but when people aren't watching, FIAT strongly predicts the person will still behave as though ze holds the value, whereas signaling only weakly predicts that (though still isn't too surprised, because of uncertainty about being caught, and self-signaling as an aid to future signaling for some reason).
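To make the contrast concrete, here is a small worked example of how observations in the two conditions would move credence between the two hypotheses. The likelihood numbers are made up purely for illustration (the text above only says "strongly" and "weakly"), and treating FIAT and mere signaling as the only two hypotheses is a simplifying assumption.

```python
# Illustrative, made-up likelihoods: P(person acts on the value | hypothesis, condition).
LIKELIHOOD = {
    "FIAT": {"watched": 0.9, "unwatched": 0.9},
    "signaling": {"watched": 0.9, "unwatched": 0.4},
}


def posterior_fiat(observations, prior_fiat=0.5):
    """Update P(FIAT) on a list of (condition, acted_on_value) observations."""
    odds = prior_fiat / (1 - prior_fiat)
    for condition, acted in observations:
        p_fiat = LIKELIHOOD["FIAT"][condition]
        p_signal = LIKELIHOOD["signaling"][condition]
        if acted:
            odds *= p_fiat / p_signal
        else:
            odds *= (1 - p_fiat) / (1 - p_signal)
    return odds / (1 + odds)


# Watched behavior is uninformative here; unwatched behavior is where the hypotheses differ.
print(posterior_fiat([("watched", True)] * 5))
print(posterior_fiat([("unwatched", True)] * 3))
print(posterior_fiat([("unwatched", False)] * 3))
```

Under these assumed numbers, any amount of watched, value-consistent behavior leaves the credence where it started, while unwatched behavior shifts it in one direction or the other.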

Some of the ambiguity is descriptional ambiguity, i.e. there's more than one useful and true way to describe a situation. E.g. does a soap bubble want to have low surface area, or is it evolving under local laws of gas pressure and surface tension? Both, kind of, though the "wanting" needs more qualification than the law-following. E.g., is the sunk cost heuristic due to a FIAT process adopting goals based on past investments, or due to a more narrow heuristic or bias towards relying on a cached plan to invest in something until it pays off or obviously completely fails? These aren't necessarily mutually exclusive: we might want to interpret the FIAT process as being not some sort of unified, separate brain module, but, like many evolved mental processes, as a class of mechanisms and behaviors evolved for the same reason, towards the same end. So there may be a narrow cache-reliance bias, and this could be viewed as evolution having found that narrow mechanism for the general reason that the mechanism tends to contribute to FIAT-like behavior, which is good in general because it avoids thrashing (such as investing and then abandoning the investment).

Some things that FIAT is ambiguous with (besides the above redescriptions, which might themselves be separable hypotheses ambiguous with FIAT):

Processes that attempt to increase agency and coherence in general. For example, processes that take existing goals and try to find strategies that better achieve those goals. This might look similar to a FIAT process: at first there's scattered, ineffective, inefficient, self-defeating behavior that partially achieves the goal; then later the behavior more achieves the goal. One difference would be that if you say (literally, verbally, explicitly) "my goal is X", and then become better at getting X, the becoming better wasn't a FIAT process. Implicit goals vs. mere behavior that will later be interpreted as pursuing a goal adopted by FIAT, is a contrast that might be hypothetico-deductive ambiguity or might be descriptional ambiguity.
Relying on caches, avoiding thrashing. In general, there may be genetically programmed heuristics and biases that increase intertemporal pursuit-of-one-thing ("follow-through") relative to the previous default behavior (in the environment of evolutionary adaptedness), without aiming at some particular goal. If capability vs. coherence can be distinguished (e.g., lack of internal conflict is maybe a different type of coherence from the existence of a skill), then we could say that biases that increase intertemporal coherence are instances of evolution implementing FIAT in a piecemeal way (as opposed to creating a mechanism that adopts imputed goals in full generality): a FIAT goal based on past investment would in particular say to continue investing in that same thing, so simply following the rule of investing in previous investments might be effective because FIAT is effective. Another example is the joy of exerting a skill to no particular end, not even necessarily aimed at practicing.
Imitation. Humans copy the behavior of other humans. This in itself could constitute adopting goals; or it could look like copying goals, while really just being raw copying (e.g., not giving rise to further non-copied behavior towards the same ends). Also, humans might "rawly copy" behavior, and then adopt goals imputed to their own past raw-copy behavior, thereby regularizing the behavior, i.e. distilling the behavior into compact generators; e.g. the apprentice who "makes the craft zer own", or a child who reinvents zer parents' language first through imitation and then through creativity. Also, humans might adopt goals imputed to other humans, as in the glass sculpture example, which could look like raw imitation. Other-FIAT could give rise to inter-generational "goal regularization", wherein, like linguistic regularization, a child's parent-FIAT goals are more parsimoniously expressible than their parents' goals (think of the fanaticism of the child or the convert, and moral progress); and, like a creole language distilled and regularized out of its input languages, a child's society-FIAT goals are sort of creole goals. A double bind might be especially deranging because it makes FIAT logically impossible, and so it not only causes a concrete difficulty, but also threatens to break or punish or shut down the FIAT machinery itself.
Integrating the inexplicit. An orientation towards adopting FIAT goals might be hard to distinguish from an orientation of respecting the inexplicit-but-already-there elements of a mind. Explicitizing the inexplicit, integrating it, is a kind of adopting the way of being of [an agency that includes all this explicit and inexplicit stuff]. For example, you might notice X, and then say to yourself "Hm, I'm noticing X, so maybe X is important to me, I'll pursue an investigation of X.". This is interpretable as a FIAT goal: it would make sense of your behavior of noticing X, if X were important to you, so then you treat X as important and worth investigation. It's also interpretable as respect for the inexplicit: "There was probably something in me that wanted something to do with X, so I'll cooperate with that something and try to understand what it wants with X.".
Respect for openness. If novel understanding implies novel agency, at least in the human regime, then openness to new ways of thinking is a necessary feature of a fit organism. So goal-threads that try to make all the mental activity be in their service will not only contradict corrigibility to the whole as described above, but will interfere with creativity and openness to thinking as appropriate to new contexts. This incentivizes a stance of the whole mind against any given specific goal-thread: this isn't the whole of what matters and shouldn't be taken to an extreme, but rather has to fit in as one subgoal of many held by a bigger agent. FIAT also applies a pressure like that, always interpreting any particular behavior (including a cluster of behaviors driven by a goal) as one move among others of an agency with broader goals.
Negotiating new coalitions. Especially to the extent that novel understanding implies novel agency, humans might often take on new values or drives out of necessity. To be as stable as humans are, that would require some sort of ongoing process of renegotiating a coalition that acts coherently, maybe related in spirit to Critch's negotiable reinforcement learning but with an increasing set of agents. This renegotiation towards coherent agency would look somewhat like a FIAT process.
Imputing agency in general. There are lots of reasons to impute agency to behavior besides specifically to extract goals to adopt. E.g. it's useful for empathic modeling, and in general the intentional stance is useful.

Questions

Can the FIAT hypothesis be cashed out into concrete, testable predictions?

By what criterion do or should humans select among the possible interpretations of past behavior as goal-pursuit?

In what sense can imputed goals be fictitious? What are the possible differences between a fictitious goal and a goal that's real but tacit, incompetently pursued, secret, etc.?

Is this related to Deutsch's theory of everything being open to criticism, even goals and values?

How does this relate to corrigibility?

Can or ought one impute specific goals to the FIAT process itself?

What are some clear non-examples of FIAT goals, besides built-in drives? E.g. is the subset of morality that could be derived from having to cope with the neighbors, e.g. "fairness", a value that's clearly not adopted by FIAT, but rather by symmetrization?

I think it's good to think of FIAT stuff as a special case of applying some usual understanding-machinery (like, abductive and inductive machinery) in value-laden cases. It's the special case where one implicitly or explicitly abducts to (one having) goals. Here is an example ethical story where the same thing shows up in various ways such that it'd imo be sorta contrived to analyze it in terms of goals being adopted:

You find it easy to feel a strong analogy between "you do X to me" and "I do X to you". (In part, this is because: as a human, you find it easy to put yourself in someone else's shoes.)
This turns into an implicit ethical inference rule — you can now easily move from believing "you should not do X to me" to believing "I should not do X to you". Machinery for this transformation of an analogy into an inference rule is present largely because it is good for understanding stuff, which is good for lots of stuff — importantly, it (or some more general thing which has it as a special case) is ultimately good for producing more offspring.
You then notice you have this inference rule, and you feel good about having it, and you turn it into an explicit principle: "do not treat others in ways that you would not like to be treated". E.g. you do this because you want to tell your kid something to get them to stop misbehaving in a particular way, and they don't seem to be fully getting your argument/explanation for why they behaved egregiously which used your implicit inference rule. This explicitizing move is obviously good for teaching in general, and good for individual understanding (it's often useful to scrutinize your inference rules, e.g. to limit or expand their context of applicability).
This explicit principle then "gains points" from making sense of lots of other stuff you already thought, e.g. "lying is bad" and "stealing is bad". Machinery for this sort of point-gaining is present because it's again good for understanding stuff in many cases — it's just a hypothesis gaining points by [making sense of]/predicting facts.
You then seek to make this explicit principle more precise and correct/"correct" (judged against some other criteria, e.g. by whether it gives correct verdicts (ie "makes correct predictions") about what one should do in various particular cases). Maybe you come up with the version: "act only in accordance with that maxim through which you can at the same time will that it become a universal law".
You seek good further justifications of it, and often adopt those as plausible hypotheses, often effectively taking the principle itself as some evidence for these hypotheses. You identify key questions relating to whether the principle is right. You clarify its meaning (that is, what it should mean) further. You study alternative formulations of it. ^[1] You spell out its consequences better. You seek out problematic cases. You construct a whole system around the principle. All this is a lot like something you would do to a scientific hypothesis.

(Acknowledgment. A guiding idea here is from a chat with Tom Everitt.)

(Acknowledgment'. A guiding frustration here is that imo people posting on LessWrong think way too much in terms of goals.)

^[1] e.g. "a rational being must always regard himself as lawgiving in a kingdom of ends possible through freedom of the will, whether as a member or as sovereign"

(Acknowledgment'. A guiding frustration here is that imo people posting on LessWrong think way too much in terms of goals.)

FYI, I think this comment might be the best (compressed/short?) illustration of limitations of thinking in terms of goals for the purpose of understanding agency that I've seen.

A potential culture-level historical case of FIAT: AFAIK, Jewish monotheism emerged sometime in the 5th century BCE, in response to the Babylonian captivity (in its aftermath, or during it?). Before that, Jews were henotheistic, with a slight "preference" for JHWH. When their country was conquered, "they reasoned": "we must have insulted God with our cult of other gods (otherwise he wouldn't have allowed the Babylonians to enslave us), so let's erase all explicit mentions of polytheism from the scriptures and ban the worship of non-JHWH gods".

Also, I wonder how FIAT relates to the uniquely human capacity and tendency/drive to overimitate others. Is overimitation tied to inferring latent reasons for behavior, so that re-applying that mode of thinking to one's past self results in FIAT?

FIAT (by another name) was previously proposed in the book On Intelligence. The version there had a somewhat predictive-processing-like story where the cortex makes plans by prediction alone; so reflective agency (really meaning: agency arising from the cortex) is entirely dependent on building a self-model which predicts agency. Other parts of the brain are responsible for the reflexes which provide the initial data which the self-model gets built on (similar to your story).

The continuing kick toward higher degrees of agency comes from parts of the brain which have reactions to the predictions made by the cortex. (Otherwise, the cortex just learns to predict the raw reflexes, and we're stuck imitating our baby selves or something along those lines).

It's not clear precisely how all of that works, but basically it means we have a pure predictive system (and much of the time we simply take the predicted actions), plus we have some other stuff (EG reflexes, and an override RLish system which inhibits and/or replaces the predicted action under some circumstances).

The most obvious version of FIAT which someone might write down after reading your post, otoh, is more like: run some IRL technique on your own past actions, and then (most of the time) plan based on the inferred goals, again with some overrides (built-in reflexes).
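To make that "most obvious version" concrete, here is a minimal toy sketch. It is not a claim about how brains do it, and it is not the model from the post or from On Intelligence. Everything in it is an illustrative assumption: the action names, the candidate goals and their utilities, the softmax ("Boltzmann-rational") likelihood standing in for "some IRL technique", and the single reflex used as the override.

```python
import math

# Hypothetical toy world: every name and number below is an assumption
# made for illustration only.
ACTIONS = ["read", "cook", "nap"]

# Each candidate goal assigns a utility to each action.
CANDIDATE_GOALS = {
    "learn":    {"read": 1.0, "cook": 0.2, "nap": 0.0},
    "eat_well": {"read": 0.0, "cook": 1.0, "nap": 0.1},
    "rest":     {"read": 0.1, "cook": 0.0, "nap": 1.0},
}

# Built-in reflexes: a stimulus maps straight to a response, bypassing goals.
REFLEXES = {"smoke_alarm": "leave_building"}


def log_likelihood(goal, past_actions, beta=3.0):
    """Log-probability of the past actions if a softmax-rational agent
    with this goal had chosen them independently."""
    utils = CANDIDATE_GOALS[goal]
    log_z = math.log(sum(math.exp(beta * utils[a]) for a in ACTIONS))
    return sum(beta * utils[a] - log_z for a in past_actions)


def impute_goal(past_actions):
    """Crude stand-in for IRL: pick the candidate goal that best explains
    the agent's own past behavior."""
    return max(CANDIDATE_GOALS, key=lambda g: log_likelihood(g, past_actions))


def act(past_actions, stimulus=None):
    """Reflexes override; otherwise plan on the imputed (adopted) goal."""
    if stimulus in REFLEXES:
        return REFLEXES[stimulus]
    utils = CANDIDATE_GOALS[impute_goal(past_actions)]
    return max(ACTIONS, key=lambda a: utils[a])


# Somewhat incoherent past behavior: mostly reading, with some other stuff.
past = ["read", "read", "cook", "read", "nap"]
print(impute_goal(past))              # the best-fitting goal is "learn"
print(act(past))                      # planning on that goal yields "read"
print(act(past, stimulus="smoke_alarm"))  # the reflex overrides the plan
```

In this toy, the past behavior is only mostly consistent with "learn", but once that goal is imputed and adopted, the planned behavior pursues it without the noise. That is the sense in which the loop pushes toward more coherence, while the reflex table plays the role of the overrides.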

Anyway.

Here's my attempt to make a probably-false prediction from FIAT, as best I can.

It seems like the thing to do is to look for cases where people pursue their own goals, rather than the goals they would predict they have based on past actions.

It needs to be complex enough to not plausibly be a reflex/instinct.

A sort of plausible example is courtship. It's complex, it can't easily be inferred from previous things you did (not the first time you do it, that is), and it agentically orients toward a goal. The problem is, I think it's well-explained as imitation - "I'm a person; the people around me do this and seem really into it; so I infer that I'm really into it too".

So it's got to be a case where someone does something unexpected, even to themselves, which they don't see people do, but which achieves goals-they-plausibly-had-in-hindsight.

Homosexual intercourse in the 1800s??

Christopher Thomas Knight heading off into the woods??

The continuing kick toward higher degrees of agency comes from parts of the brain which have reactions to the predictions made by the cortex. (Otherwise, the cortex just learns to predict the raw reflexes, and we're stuck imitating our baby selves or something along those lines).

Interesting. This seems to imply a (weak) prediction that defects of (some) "parts of the brain which have reactions to the predictions made by the cortex" might manifest as mental developmental disorders.

I don't recall seeing that theory in the first quarter of the book, but I'll look for it later. I somewhat agree with your description of the difference between the theories (at least, as I imagine a predictive processing flavored version). Except, the theories are more similar than you say, in that FIAT would also allow very partial coherentifying, so that it doesn't have to be "follow these goals, but allow these overrides", but can rather be, "make these corrections towards coherence; fill in the free parameters with FIAT goals; leave all the other incoherent behavior the way it is". A difference between the theories (though I don't feel I can pass the PP ITT) is that FIAT allows, you know, agency, as in, non-myopic goal pursuit based on coherent-world-model-building, whereas PP maybe strongly hints against that?

It seems like the thing to do is to look for cases where people pursue their own goals, rather than the goals they would predict they have based on past actions.

I'm confused by this; are these supposed to be mutually exclusive? What's "their own goals"? [After thinking more: Oh like you're saying, here's what it would look like to have a goal that can't be explained as a FIAT goal? I'll assume that in the rest of this comment.]

It needs to be complex enough to not plausibly be a reflex/instinct.

Agreed.

A sort of plausible example is courtship. It's complex, it can't easily be inferred from previous things you did (not the first time you do it, that is), and it agentically orients toward a goal.

I'm not sure I buy that it can't be inferred, even the first time. Maybe you have fairly built-in instincts that aren't about the whole courtship thing, but cause you to feel good when you're around someone. So you seek being around them, and pay attention to them. You try to get them interested in being around you. This builds up the picture of a goal of being together for a long time. (This is a pretty poor explanation as stated; if this explanation works, why wouldn't you just randomly fall in love with anyone you do a favor for? But this is why it's at least plausible to me that the behavior could come from a FIAT-like thing. And maybe that's actually the case with homosexual intercourse in the 1800s.)

The problem is, I think it's well-explained as imitation - "I'm a person; the people around me do this and seem really into it; so I infer that I'm really into it too".

Maybe courtship is especially like this, but in general, things that are sort of well-explained as imitation still seem like admissible falsifications of FIAT, e.g. if there are also pressures against the behavior.

FIAT is (somewhat) reminiscent of a humanities concept called interpellation.

If there are people who sometimes really pursue power and money for their own sake--so that there's nothing specifically determined that they're going to do with the power or money, however much they get--one explanation for this would be that it's a FIAT goal born of interpreting the instrumental goal. (There are probably a lot of other explanations for this behavior; e.g. they may be traumatized into non-goal-pursuing behavior that locally seeks power, like a forest fire.)

Is the parenthetical here misplaced? It seems unrelated to the text that precedes it.

It's giving an alternative explanation of the observation.

Ah, I initially interpreted "forest fire" literally, as the event that traumatizes someone into non-goal-pursuing behavior. I see now that it's supposed to be parsed as a figurative description of how the behavior itself spreads.

Oh right sorry. Yeah, exactly.

You find it easy to feel a strong analogy between "you do X to me" and "I do X to you". (In part, this is because: as a human, you find it easy to put yourself in someone else's shoes.)
This turns into an implicit ethical inference rule — you can now easily move from believing "you should not do X to me" to believing "I should not do X to you". Machinery for this transformation of an analogy into an inference rule is present largely because it is good for understanding stuff, which is good for lots of stuff — importantly, it (or some more general thing which has it as a special case) is ultimately good for producing more offspring.
You then notice you have this inference rule, and you feel good about having it, and you turn it into an explicit principle: "do not treat others in ways that you would not like to be treated". E.g. you do this because you want to tell your kid something to get them to stop misbehaving in a particular way, and they don't seem to be fully getting your argument/explanation for why they behaved egregiously which used your implicit inference rule. This explicitizing move is obviously good for teaching in general, and good for individual understanding (it's often useful to scrutinize your inference rules, e.g. to limit or expand their context of applicability).
This explicit principle then "gains points" from making sense of lots of other stuff you already thought, e.g. "lying is bad" and "stealing is bad". Machinery for this sort of point-gaining is present because it's again good for understanding stuff in many cases — it's just a hypothesis gaining points by [making sense of]/predicting facts.
You then seek to make this explicit principle more precise and correct/"correct" (judged against some other criteria, e.g. by whether it gives correct verdicts (i.e. "makes correct predictions") about what one should do in various particular cases). Maybe you come up with the version: "act only in accordance with that maxim through which you can at the same time will that it become a universal law".
You seek good further justifications of it, and often adopt those as plausible hypotheses, often effectively taking the principle itself as some evidence for these hypotheses. You identify key questions relating to whether the principle is right. You clarify its meaning (that is, what it should mean) further. You study alternative formulations of it. ^[1] You spell out its consequences better. You seek out problematic cases. You construct a whole system around the principle. All this is a lot like something you would do to a scientific hypothesis.

(Acknowledgment. A guiding idea here is from a chat with Tom Everitt.)

(Acknowledgment'. A guiding frustration here is that imo people posting on LessWrong think way too much in terms of goals.)

[1] e.g. "a rational being must always regard himself as lawgiving in a kingdom of ends possible through freedom of the will, whether as a member or as sovereign"

