The stronger version is: EUT is inadequate as a theory of agents (for the same reasons, and in the same ways) not just during an agent's "growing up" period but all the time. I think the latter is the case for several reasons, for example:
- agents get exposed to novel "ontological entities" continuously (that e.g. they haven't yet formed evaluative stances with respect to), and not just while "growing up"
- there is a (generative) logic that governs how an agent "grows up" (develops into a "proper agent"), and that same logic continues to apply throughout an agent's lifespan
I think this is a very important point; my post on value systematization is a (very early) attempt to gesture towards what an agent "growing up" might look like.
Yeah neat, I haven't yet gotten to reading it but it's definitely on my list. Seems (and some folks suggested to me) that it's quite related to the sort of thing I'm discussing in the value change problem too.
There are some similarities, although I'm focusing on AI values not human values. Also, seems like the value change stuff is thinking about humanity on the level of an overall society, whereas I'm thinking about value systematization mostly on the level of an individual AI agent. (Of course, widespread deployment of an agent could have a significant effect on its values, if it continues to be updated. But I'm mainly focusing on the internal factors.)
There is another sense in which I would not want to say that there is any particular hierarchy between natural/unnatural/rational constraints.
I think there's a lot to unpack here. I'm going to give it a preliminary go, anticipating that it's likely to be a bit all over the place. The main thread I want to pull is what it means to impose a particular hierarchy between the constraints, and then see how this leads to many possible hierarchies in such a way that it feels like no particular hierarchy is privileged.
From a "natural" point of view, which privileges physical time, individuation is something that must be explained - a puzzle which is at the heart of the mystery of the origin of life. From this point of view, "rationality" or "coherence" is also something that must be explained (which is what Richard Ngo is gesturing out in his comment / post).
From a "rational" point of view, we can posit abstract criteria which we want our model of agency to fulfil. For instance Logical Induction (Garrabrant et al. 2016), takes a formalisation of the following desideratum (named "Approximate Inexploitability", or "The Logical Induction Criterion"): "It should not be possible to run a Dutch book against a good reasoner in practice." (ibid, p.8, p.14), and then constructs an agent entirely within logic from this. Something like "rationality" or "coherence" is assumed (for well argued reasons), and the structure of agency is deduced from there. This kind of move is also what underpins selection theorems. In my view, individuation also needs to be explained here, but it's often simply assumed (much like it is in most of theoretical biology).
The "unnatural" point of view is much more mysterious to us. When I use the term, I want to suggest that individuation can be assumed, but physical time becomes something that must be explained. This is a puzzle which is touched on in esoteric areas of physics (e.g. "A smooth exit from eternal inflation?" - Hawking and Hertog 2018), and consciousness science (e.g. "A measure for intrinsic information" - Barbosa et al. 2020), and discussed in "religious" or "spiritual" contexts, but in reality very poorly understood. I think you gesture at a really interesting perspective on this by relating it to "thinghood" in active inference - but to me this misses a lot of what makes this truly weird - the reasons I decided to label it "unnatural" in the first place.
It's deeply confusing to me at this stage how the "unnatural" point of view relates to the "rational" one; I'd be curious to hear any thoughts on this, however speculative. I do, however, think that there is a sense in which none of the three hierarchies I'm gesturing at in this comment are "the real thing" - they feel more like prisms through which we can diffract the truth in an attempt to break it down into manageable components.
bounded, embedded, enactive, nested.
I know about boundedness and embeddedness, and I guess nestedness is about hierarchical agents.
But what's enactive?
Roughly... refers to/emphasizes the dynamic interaction between agent and environment and understands behavior/cognition/agency/... to emerge through that interaction/at that interface (rather than, e.g., trying to understand them as an internal property of the agent only)
Meta:
Expected utility theory (stated in terms of the VNM axioms or something equivalent) thinks of rational agents as composed of two "parts", i.e., beliefs and preferences. Beliefs are expressed in terms of probabilities that are being updated in the process of learning (e.g., Bayesian updating). Preferences can be expressed as an ordering over alternative states of the world or outcomes or something similar. If we assume an agent's set of preferences to satisfy the four VNM axioms (or some equivalent desiderata), then those preferences can be expressed with some real-valued utility function u and the agent will behave as if they were maximizing the expected value of u.
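(A minimal toy sketch of that "as if" claim, with a made-up utility function of my own -- not anything from the dialogue or the literature:)

```python
# Toy sketch of the VNM "as if" claim: given a utility function u over outcomes,
# the agent's choice between lotteries looks like expected-utility maximization.
u = {"apple": 1.0, "banana": 0.6, "nothing": 0.0}  # assumed (made-up) utilities

lottery_A = {"apple": 0.5, "nothing": 0.5}  # 50/50 gamble
lottery_B = {"banana": 1.0}                 # sure thing

def expected_utility(lottery):
    return sum(p * u[outcome] for outcome, p in lottery.items())

# EU(A) = 0.5, EU(B) = 0.6, so the agent behaves as if it prefers lottery_B.
chosen = max([lottery_A, lottery_B], key=expected_utility)
```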
On this account, beliefs change in response to evidence, whereas values/preferences in most cases don't. Rational behavior comes down to (behaving as if one is) ~maximizing one's preference satisfaction/expected utility. Most changes to one's preferences are detrimental to their satisfaction, so rational agents should want to keep their preferences unchanged (i.e., utility function preservation is an instrumentally convergent goal).
Thus, for a preference modification to be rational, it would have to result in higher expected utility than leaving the preferences unchanged. My impression is that the most often discussed setup where this is the case involves interactions between two or more agents. For example, if you and some other agent have somewhat conflicting preferences, you may strike a compromise where each of you makes their preferences somewhat more similar to the preferences of the other. This costs both of you a bit of (expected subjective) utility, but less than you would lose (in expectation) if you engaged in destructive conflict.
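(To make that concrete, here's a toy calculation with entirely made-up numbers, just to illustrate the shape of the argument:)

```python
# Made-up numbers for one of the two agents (the situation is symmetric).
u_own_way      = 10    # utility if its original preferences get fully satisfied
p_win_conflict = 0.5   # chance of prevailing in a destructive conflict
conflict_cost  = 6     # destruction each side suffers if they fight
u_compromise   = 7     # utility once preferences are shifted toward the other agent

eu_conflict   = p_win_conflict * u_own_way - conflict_cost   # = -1.0
eu_compromise = u_compromise                                 # =  7.0

# Modifying one's preferences costs some utility (10 -> 7) but beats the
# expected outcome of conflict (7 > -1), so the modification is "rational"
# even by EUT's own lights.
```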
Another scenario justifying modification of one's preferences is when you realize the world is different than you expected on your priors, such that you need to abandon the old ontology and/or readjust it. If your preferences were defined in terms of (or strongly entangled with) concepts from the previous ontology, then you will also need to refactor your preferences.
You think that this is a confused way to think about rationality. For example, you see self-induced/voluntary value change as something that in some cases is legitimate/rational.
I'd like to elicit some of your thoughts about value change in humans. What makes a specific case of value change (il)legitimate? How is that tied to the concepts of rationality, agency, etc? Once we're done with that, we can talk more generally about arguments for why the values of an agent/system should not be fixed.
Sounds good?
On a meta note: I've been using the words "preference" and "value" more or less interchangeably, without giving much thought to it. Do you view them as interchangeable or would you rather first make some conceptual/terminological clarification?
Sounds great!
(And I'm happy to use "preferences" and "values" interchangeably for now; we might at some point run into problems with this, but we can figure that out when we get there.)
Where to start...?
First, do I think the first part of your intro is "a confused way to think about rationality"? Sort of, but it's a bit tricky to get our language to allow us to make precise statements here. I'm perfectly happy to say that under certain notions of rationality, your description is right/makes sense. But I definitely don't think it's a particularly useful/relevant one for the purposes I'm interested in. There are a few different aspects to this:
Ok... maybe I'll leave it there for now? I haven't really gotten to your two leading questions yet (though I've maybe started gesturing at some pieces of the bigger picture that I think are relevant), so happy for you to just check whether you want to clarify or follow up on something I've said so far and otherwise ask me to address those two questions directly.
While we're at it, I have some thoughts and would be curious to hear your counterthoughts.
So your points are (1) the idealizing assumptions of EUT don't apply to real-world agents and (2) EUT gives only a static/snapshot picture of an agent. Both seem to have parallels in the context of Bayesian epistemology (probably formalized epistemology more broadly but I'm most familiar with the Bayesian kind).
I'll focus on (1) for now. Bayesian epistemology thinks of rational reasoners/agents as logically omniscient, with perfectly coherent probabilistic beliefs (e.g., no contradictions, probabilities of disjoint events sum up to 1), updating on observations consistently with the ratio formula P(B|A)=[P(A|B)⋅P(B)]/P(A) and so on (a tiny numerical sketch of this update rule is given after the list below). This obviously raises the question of to what extent this formalism is applicable/helpful for guiding real-world belief formation and updating. Standard responses seem to fall along the lines of (SEP):
(a) Even though unattainable, idealized Bayesian epistemology is a useful ideal to aspire towards. Keeping our sight on the ideal reminds us that "the math exists, even though we can't do the math precisely". This can guide us in our imperfect attempts to refine our reasoning so that it approximates that ideal as much as possible (or rather, as much as profitable on the margin because there obviously are diminishing returns to investing in better cognition).
(b) Idealized Bayesian epistemology is akin to a spherical cow in the vacuum or an ideal gas. It's a formalism meant to capture the commonalities of many real-world phenomena with a varying degree of inaccuracy. The reason for its partial success is probably that these phenomena share some common abstract property that arises in each case via a sufficiently similar process. This ideal can then be de-idealized by adding additional constraints, details, and specifications that make it closer to how some specific real-world system (or class of systems) functions.
(Perhaps related to the distinction between counting-down coherence and counting-up coherence?)
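(The promised numerical sketch of the ratio formula, with made-up numbers purely for illustration:)

```python
# Bayes' rule, P(B|A) = P(A|B) * P(B) / P(A), with arbitrary illustrative numbers.
p_B            = 0.01   # prior credence in hypothesis B
p_A_given_B    = 0.90   # probability of observing A if B is true
p_A_given_notB = 0.05   # probability of observing A if B is false

p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)  # total probability of A
p_B_given_A = p_A_given_B * p_B / p_A                 # posterior after seeing A
# p_B_given_A ~= 0.154: the ideal reasoner's updated credence in B
```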
It seems to me that analogous responses could be given to allegations of EUT being a theory of idealized agents that are unbounded, unembedded and so on. Maybe EUT is an unattainable ideal but is nevertheless useful as an ideal to aspire towards? And/or maybe it can be used as a platonic template to be filled out with real-world contingencies of cognitive boundedness, value/preference-generating processes and so on? What do you think of that?
You mentioned that "under certain conditions/notions of rationality [EUT prescriptions] make sense". Does that mean you view EUT as a special (perhaps very narrow and unrealistic in practice) case of some broader theory of rationality of which we currently have an incomplete grasp?
Regarding (2), the problem of lack of specification of how values arise in a system seems similar to the problem of priors, i.e., how should an agent assign their initial credences on propositions on which they lack evidence (SEP). Maybe the very way this question/problem is formulated seems to presume an idealized (form of an) agent that gets embedded in the world, rather than something that arises from the world via some continuous process, adapting, gaining autonomy, knowledge, competence, intelligence, etc.
Let me rephrase your pet peeve to check my understanding of it.
When observing an agent doing certain things, we're trying to infer their preferences/values/utility function from their behaviour (plus maybe our knowledge of their cognitive boundedness and so on). These are just useful abstractions to conceptualize and predict their behaviour and are not meant to correspond to any reality-at-joints-carving thing in their brain/mind. In particular, this abstracts away from implementational details. But then preferences/values/utility function are used as if they correspond to such a thing and the agent is assumed to be oriented towards maximizing their utility function or satisfaction/fulfillment of their values/preferences?
Maybe the very way this question/problem is formulated seems to presume an idealized (form of an) agent that gets embedded in the world, rather than something that arises from the world via some continuous process, adapting, gaining autonomy, knowledge, competence, intelligence, etc.
Yes, this is definitely one of the puzzle pieces that I care a lot about. But I also want to emphasize that there is a weaker interpretation of this critique and a stronger one, and really I am most interested in the stronger one.
The weak version is roughly: there is this "growing up" period during which EUT does not apply, but once the agent has grown up to be a "proper" agent, EUT is an adequate theory.
The stronger version is: EUT is inadequate as a theory of agents (for the same reasons, and in the same ways) not just during an agent's "growing up" period but all the time.
I think the latter is the case for several reasons, for example:
......
Now, the tricky bit -- and maybe the real interesting/meaty bit to figure out how to combine -- is that, at some point in evolutionary history, our agent has accessed (what I like to call) the space of reason. In other words, our agent "goes computation". And now I think an odd thing happens: while earlier our agent was shaped and constrained by "material causes" (and natural selection), now our agent is additionally also shaped and constrained by "rational causes/causes of reason".* These latter types of constraints are the ones formal epistemology (including EUT etc.) is very familiar with, e.g. constraints from rational coherence etc. And I think it is correct (and interesting and curious) that these constraints come to have a significant effect on our agent, in a sort of retro-causal way. It's the (again sort of) downward causal 'force' of abstraction.
(* I "secretly" think there is a third type of constraint we need to understand in order to understand agent foundations properly, but this one is weirder and I haven't quite figured out how to talk about it best, so I will skip this for now.)
Seems like we've converged on exactly the thing that interests me the most. Let's focus on this strong EUT-insufficiency thesis.
agents get exposed to novel "ontological entities" continuously (that e.g. they haven't yet formed evaluative stances with respect to), and not just while "growing up"
This seems to imply that (at least within our universe) the agent can never become "ontologically mature", i.e., regardless of how much and for how long it has "grown", it will continue experiencing something like ontological crises or perhaps their smaller "siblings", like belief updates that are bound to influence the agent's desires by acting on its "normative part", rather than merely on the "epistemic part".
I suspect the latter case is related to your second point
there is a (generative) logic that governs how an agent "grows up" (develops into a "proper agent"), and that same logic continues to apply throughout an agent's lifespan
Do you have some more fleshed-out (even if very rough/provisional) thoughts on what constitutes this logic and the space of reason? Reminds me of the "cosmopolitan leviathan" model of the mind Tsvi considers in this essay and I wonder whether your proto-model has a roughly similar structure.
Ok, neat! So.. first a few clarifying notes (or maybe nitpicks):
1)
regardless of how much and for how long it has "grown", it will continue experiencing something like ontological crises or perhaps their smaller "siblings"
So I think this is true in principle, but it seems worth flagging that it will not always be true in practice. In other words, we can imagine concrete agents which have, at some point, reached an ontology that they will not change any further until their death. This is not because they have reached the "right" or "complete" ontology with respect to the world, but simply a sufficient one with respect to what they have encountered or will encounter.
A few things follow from this that I want to highlight:
2)
perhaps their smaller "siblings"
I like this investigation! I am not sure/haven't thought much about what the smaller sibling might be (or whether we really need it), but I seem to have a similar experience to you in that saying "ontological crises" seems sometimes right in type but bigger than what I suspect is going on.
[Insert from Mateusz: I later realized that the thing we're talking about is concept extrapolation/model splintering.]
3)
like belief updates that are bound to influence the agent's desires by acting on its "normative part", rather than merely on the "epistemic part".
My guess (including from other conversations we had) is that here is a place where our background models slightly disagree (but I might be wrong/am not actually entirely confident in the details of what your model here is). What I'm hearing when I read this is still some type difference/dualism between belief and value updates -- and I think my models suggest a more radical version of the idea that "values and beliefs are the same type". As such, I think every belief update is a value update, though it can be small enough to not "show" in the agent's practical reasoning/behavior (similar to how belief updates may not immediately/always translate into choosing different actions).
Ok, now to the generative logic bit!
Ah gosh, mostly I don't know. (And I haven't read Tsvi's piece yet, but appreciate the pointer and will try to look at it soon, and maybe comment about its relationship to my current guesses later.) But here are some pieces that I'm musing over. I think my main lens/methodology here is to be looking for what constraints act on agents/the generative logic of agents:
1. "Thinghood" / constraints from thinghood
(for a reason that might only become clear later on, I am playing with calling this "constraints from unnatural selection".)
2) Natural selection / constraints from natural selection
3) Rationality/Reason / constraints from rationality/reason
Ok, this was a whole bunch of something. Let me finish with just a few more loose thoughts/spitballing related to the above:
[Epistemic status: very speculative/loosely held. Roughly half of it is poetry (for now).]
My guess (including from other conversations we had) is that here is a place where our background models slightly disagree
I also think that they are two aspects of the same kind of thing. It's just me slipping back into old ways of thinking about this.
EDIT: I think though that there is something meaningful I was trying to say and stating it in a less confused/dualistic way would be something like one of these two:
(1) The agent acquires new understanding which makes them "rethink"/reflect on their values in virtue of these values themselves (FWIW), rather than their new state of belief implying that certain desirable-seeming things are out of reach, or that actions that seemed promising now seem hopelessly "low utility", or something.
or
(2) Even if we acknowledge that beliefs and values are fundamentally (two aspects of) the same kind, I bet there is still a meaningful way to talk about beliefs and values on some level of coarse-graining or for certain purposes. Then, I'm thinking about something like:
An update that changes both the belief-aspect of the belief-value-thing and its value-aspect, but where the value-aspect-update is of greater magnitude (in some measure) than the belief-aspect-update, in a way that is not downstream from the belief-aspect-update; rather, both are locally independently downstream from the same new observation (or whatever triggered the update).
(Noticed there is a fairly different angle/level at which the questions about the generative logic could be addressed too. At that level, we'd for example want to more concretely talk about the "epistemic dimension" of values & the "normative or axiological" dimensions of beliefs. Flagging in case you are interested to go down that road instead. For example, we could start by listing some things we have noticed/observed about the epistemic dimension of values and vice versa, and then after looking at a number of examples zoom out and check whether there are more general things to be said about this.)
In case you missed it, Tsvi has a post (AFAICT) exactly about thinghood/thingness.
Can what you wrote be summarized that "being a free energy-minimizing system" and "thinghood" should be definitionally equivalent?
"In virtue of being a thing" can be used as a basis for inference. In other words, "thinghood" is the "first" place where my hypothesis space for interpreting my sensory data starts to be constrainedconstraint in some ways.
Does it mean that in order to infer anything from some input, that input must be parseable (/thinkable-of) in terms of things? (maybe not necessarily things it represents/refers-to[whatever that means]/is-caused-by, but things that spark some associations with a thing in the observer)
Or do you mean that one needs to "be a thing" in order to do any kind of inference?
Is it fair to summarize this as "thinghood"/"unnatural selection" is a necessary prerequisite for natural selection/Darwinian evolution? This reminds me of PGS's insistence that discrete individuals with clear-ish parent-offspring relationships (operationalized in terms of inherited "fitness"-relevant variance or something) are a sine qua non of natural selection (and what distinguishes biological evolution from e.g., cultural "evolution"). It felt intuitive to me but I don't think he gave specific reasons for why that must be the case.
I think you could say that natural selection has been a prerequisite for agents capable of being constrained by the space of reason. This has been true of humans (and to some extent other animals). Not sure about autonomous/agenty AIs (once[/if] they arise), since if they develop in a way that is a straightforward extrapolation of the current trends, then (at least from PGS's/DPNS perspective) they would qualify as at best marginal cases of Darwinian evolution (for the same reasons he doesn't see most memes as paradigmatic evolutionary entities), and at some point they will likely become capable of steering their own trajectory in a way that is not-quite-evolution.
Noticed there is a fairly different angle/level at which the questions about the generative logic could be addressed too
I think the current thread is interesting enough
(quick remark on your edit re the (non-)dualistic way of talking about values/beliefs -- here is a guess for where some of the difficulty of talking about this comes from:
We typically think/talk about values and beliefs as if they were objects, and then we think/talk about what properties these object have.
How I think we should instead think about this: there is some structure to an agent*, and that structure unravels into "actions" when it comes into contact with the environment. As such "beliefs" and "values" are actually just "expressions" of the relation between the agent's "structure/morphology" and the environment's "structure/morphology".
Based on this "relational" picture, we can then refer to the "direction of fit" picture to understand what it means for this "relation" to be more or less belief/value-like -- namely, depending on what the expressed direction of fit is between agent and world.
(*I think we'd typically say that the relevant structure is located in the agent's "mind" -- I think this is right insofar as we use a broad notion of mind, acknowledging the role of the "body"/the agent's physical makeup/manifestation.)
---
Does it mean that in order to infer anything from some input, that input must be parseable (/thinkable-of) in terms of things? (maybe not necessarily things it represents/refers-to[whatever that means]/is-caused-by, but things that spark some associations with a thing in the observer)
Or do you mean that one needs to "be a thing" in order to do any kind of inference?
More like the latter. But more precisely: assume you just have some sensory input (pre any filter/ontology you have specific reason to trust that would help you organize/make sense of that sensory input). There is a question of how you could, from this place, make any valid inference. What I'm trying to say with the "thinghood" constraint is that the fact that you're experiencing any sensory input at all implies you must have some "sustained existence" - you must endure for more than just a single moment. In other words, you must be a "thing" (according to the minimal definition from above/from FEP). But that fact allows you to back out something - it becomes your "initial ground to stand on" from which you can "bootstrap" up. It's a bit like Descartes's "I think therefore I am" -- but more like "I am [a thing], therefore... a certain relationship must hold between different bits of sensory input I am receiving (in terms of their spatial and temporal relationship)" -- and this now forms the ground from which I am able to do my first bits of inference.
---
Is it fair to summarize this as "thinghood"/"unnatural selection" is a necessary prerequisite for natural selection/Darwinian evolution?
Depends on what sort of "prerequisite" you mean. Yes in physical/material time (yes, you need something that can be selected). (FWIW I think what is true for Darwinian evolution is also true for "history" more generally -- once you have material substrate, you enter the realm of history (of which Darwinian evolution is a specific sub-type). This is similar to how (in this wild/funny picture I have been painting here) once you have computational substrate, you enter the realm of rationality / rational constraints start to have a hold on you.)
There is another sense in which I would not want to say that there is any particular hierarchy between natural/unnatural/rational constraints.
I find Godfrey-Smith's picture here useful in that it reminds us that we are able to say both a) what the paradigmatic ("pure") case looks like, and also b) that most (/all?) actual examples will not match the fully paradigmatic case (and yet will be shaped to different extents by the logic that is illustrated in the paradigmatic case). So in our picture here, an actual agent will likely be shaped by rational/natural/unnatural constraints, but by none of them in a maximalist/pure sense.
assume you just have some sensory input (pre any filter/ontology you have specific reason to trust that would help you organize/make sense of that sensory input). There is a question of how you could, from this place, make any valid inference. What I'm trying to say with the "thinghood" constraint is that the fact that you're experiencing any sensory input at all implies you must have some "sustained existence" - you must endure for more than just a single moment. In other words, you must be a "thing" [...]. But that fact allows you to back out something - it becomes your "initial ground to stand on" from which you can "bootstrap" up.
Hm, this kind of proto-cognitive inference ([I get any input] -> [I am a stable "thing"] -> [I stand in a specific kind of relationship to the rest of the world]) feels a bit too cerebral to expect from a very simple... thing that only recently acquired stable thinghood.
The way I see it is:
A proto-thing* implements some simple algorithm that makes it persist and/or produce more of itself (which can also be viewed as a form of persistence). Thinghood is just the necessary foundation that makes any kind of adaptive process possible. I don't see the need to invoke ontologies at this point. I haven't thought about it much, but the concept of ontology feels to me like it implies a somewhat high-ish level of complexity, and while you can appropriate ontology-talk for very simple systems, it's not very useful and adds more noise than clarity to the description.
---
* By "proto-thing", I mean "a thing that did not evolve from other things but rather arose ~spontaneously from whatever". I suspect there is some degree of continuity-with-phase-transitions in thinghood but let's put that aside for now.
While I agree it sounds cerebral when we talk about it, I don't think it has to be -- I think there is some not-unfounded hope that FEP is mathematizing exactly that: thinghood implies that, and puts some constraints on how, "the internal states (or the trajectories of internal states) of a particular system encode the parameters of beliefs about external states (or their trajectories)."
Also, it's worth reminding ourselves: it's not really MUCH that we're getting here - like, the FEP literature often sounds quite fancy and complex, but the math alone (even if correct!) doesn't constrain the world very much.
(I think the comparison to evolutionary theory here is useful, which I believe I have talked to you about in person before: we generally agree that evolutionary theory is ~true. At the same time, evolutionary theory on its own is just not very informative/constraining on my predictions if you ask me "given evo theory, what will the offspring of this mouse here in front of us look like?". It's not that evo theory is wrong, it just doesn't, on its own, have that much to say about this question.)
While I agree it sounds cerebral when we talk about it, I don't think it has to be -- I think there is some not-unfounded hope that FEP is mathematizing exactly that: thinghood implies that, and puts some constraints on how, "the internal states (or the trajectories of internal states) of a particular system encode the parameters of beliefs about external states (or their trajectories)."
Hmm, IDK, maybe. I'll think about it.
Moving back to the beginning of the dialogue, the kick-off questions were:
I'd like to elicit some of your thoughts about value change in humans. What makes a specific case of value change (il)legitimate? How is that tied to the concepts of rationality, agency, etc? Once we're done with that, we can talk more generally about arguments for why the values of an agent/system should not be fixed.
The topics we've covered so far give some background/context but don't answer these questions. Can you elaborate on how you see them relating to value change (il)legitimacy and value-malleable rationality?
Some thoughts/intuitions/generators: [though note that I think a lot of this is rehashing in a condensed way arguments I make in the value change problem sequence]
Thanks! I think it's a good summary/closing statement/list of future directions for investigation, so I would suggest wrapping it right there, as we've been talking for quite a while.
Sounds good, yes! Thanks for engaging :)