Riffing on the agent type

2Alexander Gietelink Oldenziel

7Quinn

2Alexander Gietelink Oldenziel

New Comment

3 comments, sorted by Click to highlight new comments since: Today at 9:14 AM

- I've been very distressed thinking that instrumental and epistemic parts are
**not**cleanly separable, and that entire is-ought gap or humean facts-values is a grade school story or pedagogically noble lie https://www.lesswrong.com/posts/kq8CZzcPKQtCzbGxg/quinn-s-shortform?commentId=fdCTjtJgucYP9Xza4 - I got severely burnt out from exhaustion not long after writing this, and one of the reasons was the open games literature lol. But good news! I was cleaning out old tabs on my browser and I landed on one of those papers, and it all made perfect sense instantly! I'm more convinced than I initially suspected that the open games community has a ton to offer.

Much is owed to Diffractor for Giry-pilling me at Alignable Structures, I had been struggling with type-driven expected value previously.Epistemic status: took a couple days off from my master plan to think about John's selection theorems call to action.We would like a type signature of agency. Scott Garrabrant provides (A→O)→A as a first approximation. You can choose one of two ideas here: 1. that an agent simply takes a

beliefabout how actions A turn into outcomes O and returns a recommended action, or 2. that an agent takes underlying configurations of reality (containing information about how actions lead to outcomes) andtends toperform certain actions. Notice that O happens to be for "outcome", "observation", and even "ontology", which is nice. This signature is widely discussed in the monad literature.Scott wrote that → primarily means causal influence and secondarily means functions. I will be mostly ignoring the causal influence idea, and I think instead of thinking of the signature from an objective perspective of it being a transcription of the underlying reality, I want to think of it from a subjective perspective of it being an assistant for implementation engineers. I think we should take a swing at being incredibly straightforward about what we mean by the type signature of agency: when I say that a type τ is the type signature of agency, I mean that if we have programs that are admitted by τ then those programs are doing all the things that interest me about agents (i.e., at τ=(A→B)→A, if we instantiate particular non-atomic propositions A and B that interact with the outside world in such a way that we can obtain proofs of (A→B)→A (which we can't do in general) in some way, then those proofs are doing all the things that interest me about agents).

In my view, the first idea (involving "belief") can be called a

subjective interpretationof the type signature, and we shall explore some adjustments to make this story better, while the second idea (involving "base reality") can be called anobjective interpretationof the type signature, and we shall not explore philosophical controversies around saying that a type is "in" reality rather than in a model.I will ultimately conclude that I am not equipped to flesh out the objective interpretation, and give a subjective interpretation such that an agent is not one selection function, but a pair of selection functions. In particular, A=InsA⊕EpiA (an agent is made up of an instrumental part and an epistemic part).

In the post, heading number one is an infodump about stuff I've been reading, and setup of some tooling. Heading number two is applications to agency.

## Preliminaries

implementation of termsand → denotessignature of types. x↦e is an alternative to λx.e.monadwe mean that it comes to us equipped with a function η:X→MX (called areturn) and a function μ:MMX→MX (called aflatten) which agree to some laws.Example: let M:=list setting η to the construction of a singleton and μ to the removal of nesting information.Exercise: fixing an A:Type, one of M:=B↦BA or M:=B↦AB forms a monad. Pick one and set its η and μ (Solution^{[1]}).## Selection and continuation

The agent type is widely discussed in the monad literature.

Fixing an outcome type S, JS:=X↦(X→S)→X:Type→Type is called the

selection monad, and its friend KS:=X↦(X→S)→S:Type→Type is called thecontinuation monad.## Remark: quantifiers are continuations

## Preliminaries

The story of B-valued or B-interpreted logics goes like this. For any X:Type,

(∀x:X)(∃x:X):KBX

In other words, a quantifier takes a predicate (typed X→B) and returns a valuation of the predicate under different conditions. ∀x:X is the element of KBX that says "the predicate is true all over X", ∀x:X,kx (or we may write it point-free as ∀k) is true if and only if kx is always true regardless of x. ∃x:X is the element of KBX that says "the predicate is true at least once over X", the point-free ∃k is true if and only if you can provide at least one x such that kx is true.

The literature likes to call continuations

generalized quantifiers, where your "truth values" can take on arbitrary type. The story of quantifiers can be updated to P for a richer type of propositions such that not everything is decidable.## Exercises:

^{[2]}).^{[3]}).## Remark: distributions are a special case of generalized quantifiers

## Preliminaries

constantfunction ~y:=x↦y:A→B by throwing out the x.monotonicwhen, having a ≤A and a ≤B, ∀xy:A,x≤Ay→αx≤Bαy.linearwhenever ∀kl:F,∀xy:A,α(kx+ly)=kαx+lαy.## Consume a valuation and produce an expectation

A particular way of strengthening or filtering KR (quantifiers generalized to valuations in R) is to require linearity, monotonicity, and the sending of constant functions to a neutral scalar. For arbitrary types A,B and for types C equipped with some multiplicative structure involving a neutral, we will write BA≤⊸C to describe the functions BA→C but only keeping the ones that are monotonic, linear, and that send constants in BA to the multiplicative neutral in C (conventionally, ⊸ pronounced "lollipop" or "lolli" denotes linearity). Letting R play the roles of B and C, define Δ:=X↦RX≤⊸R:Type→Type

In other words, a distribution is just a continuation term that knows how to turn a valuation (an X→R, i.e. a random variable) into an expectation (where the expectation abides linearity, monotonicity, and the sending of constants to 1).

∀X:Type,∀μ,Eμ:=α↦∫Xαdμ:ΔX

where I'm being lazy about the measure theory needed to actually compute terms, however, we see that measure theory doesn't really emerge at the type level.

I'm thinking of distributions as a subset of these R-valued quantifiers because I want to eventually think about utilities, and I'm still pretty sure the utility codomain is going to be R all the time.

## Δ forms a monad

The settings of η and μ along with the lawfulness proofs are in this

`coq`

file, written a few weeks ago before I knew anything about the selection and continuation literature. (This is not surprising, as we knew that KR forms a monad, and the substitution of the second → for ≤⊸ only deletes maps and doesn't add any potential violators).## Remark: convert selections into continuations/quantifiers

∀AB:Type,¯.:=ϵ↦k↦k(ϵk):JBA→KBA

In other words, if ϵ is a selection then ¯ϵ is a continuation.

## Attainability

Presume a AB:Type. Suppose I have a k:KBA. k is called

attainablewhen it's preimage under ¯. is nonempty. In other words, k is attainable if and only if ∃ϵ:JBA,∀α:BA,kα=α(ϵα). In that case, we may say "ϵ attains k".Notice that from the existence half of the functionality predicate, we get a free existence proof of a continuation/quantifier for every selection. To believe that some continuations are unattainable is to believe that ¯. is not surjective.

## Exercise

^{[4]}## Wrapping the codomain

Fix a F:Type→Type and a S:Type. Define

JFS:=X↦(X→S)→FX:Type→Type KFS:=X↦(X→S)→FS:Type→Type

## Example: powerset

Denote P as the function that confiscates a type and rewards the powerset of that type. In other words, P:=X↦X→B:Type→Type (where an α:PX is interpreted x"∈"α if and only if αx=true).

We call the items of JPS

multi-valued selectionsand items of KPSmulti-valued quantifiers.## Exercise (harder than previous)

^{[5]}## Wrapping the codomain of the domain

We may additionally like to use maps F:Type→Type to goof off with transforming the codomain of the input map.

J(F)S:=X↦(X→FS)→X:Type→Type K(F)S:=X↦(X→FS)→S:Type→Type

## Exercise

## Wrapping the whole domain

Having maps F:Type→Type, and since X→S:Type, we also might enjoy transforming the whole input map type.

J[F]S:=X↦F(X→S)→X:Type→Type K[F]S:=X↦F(X→S)→S:Type→Type

## Modifying the agent signature

Recall the agent interpretation of selection. We fix an outcome type O and an action type A and we reason about JOA=(A→O)→A. Recall that there are two cases: a

subjectivecase in which items A→O arebeliefs, and anobjectivecase in which items A→O areactual configurations of reality. In the subjective case, an agent turns a model of reality into a recommended action (the term hardcodes its notion of utility or whatever). In the objective case, the world has configurations, and an agent can be trusted to tend toward the actual configuration over time, using it to (again relying on hardcoded utility data) select actions.## Investigation: continuation is to Δ as selection is to what?

We obtained Δ by replacing the rightmost → in the definition of KRX with my custom ≤⊸. Let's goof around with performing the same replacement in JRX.

Δ∗:=X↦RX≤⊸X:"Type"→Type

Recall that ≤⊸ implies that it's codomain supports linearity, monotonicity, and multiplicative neutrality, so we know that the domain of Δ∗ isn't "really" just Type (hence the scare quotes), whereas the domain of Δ was truly the unconstrained type Type. So it may be difficult now to be sure of the preservation of monadicity.

## Preliminaries

## Rambling about Δ∗

How do we interpret this? In the agent case, actions are playing the role of X, which immediately suggests that we'll only have the class of continuous action spaces, so we can try R. But Δ∗R=RR≤⊸R=ΔR, which feels maybe problematic or vacuous. Possibly problematic, because I don't know how the theory of random variables adjusts to the bare real line (as opposed to a collection of subsets). Possibly vacuous, because I don't know any particular terms typed R→R (other than x↦x or ones with fairly strong conditions like increasingness) that I would expect to correspond with some foggy coherence notion for valuations in the back of my mind. Moreover, what should we think of collapsing the very distinction between selection and continuation, by setting S=X? (X→X)→X isn't provable in the logic interpretation (unless I'm missing some coinductive black magic resolving loops), which is a hint that we're barking up the wrong tree. My gut isn't telling me Δ∗[0,1] would be any better.

We could of course support the ≤⊸ requirements on the codomain by putting a monoidal preorder on BX (namely setting P:=BX, ≤:=⊆, ϵ:=X, and ⊗:=∩), which wouldn't work for entirely arbitrary X:Type but would work if you could interpret the scaling of a subset (like X is a single suit out of a deck of cards, the valuation ν of a subset is the total number of pips across all the cards in the subset, and scalar k hits it by doing some operation on that valuation, like k:=p↦⌊|kνp|⌋). Fix an X that you can interpret in this way. Then, try Δ∗BX=((X→B)→R)≤⊸X→B. In other words, if I have an X-generated multi-valued "selection distribution" E:Δ∗BX, then for every valuation of a subset ν:BX→R, Eν is a kind of expected subset, or it's something the agent can proactively search for like argmin or argmax. Perhaps you could even interpret/implement it like "if ν is my complete account of what a subset is worth to me, then E fixes an amount of optimization power I'm going to throw at steering the future into particular subsets over others, and Eν denotes the sort of place I would end up if I applied that much optimization to my values (insofar as landing at an actual optima implies that possibly unbounded optimization power was deployed)".

## Exercise

## What about ΔC∗X for metric spaces X?

Loosening up the pedantry a little, because the actual type-driven story would get too hairy, let's by fiat admit C[0,1]:Type, so we can take the subset of R[0,1] that just has continuous functions in it. You shall indulge me if I utilize C:Type→Type without properly saying that the domain is just the types interpretable as or isomorphic to uninterrupted intervals, whatever.

ΔC∗=X↦(X→R)≤⊸CX=X↦(X→R)≤⊸XC→R

A modus ponens with a little decoration with conditions like the linearity/monotonicity/sending constants to 1 and continuity. What does it mean?

It could mean the environment actually giving the agent a reward for taking action X, though it's a simpler story than the one in standard reinforcement learning theory, especially e.g. POMDPs.

## Lastly, Δ∗∘C

The idea of rigging "scalar multiplication" to my deck of cards semantics was uncomfortable. The following, however, has a perfectly natural notion of linearity (alongside order and the idea of a 1).

Δ∗∘C=X↦(CX→R)≤⊸CX=X↦((XC→R)→R)≤⊸XC→R

Selections over continuous functions (taking valuations of continuous functions as inputs and returning continuous functions as output) sounds like a kind of learning over "metavalues", when the continuous functions are interpreted as utilities, then the argmax:(Δ∗∘C)X knows how to take the utility of a utility function (which is metautility) and choose the one that maximizes metautility.

This of course restricts that the action type be equipped with a metric.

## Conjecture: attainability survives the transportation to the custom ≤⊸

∀X:"Type",¯.:Δ∗X→ΔX should be provable. Indeed, it's just a domain restriction on the original ¯., so this conjecture is in the bag.

## Investigation: J[Δ]O

J[Δ]S:=X↦Δ(X→O)→X:Type→Type

This isn't quite the subjective approach I'm looking for. Mapping from uncertainty over valuations to actions seems kinda from the perspective of social choice theory, where the difference in opinion across the population is captured by not being able to know a precise point estimate of a valuation, having to turn a distribution over valuations of actions into an action.

## Investigation: J(Δ)S

J(Δ)S:=X↦(X→ΔS)→X:Type→Type

This looks to me the most like "the agent turns models/beliefs into actions".

Let's unfold Δ.

J(Δ)S=X↦(X→(RX≤⊸R))→X:Type→Type

The general pattern of "terms such that the input is X into quantifiers and the output is X" might mean that terms are hardcoded predicates which can select values of X to get a desired result

dependingon whichever quantifier shows up. We will not work with the unfolded version in what follows.## Rescue attempt: the objective interpretation

In the objective interpretation of the type signature of agency, an agent is a term that turns a configuration reality could be in (specifically the information about how actions lead to outcomes) into an action.

In my rescue operation, objectivity is not pure: we will see that I've installed a subjectivity (i.e. learning) layer as an implementation detail. Think of it like the difference between a lemma and a theorem; at the lemma level, there's subjectivity, while if the theorem level doesn't open up black boxes it may not notice subjectivity. Put another way, the challenge of the rescue operation is to tell a compellingly full story (which ought to oblige the term to empiricism under uncertainty) without resorting to Δ.

The "lemma" will be a term ϕ:JROA. Its inputs OA→R are loss functions which come equipped with real-world data hardcoded into them. These loss functions make sense of the gap between a map and a territory, here focusing on action-output relations, i.e. they take a notion of how actions turn into outcomes and they score how accurate it is. Then for such a loss function l:OA→R, ϕl:A→O. (And if you want the function that constructs ls, you need ontology to describe that function's domain). Since this is the objective point of view, we interpret ϕl's codomain as the literal outcomes in the world, indeed ϕl

isthe gears by which perturbations from agents effect things. (Warning: here be monsters) if we say that in order to implement an agent you need to provide a ϕ, and ϕl describes the literal gears of the world and isn't a conditional forecast (like "our best guess at time t is that action x will transition the world into state (ϕtl)x"), then I don't see how an agent is remotely computational.Equipped like so, with any proof ϵ:JOA hardcoded by some humans or learned by some ML model, and some l:OA→R provided by a stakeholder/principal, (ϵ∘ϕ)l is the action the proof would like to take. But since ϕ is a blackbox, (ϵ∘ϕ)l is as good as an axiom, i.e., the blackboxness propagates out and it

can'tbe actually written at the low level. A configuration of the world ((ϕl)∘ϵ∘ϕ)l has a similar problem.It's plausible to me that infrabayesian physicalism or factored sets provide a way forward, but I'm not going to grok either of those today. (The first time I read "Saving Time" just as now, I was confused about "the future effects the past" because of the determinism/nondeterminism question, i.e. I get that forecasts of or distributions over the future effect the past, but I don't get how the actual future effects the past).

I'm marking this rescue operation as a

failure, owing to the restriction against invoking Δ.## Rescue: the subjective interpretation

## Preliminaries

pair types, of one constructor (namely (⋅,⋅):=a↦b↦(a,b):A→B→A×B) and two destructors (namely π1:=(a,b)↦a:A×B→A and π2:=(a,b)↦b:A×B→B), and we assume associativity of n-ary or nested products.## A subjective agent is a tuple with a protocol

In the subjective interpretation of the type signature of agency, an agent is a term that knows how to turn a

beliefabout how actions turn into outcomes into an action. Since beliefs are emphasized, and beliefs are uncertain, we will allow ourselves liberal use of the Δ operator. The following approach is based on the failed rescue of the objective interpretation.Fix a type A of actions and a type O of outcomes. We consider proofs ψ:N→JR(A→ΔO) where items f:A→ΔO are

conditional forecaststhat accept an action x and report, with uncertainty, a belief about what will happen if it does x. As the domain of ϕ above, for any t:N the domain of ψt isloss functions, each of which considers a conditional forecast f and scores it for calibration, accuracy, whatever. We write ψt instead of ψt.A stakeholder or principal encodes observations at time t of the world by hardcoding data into their choice of loss function by using the function L:N→(A→ΔO)→R, writing l=Lt instead of Lt.

Then implement the uncertain selection ϵ:J(Δ)OA. Notice that it's domain is conditional forecasts.

Then an agent is none other than A:=(ϵ,ψ):J(Δ)OA×(N→JR(A→ΔO)) with sensors and actuators, which interacts with the world via a protocol π which runs as follows

In other words, the agent calculates an action because it can turn loss functions which score conditional forecasts into a handpicked conditional forecast, and it can also turn conditional forecasts into handpicked actions. ψ hardcodes the procedure for doing bayesian updates, i.e. it has opinions about some beliefs being better than others. ϵ hardcodes (and hides) a utility function, i.e. it has opinions about some outcomes being better than others. Echoing a complex number, which is a real part and an imaginary part, we can view an agent A as an instrumental part and an epistemic part. While the complex numbers are equipped with some notion of "+" such that z:=π1z+π2z (real part plus imaginary part), I can make up a notion of "⊕" such that

∀L:N→(A→ΔO)→R,∀t:N,At:=((π2A)(t+1)(Lt))∘(π1A)∘(π2A)(t+1)(Lt)⊕(π1A)∘(π2A(t+1)(Lt))

(epistemic part "plus" instrumental part).

I played fast and loose with the

mutabilityand implicitdifferentiabilityof ψ. I think this is appropriate: any intuition about agents is that their beliefs change over time, even if corrigibility remains an open problem (in other words, epistemic part ought to be mutable even if the instrumental part (where the utility function is) is not).The abstract type signature of π is N→O, where here when we say a codomain is outcomes we mean that it's the literal world, not an implementational model of it, hence the signature being "abstract".

## Selection product?

In the literature there's a function J×S:=J1↦J2↦J×SJ1J2:JSA→JSB→JS(A×B). It's only defined between selections that share an inner target S, though, so it doesn't apply to A. Still, there might be some cleverness I haven't considered.

## Conclusion

We need more candidates for the type signature of agency. An obvious way to explore is to take the first candidate someone wrote down, make an incision, and poke its guts with various functions F:Type→Type.

A more complete story of agency, together with a protocol describing interactions with the world, is not a single selection but a pair of selections. The pair can be understood as an epistemic part and an instrumental part.

I'm aware that I at least partially took some steps toward reinventing the reinforcement learning theory wheel when I gave the protocol π, an alternative approach to this post would be to start with RL theory and see what notions of selection function are hanging around.

If we hammer out the dents in Δ∗ we get a really pretty notion of "turning agency into probability" (in the form of the function ¯. on a restricted domain), and plausibly also a characterization of the unreliability or impossibility of turning probability into agency (via the insurjectivity of ¯.).

What

aboutinterp? I think something like the searching for search could, if we're not totally and completely wrong about the pillars of the agency type signature direction, show us atonabout how ML naturally implements terms/proofs of things like JSX. A dope UX would be something like tactical programming not for creating terms/proofs, but for parsing out / identifying them in a big pile/soup of linear algebra. A fantasy world I'd like to live in is one where a prosaic/interp shop ships`neural-hoogle`

or`transformer-hoogle`

, a search engine that accepts type signatures and finds configurations of matrices and weights which, if you squint, count as proofs/terms of those types. To be abundantly clear, you can think of the following proof of JfloatA as the dumbest possible searchInsofar as the type

`A`

is enumerable. The hypothesis advanced by this post is that arbitrarily not-dumb search is constrained by the same type information as dumb search. Search isliterallya significant class of proofs of selection.The objective interpretation of the project of giving a type signature for agents seems a little borked right now, but that could change with increased understanding of factored sets or maybe infrabayesian physicalism.

Selections and continuations play a huge role in compositional game theory, which I'm starting to think provides a mean embedded agency story, though I haven't grokked it quite at the level of writing a post about it just yet.

## References

not super useful without an interactive session, nevertheless. ↩︎

∀X:Type,argmax:JRX ↩︎

∀X:Type,max:KRX ↩︎

argmax attains max, or ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯argmax=max. ↩︎

Presume some XS:Type and some K:KPSX. K is attainable if and only if ∃ε:JPSX,∀α:SX,∀x:X,(εα)x=true→(Kα)x=true. For multi-valued variants of max and argmax, we can check that the solution to exercise 2 transfers over to this setting. ↩︎