Alex_Altair

Comments

Oh, this thing I identify more with doing (and getting results from).

I only vaguely understand both Cartesian frames and «boundaries», but I want to take a shot at explaining what feels to me like some confusions in this post. Also, a bunch of this might be invalidated by your disclaimer that this post was published before the final version of Critch's boundaries sequence.

It seems to me that your work in Cartesian frames is about finding an ontology in which we can make progress on agent foundations in general, whereas Critch's concept of boundaries is about formalizing a necessary condition for identifying when an agent exists in a particular system.

While every boundary can be recast as a frame, the converse is not also true.

Indeed! We wouldn't want a theory of agents (or boundaries) to be applicable to a state in thermal equilibrium, for example, whereas we definitely want our ontology to be able to handle it.

Getting Past the Physical Frame

This whole section is actually a little confusing. It sounds like you're interpreting Critch's sequence to be saying that boundaries are inherently physical. But aren't all the definitions in terms of information theory? The nodes in the network are coarse-grained, but aren't we free to decide what about the world they are abstracted from?

This also further motivates frames over boundaries, as we might want to reason about multiple different frames simultaneously with no underlying factorization that multiplicatively-refines both of them.

It seems entirely possible to me for a formalization of boundaries to allow for two different but equally-valid boundaries to be drawn within the state.

 

Actually, now that I think about it, is there any reason that Critch's definition of boundaries isn't fully compatible with the ontology of factored sets?

Boundaries are information-theoretic and more objective in that they are (often) inter-subjectively observable just by counting bits of mutual information between variables, whereas preferences are subjective and observable only indirectly through behavior.

This feels wrong to me, but the feeling is a little fuzzy. Maybe I disagree with the emphasis or the framing, or something.

Preferences are instantiated in the world just like anything else. And they might be entirely visible; there's no reason you couldn't have a system whose preferences are on display (like reading the source code of a program). But I would grant that they're not usually on display, especially for humans, and so you do have to infer them through behavior (although that can include the person telling you what they are, or otherwise agreeably taking actions that are strong evidence of their preferences).

In contrast, boundaries are usually evident from the outside, even if they obscure what's happening on the inside. And if preferences are not directly visible, then they're probably inside a boundary, in which case the boundary is easier to detect and verify than the preferences.
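To make the "counting bits of mutual information" idea in the quoted claim concrete, here is a toy sketch of my own (not Critch's actual boundary definition): estimating the mutual information between two observed variables from samples, which is the kind of inter-subjectively checkable quantity the quote has in mind.

```python
import math
from collections import Counter

def mutual_information_bits(pairs):
    """Empirical mutual information (in bits) between the two coordinates
    of a list of (x, y) sample pairs."""
    n = len(pairs)
    pxy = Counter(pairs)                # joint counts
    px = Counter(x for x, _ in pairs)   # marginal counts for x
    py = Counter(y for _, y in pairs)   # marginal counts for y
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Perfectly correlated variables carry 1 bit of mutual information:
print(mutual_information_bits([(0, 0), (1, 1)] * 50))                 # -> 1.0
# Independent variables carry 0 bits:
print(mutual_information_bits([(0, 0), (0, 1), (1, 0), (1, 1)] * 25)) # -> 0.0
```

Two observers who agree on the samples will agree on this number, which is the sense in which such a criterion is more objective than inferred preferences.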

Here I'm going to log some things I notice while reading, mostly as a way to help check my understanding, but also to help find possible errata.

 

In Definition part (a), you've got a whole lot of W-type symbols, and I'm not 100% sure I follow each of their uses. You use  a couple times which is legit, but it looks a lot like , so maybe it could be replaced with ?

See this comment for two errata with the different w's.

 

 denotes, for any world state  , the future of the Dirac (100% concentrated) distribution on the world state .

Maybe you could just say,  is shorthand for , since  will map  to the right thing of type . Then you can avoid bringing in the somewhat exotic Dirac delta function. Of course, that now means that  itself is not the first item in the resulting sequence. I'm not sure if you need that to be the case for later. But also, everything above is ambiguous about whether the argument to  was in the sequence anyway.

 

The character ⫫ doesn't render for me. (I could figure out what it was by pasting the unicode into google, but maybe it could be done with LaTeX instead?)

 

To formalize this, I want a collection of state spaces and maps, like so:

Is the following bulleted list missing an entry for ?

 

Each of these factorizations are assumed to be bijective, in the sense of accounting for everything that matters and not double-counting anything

I was wondering if you were going to say something like  and . It sounds like that's almost right, except that you allow the factors to pass through arbitrary functions first, as long as they're bijective. Is that right?

 

We say  is a good fit

You bring back  here, but I don't see the  doing anything yet. Might be better not to introduce it until later, to free up a bit of the reader's working memory.

 

See this comment for a broken link.

My best guess about the core difference between optimization and agency is the thing I said above about, "a utility function, which they need for picking actions with probabilistic outcomes".

An agent wants to move the state up an ordering (its optimization criterion). But an agent also has enough modelling ability to know that any given action has (some approximation of) a probability distribution over outcomes. (Maybe this is what you mean by "counterfactuality".) Let's say you've got a toy model where your ordering over states is A < B < C < D < E and you're starting out in state C. The only way to decide between [a 30% chance of B + a 70% chance of D] and [a 40% chance of A + a 60% chance of E] is to decide on some numerical measure for how much better E is than D, et cetera.
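A quick sketch of that point, with toy utility numbers of my own: two cardinal utility functions that both respect the ordering A < B < C < D < E can disagree about which lottery is better, so the ordinal ranking alone can't settle the choice.

```python
# Illustration: an ordinal ranking A < B < C < D < E is not enough to
# choose between the two lotteries; the choice depends on the numeric
# utilities, not just the ordering.

def expected_utility(lottery, utility):
    """lottery: list of (probability, outcome) pairs."""
    return sum(p * utility[outcome] for p, outcome in lottery)

lottery_1 = [(0.3, "B"), (0.7, "D")]  # 30% chance of B + 70% chance of D
lottery_2 = [(0.4, "A"), (0.6, "E")]  # 40% chance of A + 60% chance of E

# Both assignments are monotone in A < B < C < D < E:
u_1 = {"A": 1, "B": 3, "C": 4, "D": 5, "E": 6}   # E only slightly beats D
u_2 = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 20}  # E vastly beats D

for u in (u_1, u_2):
    eu_1 = expected_utility(lottery_1, u)
    eu_2 = expected_utility(lottery_2, u)
    print("prefer lottery 1" if eu_1 > eu_2 else "prefer lottery 2")
```

Under u_1 the agent prefers the safe lottery; under u_2 it prefers the risky one, even though both utilities induce the same ordering over pure outcomes.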

Gradient descent doesn't have to do this at all. It just looks at the gradient and is like, number go down? Great, we go in the down direction. Similarly, natural selection isn't doing this either. It's just generating a bunch of random mutations and then some of them die.
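To underline the contrast, a minimal sketch of my own: gradient descent only ever consumes a local "which way is down" signal, with no probability distribution over outcomes and no utility weighting anywhere in the loop.

```python
# Gradient descent needs only the local gradient, never an expected
# utility over uncertain outcomes.

def grad(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def descend(f, x, lr=0.1, steps=100):
    for _ in range(steps):
        x -= lr * grad(f, x)  # number go down: step in the down direction
    return x

f = lambda x: (x - 3.0) ** 2
print(descend(f, x=0.0))  # converges near the minimum at x = 3
```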

(I'm not totally confident that one couldn't somehow show some way in which these scenarios can be mathematically described as calculating an expected utility. But I haven't needed to pull in these ideas for deconfusing myself about optimization.)

Mostly it's that I've found that, while trying to understand optimization, I've never needed to put "weights" on the ordering. (Of course, you always could map your ordering onto a monotonically increasing function.)

I think the concept of "trying" mostly dissolves under the kind of scrutiny I'm trying to apply. Or rather, to well-define "trying", you need a whole bunch of additional machinery that just makes it a different thing than (my concept of) optimization, and that's not what I'm studying yet.

I've also been working entirely in deterministic settings, so there's no sense of "how often" a thing happens, just a single trajectory. (This also differentiates my thing from Flint's.)

I haven't stopped working on the overall project. I do seem to have stopped writing and editing that particular sequence, though. I'm considering totally changing the way I present the concept (such that the current Intro post would be more like a middle-post) so I decided to just pull the trigger on publishing the current state of it. I'm also trying to get more actual formal results, which is more about stuff from the end of that sequence. But I'm pretty behind on formal training, so I'm also trying to generally catch up on math.

I mean, that makes sense according to their definition, I think I'm just defining the word differently. Personally I think defining "agent" such that gradient descent is an agent seems pretty off from the colloquial use of the word agent.

I'm curious how universal the value of "Undirected Time" is. I've found that I pretty much just... don't have "shower thoughts", at least not ones that are about what I've been working on any more often than chance. Similarly, I don't relate to the phenomenon of taking a break from a hard problem, coming back, and then suddenly making faster progress on it. And I especially don't relate to the phenomenon of having ideas come to me in dreams.

It's not obvious to me whether this is better or worse. On one hand I could be saving time by not having to wait for background processes (I don't feel like I get particularly stuck on problems anyway), but on the other hand maybe I'm just failing to use the background processes to do the cognition for me?

On the other other hand, I have to wonder if maybe I just do the equivalent of "undirected time", like, all the time? I feel like I pretty often switch what I'm thinking about between unrelated topics, or change scope, or wonder about whether things are connected in a certain way, etc. It feels like this might be satisfying the intended meaning of "undirected time", even if it doesn't feel undirected to me.

Okay, that's an interesting comparison. Maybe this will help: Yudkowsky's measure of optimization is a measure, as in a quantification of how much optimization is happening, rather than the definition. The definition is then "when a system's state moves up an ordering". Analogously, objects have length, and you can tell "how much of an object" there is by how long it is. And if there's no object, then it will have zero length. But that doesn't make the definition of "object" be "a thing that has length". Does that make sense?

I think you might be missing the intended meaning of Yudkowsky's measure. It's intended to be a direct application of Bayes, which means it's necessarily relative to a prior. His measure of optimization is: under the belief that there is no optimization process present, how surprised would you be by how far up the state ordering the state ended up? If there is an optimization process, then the state will end up surprisingly far up relative to that prior; the stronger the optimizer, the more surprising. We're not saying you do believe there is no optimizer. But if you instead condition on there being an optimizer and ask "how surprising is it that the optimizer does optimization?", then of course the surprise disappears. It's not that having more knowledge makes the system objectively less of an optimizer; it's that it makes the outcome subjectively less surprising.
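Here's a toy sketch of how I read the measure, with my own made-up numbers: under a baseline "no optimizer" prior over states, the measure is the bits of surprise at the state landing at least this high in the ordering, i.e. OP = -log2 P(rank >= observed rank | baseline).

```python
import math

def optimization_power_bits(baseline_ranks, observed_rank):
    """Bits of surprise at reaching `observed_rank` or higher, where
    `baseline_ranks` lists equally weighted ranks under the prior."""
    p = sum(1 for r in baseline_ranks if r >= observed_rank) / len(baseline_ranks)
    return -math.log2(p)

# No-optimizer prior: 1024 equally likely states, ranked 1..1024.
# Observing the very top state is 10 bits of surprise.
uniform_prior = list(range(1, 1025))
print(optimization_power_bits(uniform_prior, 1024))  # -> 10.0

# Conditioning on "there is an optimizer" concentrates the prior on high
# ranks, so the same outcome carries far less surprise:
optimizer_prior = [1024] * 512 + list(range(1, 513))
print(optimization_power_bits(optimizer_prior, 1024))  # -> 1.0
```

The outcome and the ordering are identical in both calls; only the prior changes, which is exactly the sense in which knowing about the optimizer makes its work subjectively less surprising without changing what it objectively did.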
