Independent AI safety researcher


Parable - Soryu destroyer of maps

I thought this was brilliant, actually. My favorite line is:

Of course, B wasn't in analysis paralysis, that would be irrational

In seriousness though, I don't actually see the Monastic Academy's culture as naturally contrary to rationalist culture. Both are fundamentally concerned with how to cultivate the kind of mind that can reduce existential risk. Compared to mainstream culture, the two are really very similar. There are some methodological differences, of course, and those details are important, but they are not that deep.

Knowledge is not just mutual information

First, an ontology is just an agent's way of organizing information about the world...

Second, a third-person perspective is a "view from nowhere" which has the capacity to be rooted at specific locations...

Yep, I'm with you here.

Well, what's a 3rd-person perspective good for? Why do we invent such things in the first place? It's good for communication.

Yeah, I very much agree with justifying the use of third-person perspectives on practical grounds.

we should be able to consider the [first person] viewpoint of any physical object.

Well, if we are choosing to work with third-person perspectives, then maybe we don't need first-person perspectives at all. We can describe gravity and entropy without invoking any first-person perspective, for example.

I'm not against first-person perspectives, but if we're working with third-person perspectives then we might start by sticking to third-person perspectives exclusively.

Let's look at a different type of knowledge, which I will call tacit knowledge -- stuff like being able to ride a bike (aka "know-how"). I think this can be defined (following my "very basic" theme) from an object's ability to participate successfully in patterns.

Yeah right. A screw that fits into a hole does have mutual information with the hole. I like the idea that knowledge is about the capacity to harmonize within a particular environment because it might avoid the need to define goal-directedness.
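To make the screw-and-hole example concrete, here is a minimal sketch (my own illustration, not from the original exchange) that computes the mutual information between screw widths and hole widths from a toy joint sample. When the screw always fits its hole, the two variables fully determine each other and the mutual information is maximal.

```python
from collections import Counter
from math import log2

# Toy joint sample of (screw width, hole width) pairs. Every screw fits
# its hole exactly, so the two variables are perfectly correlated.
pairs = [(4, 4), (5, 5), (6, 6), (4, 4), (5, 5), (6, 6)]

def mutual_information(pairs):
    """I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))."""
    n = len(pairs)
    joint = Counter(pairs)              # empirical joint distribution
    px = Counter(x for x, _ in pairs)   # marginal over screw widths
    py = Counter(y for _, y in pairs)   # marginal over hole widths
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

print(mutual_information(pairs))  # log2(3) ≈ 1.585 bits
```

With three equally likely widths the screw carries log2(3) bits about the hole; if screws were assigned to holes at random, the same computation would return roughly zero.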

Now we can start to think about measuring the extent to which mutual information contributes to learning of tacit knowledge. Something happens to our object. It gains some mutual information w/ external stuff. If this mutual information increases its ability to pursue some goal predicate, we can say that the information is accessible wrt that goal predicate. We can imagine the goal predicate being "active" in the agent, and having a "translation system" whereby it unpacks the mutual information into what it needs.

The only problem is that now we have to say what a goal predicate is. Do you have a sense of how to do that? I have also come to the conclusion that knowledge has a lot to do with being useful in service of a goal, and that then requires some way to talk about goals and usefulness.
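One way to make the "goal predicate" question concrete is the following toy sketch (my framing, with hypothetical names, not from the original exchange): treat a goal predicate as a boolean function over outcomes, and measure how "accessible" some mutual information is by how much acting on it raises the success rate on that predicate.

```python
import random

random.seed(0)

# A goal predicate is just a boolean function over outcomes.
def goal(screw, hole):
    return screw == hole  # "the screw fits the hole"

holes = [random.choice([4, 5, 6]) for _ in range(10_000)]

# An agent WITH access to the hole width (mutual information with the
# environment) can always pick the matching screw.
with_info = sum(goal(h, h) for h in holes) / len(holes)

# An agent WITHOUT that information guesses a fixed width every time.
without_info = sum(goal(5, h) for h in holes) / len(holes)

print(with_info, without_info)
```

The gap between the two success rates is one candidate measure of how useful the information is with respect to that particular goal predicate, though it leaves open the harder question of where the goal predicate itself comes from.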

The hope is to eventually be able to build up to complicated types of knowledge (such as the definition you seek here), but starting with really basic forms.

I very much resonate with keeping it as simple as possible, especially when doing this kind of conceptual engineering, which can so easily lose its way. I have been grounding my thinking in wanting to know whether or not a certain entity in the world understands a certain phenomenon, in order to use that to overcome the deceptive misalignment problem. Do you also have go-to practical problems against which to test these kinds of definitions?

[Event] Weekly Alignment Research Coffee Time (12/06)

Today this link does not seem to be working for me; I see:

Our apologies, your invite link has now expired (actually several hours ago, but we hate to rush people).

I also notice that the date is still 10/25, so perhaps the event is not happening today?

An Unexpected Victory: Container Stacking at the Port of Long Beach

Thank you so much for writing this up, Zvi!

It's hard to be correct about the nature of the bottleneck in such a scenario, and harder still to find a workable solution. I suspect that a good part of the success of this effort was simply that Ryan was actually correct about the nature of the problem and the nature of the solution. Beyond that, Ryan being head of Flexport probably helped a lot in convincing the initial signal boosters to trust his diagnosis and prescription, and then in getting the government folks to take the whole thing seriously. It's not just that he had a general-purpose platform, but that he had credibility in that particular industry.

Self-Integrity and the Drowning Child

But how exactly do you do this without hammering down on the part that hammers down on parts? Because the part that hammers down on parts really has a lot to offer, too, especially when it notices that one part is way out of control and hogging the microphone, or when it sees that one part is operating outside of the domain in which its wisdom is applicable.

(Your last paragraph seems to read "and now, dear audience, please see that the REAL problem is such-and-such a part, namely the part that hammers down on parts, and you may now proceed to hammer down on this part at will!")

Three enigmas at the heart of our reasoning

Thank you!

Well, I would just say that the significance of it for me comes from the connection between the conclusion "I am" and practical life. I like to remind myself that there is something that really matters, and that my actions really seem to affect it, and so I take "I am" to be a reminder of that.

Three enigmas at the heart of our reasoning

It's just that you end up with circular reasoning in that case: you have to start with the premise that things that have worked in the past will continue to work in the future; then you observe that this principle itself has worked in the past; and then, on the basis of the very premise you started with, you conclude that this principle (that things that have worked in the past will continue to work in the future) will itself continue to work in the future.

It's as if I claimed to you that things that have never worked in the past will tend to work in the future, and you asked why, and I said: well, because this view has never worked in the past, therefore it will work in the future. In order to reach that conclusion I had to start out by assuming the very thing I was trying to establish.

Interested in your thoughts.

Three enigmas at the heart of our reasoning

Yeah thank you for sharing these thoughts.

I have not really resolved these questions to my own satisfaction, but the thing that seems clearest to me is to notice when these doubts are becoming a drag on energy levels and confidence, and, if they are, to carve out a block of time to really turn towards them in earnest.

Three enigmas at the heart of our reasoning

Yeah, these are definitely instances of the problem of the criterion. I actually had a link to your post in the original version of this post but somehow it got edited out as I was moving things around before publishing.
