Independent AI safety researcher


The accumulation of knowledge


Knowledge is not just precipitation of action

Well most certainly yes, but what does that actually look like at the level of physics? How do I determine the extent to which my robot vacuum is forming beliefs that pay rent in the form of anticipated experiences? And most importantly, what if I don't trust it to answer questions truthfully and so don't want to rely on its standard input/output channels?

Knowledge is not just digital abstraction layers

Yes, good point, I overlooked this. I had thought that digital abstraction layers were a neat solution to the self-knowledge issue but they actually are not. Thank you for the note!

Knowledge is not just digital abstraction layers

Yep, agreed.

These are still writing that I drafted before we chatted a couple of weeks ago btw. I have some new ideas based on the things we chatted about that I hope to write up soon :)

Problems facing a correspondence theory of knowledge

Well here is a thought: a random string would have high Kolmogorov complexity, as would a string describing the most fundamental laws of physics. What are the characteristics of the latter that conveys power over one's environment to an agent that receives it, that is not conveyed by the former? This is the core question I'm most interested in at the moment.

Problems facing a correspondence theory of knowledge

Well yes I agree that knowledge exists with respect to a goal, but is there really no objective difference an alien artifact inscribed with deep facts about the structure of the universe and set up in such a way that it can be decoded by any intelligent species that might find it, and an ordinary chunk of rock arriving from outer space?

Problems facing a correspondence theory of knowledge

I very much agree with the emphasis on actionability. But what is it about a physical artifact that makes the knowledge it contains actionable? I don't think it can be simplicity alone. Suppose I record the trajectory of the moon over many nights by carving markings into a piece of wood. This is a very simple representation, but it does not contain actionable knowledge in the same way that a textbook on Newtonian mechanics does, even if the textbook were represented in a less simple way (say, as a PDF on a computer).

Problems facing a correspondence theory of knowledge

Thank you for this comment duck_master.

I take your point that it is possible to extract knowledge about human affairs, and about many other things, from the quantum structure of a rock that has been orbiting the Earth. However, I am interested in a definition of knowledge that allows me to say what a given AI does or does not know, insofar as it has the capacity to act on this knowledge. For example, I would like to know whether my robot vacuum has acquired sophisticated knowledge of human psychology, since if it has, and I wasn't expecting it to, then I might choose to switch it off. On the other hand, if I merely discover that my AI has recorded some videos of humans then I am less concerned, even if these videos contain the basic data necessary to constructed sophisticated knowledge of human psychology, as in the case with the rock. Therefore I am interested not just in information, but something like action-readiness. I am referring to that which is both informative and action-ready as "knowledge", although this may be stretching the standard use of this term.

Now you say that we might measure more abstract kinds of knowledge by looking at what an AI is willing to bet on. I agree that this is a good way to measure knowledge if it is available. However, if we are worried that an AI is deceiving us, then we may not be willing to trust its reports of its own epistemic state, or even of the bets it makes, since it may be willing to lose money now in order to convince us that it is not particularly intelligent, in order to make a treacherous turn later. Therefore I would very much like to find a definition that does not require me to interact with the AI through its input/output channels in order to find out what it knows, but rather allows me to look directly at its internals. I realize this may be impossible, but this is my goal.

So as you can see, my attempt at a definition of knowledge is very much wrapped up with the specific problem I'm trying to solve, and so any answers I arrive at may not be useful beyond this specific AI-related question. Nevertheless, I see this as an important question and so am content to be a little myopic in my investigation.

Agency in Conway’s Game of Life

Thank you for this thoughtful comment itaibn0.

Matter and energy and also approximately homogeneously distributed in our own physical universe, yet building a small device that expands its influence over time and eventually rearranges the cosmos into a non-trivial pattern would seem to require something like an AI.

It might be that the same feat can be accomplished in Life using a pattern that is quite unintelligent. In that case I am very interested in what it is about our own physical universe that makes it different in this respect from Life.

Now it could actually be that in our own physical universe it is also possible to build not-very-intelligent machines that begin small but eventually rearrange the cosmos. In this case I am personally more interested in the nature of these machines than in "intelligent machines", because the reason I am interested in intelligence in the first place is due to its capacity to influence the future in a directed way, and if there are simpler avenues to influence in the future in a directed way then I'd rather spend my energy investigating those avenues than investigating AI. But I don't think it's possible to influence the future in a directed way in our own physical universe without being intelligent.

to solve the control problem in an environment full of intelligence only requires marginally more intelligence at best

What do you mean by this?

the solution to the control problem may even be less intelligent than the structures it competes against, and make up for that with hard-coded solutions to NP-hard problems in military strategy.

But if one entity reliably outcompetes another entity, then on what basis do you say that this other entity is the more intelligent one?

Knowledge is not just map/territory resemblance

Thank you for the kind words Jemist.

Yeah I'm open to improvements upon the use of the word "knowledge" because you're right that what I'm describing here isn't quite what either philosophers or cognitive scientists refer to as knowledge.

Yes knowledge-accumulating systems do seem to be a special case of optimizing systems. It may be that among all optimizing systems, it is precisely the ones that accumulate knowledge in the process of optimization that are of most interest to us from an alignment perspective, because knowledge-accumulating optimizing systems are (perhaps) the most powerful of all optimizing systems.

Knowledge is not just map/territory resemblance

Dang, the images in this post are totally off. I have a script that converts a google doc to markdown, then I proofread the markdown, but the images don't show up in the editor, and it looks like my script is off. Will fix tomorrow.

Update: fixed

Load More