I'm not sure how obvious the following is to people, and it probably is obvious to most of the people thinking about FAI. But I thought I'd throw out a summary of it here anyway, since this is the one topic that makes me the most pessimistic about the notion of Friendly AI being possible, at least one based heavily on theory rather than on plenty of experimentation.

A mind can only represent a complex concept X by embedding it into a tightly interwoven network of other concepts that combine to give X its meaning. For instance, a "cat" is playful, four-legged, feline, a predator, has a tail, and so forth. These are the concepts that define what it means to be a cat; by itself, "cat" is nothing but a complex set of links defining how it relates to these other concepts. (As well as a set of links to memories about cats.) But then, none of those concepts means anything in isolation, either. A "predator" is a specific biological and behavioral class, the members of which hunt other animals for food. Of that definition, "biological" pertains to "biology", which is a "natural science concerned with the study of life and living organisms, including their structure, function, growth, origin, evolution, distribution, and taxonomy". "Behavior", on the other hand, "refers to the actions of an organism, usually in relation to the environment". Of those words... and so on.
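To make the picture concrete, here is a toy sketch of such a network as a data structure (the relation names and entries are made up purely for illustration): each concept is nothing but its labelled links to other concepts, and unfolding any definition only leads to further concepts.

```python
# Toy concept network: every concept is defined only by its links to other
# concepts, so "looking up" a concept just yields more concepts to look up.
concept_network = {
    "cat":      {"is_a": ["feline", "predator"],
                 "has":  ["tail", "four legs"],
                 "typically": ["playful"]},
    "predator": {"is_a": ["biological class", "behavioral class"],
                 "defined_by": ["hunts other animals for food"]},
    "biological class": {"pertains_to": ["biology"]},
    "biology":  {"is_a": ["natural science"],
                 "studies": ["life", "living organisms", "their structure",
                             "their evolution"]},
    # ... and so on; the regress never bottoms out in anything self-contained.
}

def expand(concept, depth=2):
    """Unfold a concept's definition; every term it mentions is itself a concept."""
    if depth == 0 or concept not in concept_network:
        return concept
    return {relation: [expand(target, depth - 1) for target in targets]
            for relation, targets in concept_network[concept].items()}

print(expand("cat"))
```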

It does not seem likely that humans could preprogram an AI with a ready-made network of concepts. There have been attempts to build knowledge ontologies by hand, but any such attempt is both hopelessly slow and lacking in much of the essential content. Even given a lifetime during which to work and countless assistants, could you ever hope to code everything you knew into a format from which it was possible to employ that knowledge usefully? An even worse problem is that the information would need to be in a format compatible with the AI's own learning algorithms, so that any new information the AI learnt would fit seamlessly into the previously entered database. It does not seem likely that we can come up with an efficient language of thought that can be easily translated into a format that is intuitive for humans to work with.
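As a rough illustration of the format-compatibility worry (the names and the 256-dimensional vectors below are purely hypothetical): hand-entered knowledge naturally comes out as crisp symbolic assertions, while a learning system may accumulate something like distributed numeric representations, with no obvious translation between the two.

```python
import numpy as np

# Hand-entered knowledge: crisp symbolic assertions, easy for humans to write.
hand_coded = [("cat", "is_a", "predator"),
              ("cat", "has", "tail")]

# What a learning algorithm might actually accumulate: distributed numeric
# representations in which no coordinate corresponds to "is_a" or "tail".
learned = {"cat": np.random.rand(256),
           "predator": np.random.rand(256)}

def integrate(triple, learned_representations):
    """Where should ("cat", "is_a", "predator") go in a space of opaque vectors?
    There is nothing principled to do here; that is the compatibility problem."""
    return None  # placeholder: no agreed translation between the two formats
```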

Indeed, there are existing plans for AI systems which make the explicit assumption that the AI's network of knowledge will develop independently as the system learns, and the concepts in this network won't necessarily have an easy mapping to those used in human language. The OpenCog wikibook states that:

Some ConceptNodes and conceptual PredicateNode or SchemaNodes may correspond with human-language words or phrases like cat, bite, and so forth. This will be the minority case; more such nodes will correspond to parts of human-language concepts or fuzzy collections of human-language concepts. In discussions in this wikibook, however, we will often invoke the unusual case in which Atoms correspond to individual human-language concepts. This is because such examples are the easiest ones to discuss intuitively. The preponderance of named Atoms in the examples in the wikibook implies no similar preponderance of named Atoms in the real OpenCog system. It is merely easier to talk about a hypothetical Atom named "cat" than it is about a hypothetical Atom (internally) named [434]. It is not impossible that a OpenCog system represents "cat" as a single ConceptNode, but it is just as likely that it will represent "cat" as a map composed of many different nodes without any of these having natural names. Each OpenCog works out for itself, implicitly, which concepts to represent as single Atoms and which in distributed fashion.
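To paraphrase the distinction the quote draws in code (this is only my illustration, not the actual OpenCog API; the class and node names are hypothetical):

```python
# Illustration only -- not the real OpenCog API; names are hypothetical.

class Atom:
    def __init__(self, atom_id, name=None):
        self.atom_id = atom_id   # internal id, e.g. 434
        self.name = name         # usually None: most atoms have no natural name

# The minority case: "cat" as a single, conveniently named node.
cat_node = Atom(17, name="cat")

# The more typical case: "cat" as a map, a fuzzy collection of unnamed atoms
# that tend to be active together; no single one of them means "cat".
cat_map = {Atom(434), Atom(435), Atom(812), Atom(2047)}
```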

Designers of Friendly AI seek to build a machine with a clearly defined goal system, one which is guaranteed to preserve the highly complex values that humans have. But the nature of concepts poses a challenge for this objective. There seems to be no obvious way of programming those highly complex goals into the AI right from the beginning, nor of guaranteeing that any goals thus preprogrammed will not end up being drastically reinterpreted as the system learns. We cannot simply code "safeguard these human values" into the AI's utility function without defining those values in detail, and defining those values in detail requires us to build the AI with an entire knowledge network. On a conceptual level, the decision theory and goal system of an AI are separate from its knowledge base; in practice, it does not seem like that separation can be maintained.
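One way to see the problem concretely (a purely hypothetical sketch, not anyone's proposed design): the top-level utility function is easy to write down, but every predicate it mentions is a pointer into exactly the kind of concept network we don't know how to supply or keep stable.

```python
def utility(world_state, concepts):
    """Toy 'safeguard human values' objective. The hard part is not this
    function but the concepts it silently depends on."""
    humans = concepts["human"].extension(world_state)        # what counts as human?
    return concepts["human values"].satisfaction(world_state, humans)

# Neither concepts["human"] nor concepts["human values"] can be filled in
# without handing the AI an entire knowledge network -- and if the AI later
# re-learns those concepts, the "fixed" utility function changes meaning too.
```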

The goal might not be impossible, though. Humans do seem to be pre-programmed with inclinations towards various complex behaviors, which might suggest pre-programmed concepts of varying complexity. Heterosexuality is considerably more common in the population than homosexuality, though this may have relatively simple causes, such as an inborn preference towards particular body shapes combined with social conditioning. (Disclaimer: I don't really know anything about the biology of sexuality, so I'm speculating wildly here.) Most people also seem to react relatively consistently to different status displays, and people have collected various lists of complex human universals. The exact method of their transmission remains unknown, however, as does the role that culture plays in it. It also bears noting that most so-called "human universals" are actually cultural rather than individual universals: any given culture might be guaranteed to express them, but there will always be individuals who don't fit the usual norms.

See also: Vladimir Nesov discusses a closely related form of this problem as the "ontology problem".

[-]Roko14y180

A mind can only represent a complex concept X by embedding it into a tightly interwoven network of other concepts that combine to give X its meaning.

I'm going to object right there. A mind can represent a concept as a high-level regularity of sensory data. For example, "cat" is the high level regularity that explains the sensory data obtained from looking at cats. Cats have many regularity properties: they are solid objects which have a constant size and a shape that varies only in certain predictable ways. There is more than one cat in the world, and they have similar appearances. They also behave similarly.

This "concept-as-regularity" idea means that you don't have a symbol grounding problem, you don't have to define the semantics of concepts in terms of other concepts, and you don't have the problem of having to hand-pick an ontology for your system; rather, it generates the ontology that is appropriate for the world that it sees, hears and senses.

Of course, you're still taking sensory inputs as primitives. How do you then evaluate changes to your sensory apparatus?

[-]Roko14y20

In the most basic case, simply ignore the possibility that this can happen.

In the more advanced case, I would say that you need to identify robust features of external reality using the first sensory apparatus you have. I.e. construct an ontology. Once you have that, you can utilize a different set of sensory apparatus, and note that many robust features of external reality manifest themselves as an isomorphic set of regularities in the new sensory apparatus.

For example, viewing a cat through an IR camera will not yield all and only the regularities that we see when looking at a cat through echo-location or visible light. But there will be a mapping, mediated by the fact that these sensor systems are all looking at the same reality.
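A toy version of that claim, under obviously artificial assumptions (two scalar "sensors" observing the same hidden events): the regularities found separately in each modality can be matched up through their co-occurrence, which is the mapping mediated by shared reality.

```python
import numpy as np

rng = np.random.default_rng(1)

# The same external events (say, cat-present vs cat-absent) seen through two
# different sensors, each with its own arbitrary encoding plus noise.
events = rng.integers(0, 2, size=200)                       # hidden reality
camera = events * 3.0 + rng.normal(0, 0.2, 200)             # sensor A reading
sonar  = (1 - events) * 5.0 + rng.normal(0, 0.2, 200)       # sensor B reading

# Cluster each modality separately into two crude "regularities" by threshold.
cam_label = (camera > camera.mean()).astype(int)
son_label = (sonar > sonar.mean()).astype(int)

# The mapping between the two ontologies is recovered from co-occurrence:
# which camera-regularity tends to fire together with which sonar-regularity?
cooc = np.zeros((2, 2))
for a, b in zip(cam_label, son_label):
    cooc[a, b] += 1
print(cooc)   # heavily diagonal or anti-diagonal: an isomorphism, not identity
```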

In the simplest case, the initial agent doesn't allow changes in its I/O construction. Any modified agent would be a special case of what the initial agent constructs in the environment, acting through the initial I/O, using the initial definition of preference expressed in terms of that initial I/O. Since the initial agent is part of the environment, its control over the environment allows it, in particular, to deconstruct or change the initial agent, understood as a pattern in the environment (in the model of sensory input/reaction to output, seen through preference).

Yup. And for preference, it's the same situation, except that there is only one preference (expressed in terms of I/O) and it doesn't depend on observations (but it determines what should be done for each possible observation sequence). As concepts adapt to actual observations, so could representations of preference, constructed specifically for efficient access in this particular world (but they don't take over the general preference definition).

I agree with the "concept as regularity" concept. You can see that in how computers use network packets to communicate with each other: they don't define a packet as a discrete message from another computer, they just chop the data up and process it according to its regularities.

This leads to problems when trying to point at humans in an AI motivational system, though, which you have to build yourself. The problem is this: starting at the level of visual and audio input signals, build a regularity parser that returns 1 when it apprehends a human, and 0 when it apprehends something else. You have to do the following: future-proof it so it recognises post/trans-humans as humans (else it might get confused when we seem to want to wipe ourselves out), and make sure it is not fooled by pictures, mannequins, answer phones, or chat bots.

Basically you have to build a system that can abstract out the computational underpinning of what it means to be human, and recognise it from physical interaction. And not just any computational underpinning: since physics is computational, there is tons of physics of our brains we don't care about, such as exactly how we get different types of brain damage from different types of blunt trauma. So you have to build a regularity processor that abstracts out what humans think is important about the computational part of humans.
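The specification in the two paragraphs above, written out as a toy test suite (the recognizer here is a deliberately naive stand-in, and the cases are the ones listed in the comment): a surface-feature matcher passes the easy cases and fails exactly on the "fooled by pictures" case, which is why surface matching isn't enough.

```python
# The specification above as a toy test suite for a hypothetical recognizer
# looks_human(observation) -> 1 (human) or 0 (not human).
SPEC = [
    ("adult human, seen directly",     1),
    ("post/trans human, future body",  1),   # future-proofing requirement
    ("photograph of a human",          0),   # must not be fooled by pictures
    ("mannequin",                      0),
    ("answer phone message",           0),
    ("chat bot transcript",            0),
]

def looks_human(observation):
    # Deliberately naive stand-in that matches surface features only.
    return 1 if "human" in observation else 0

for observation, expected in SPEC:
    status = "ok  " if looks_human(observation) == expected else "FAIL"
    print(status, observation)
```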

If you understand how it does this, you should be able to make uploads.

We develop an understanding of what it means to be human through interactions with humans, using a motivational system that can be somewhat gamed by static images and simulations, and one we don't trust fully. This, however, leads to conflicting notions about humanity: whether uploads are humans or not, for example. So this type of process should probably not be used for something that might go foom.

[-][anonymous]14y00

I've kind of wanted to write about the concept-as-regularity thing for a while, but it seems akrasia is getting the best of me. Here's a compressed block of my thoughts on the issue.

Concept-as-regularity ought to be formalized. It is possible to conclude that a concept makes sense under certain circumstances involving other existing concepts that are correlated with no apparent determining factor. Since a Y-delta transformation on a Bayesian network looks like CAR, I'm guessing that the required number of mutually correlated concepts is three. Formalizing CAR would allow us to "formally" define lots of concepts, hopefully all of them. Bleggs and rubes are a perfect example of what CAR is useful for.
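A quick simulation of what I take the Y-delta analogy to mean, using the blegg/rube example (my own illustration, not a formalization): three mutually correlated observable features with no visible common factor are the cue for introducing a latent concept node, i.e. for replacing the triangle of pairwise links with a "Y".

```python
import numpy as np

rng = np.random.default_rng(2)

# Hidden "concept" (blegg vs rube), never observed directly.
z = rng.integers(0, 2, size=5000)

# Three observable features, each a noisy function of the hidden concept.
noise = lambda: rng.random(5000) < 0.1
blue   = np.where(noise(), 1 - z, z)
eggish = np.where(noise(), 1 - z, z)
furred = np.where(noise(), 1 - z, z)

corr = lambda a, b: np.corrcoef(a, b)[0, 1]

# The "delta": all three features are pairwise correlated...
print(corr(blue, eggish), corr(blue, furred), corr(eggish, furred))

# ...but within either value of the latent concept the correlations vanish,
# which is the cue for replacing the triangle of links with a "Y": one new
# concept node with the three features hanging off it.
mask = z == 1
print(corr(blue[mask], eggish[mask]), corr(blue[mask], furred[mask]))
```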

[-]Roko14y00

OK, now I see what a Y-Delta transform is, but I doubt that anything that simple is the key to a rigorous definition of "concept as regularity". Better, see the paper "The discovery of structural form" by Charles Kemp and Joshua B. Tenenbaum.

[-]Roko14y00

What's a Y-delta transformation?

[-]Roko14y00

Whilst it would be intellectually pleasing if this were the concept that Warrigal is referencing, I doubt it.

I didn't think it was the electrical engineering trick of turning a star-connected load into a triangle-connected one, but on further reflection, we are talking about a network...

The electrical engineering trick was several decades before Yang and Baxter and has its own Wikipedia entry.

[-]Roko14y10

But the nature of concepts poses a challenge for this objective. There seems to be no obvious way of programming those highly complex goals into the AI right from the beginning

You don't have to.

The idea is to give the AI a preference that causes it to want to do what [certain] humans would want to do, even though it doesn't know what that will turn out to be.

The challenge is to give it enough information to unambiguously point it at those humans, so that it extrapolates our volitions, rather than, say, those of our genes (universe-tiled-with-your-DNA failure mode) or of our more subconscious processes. Key to this is getting it to identify a physical instantiation of an optimizing agent.

Key to this is getting it to identify a physical instantiation of an optimizing agent.

Here, have a functional upload, given as a lambda term. The main problem is what to do with it, not how to find one. Eventually we'll have uploads, but it's still far from being clear how to use them for defining preference. Recognizing a person without explicit uploading is a minor problem in comparison (though it's necessary for aggregation from whole non-uploaded humanity).

You misunderstand the post. The problem is that the concepts themselves, which you need to use to express the goals, will change in meaning as the AI develops.

I didn't notice this post at first, but it's really good. Very important; it points at a critical problem with the FAI plan.

While I am not particularly optimistic about the creation of an FAI, I say:

  • You don't need to make a particularly advanced ontology to create an AI. By this I mean that while complex, the AI need not come built with an ontology that represents even all of current human knowledge, let alone potential ontological breakthroughs made by future humans.
  • A GAI could maintain its original goal system under self improvement even if it makes ontological breakthroughs.
  • I would never trust an AI to have a full human goal system, or the attached ontology necessary to represent it.
  • The ontology of a pre-foom FAI would not be like the general human one. It would be simpler and clearer, with a mechanism to create whatever further ontology necessary to represent the values from an appropriate reference.
  • A super-intelligence can figure out ontological stuff better than a human. The 'only' problem (for the AI creators) is getting to a system that has a simplified (less incoherent) version of the goal system of the creators that can self-improve without goal change. Where 'without goal change' includes "don't destroy all of that gooey grey stuff from which you need to get a whole heap more of the detailed information about your values!!!"

Please justify your claims (particularly #2).

(Only slightly less briefly)

  • You don't need to make a particularly advanced ontology to create an AI. By this I mean that while complex, the AI need not come built with an ontology that represents even all of current human knowledge, let alone potential ontological breakthroughs made by future humans.

Human ontologies are complex, redundant, and outright contradictory at times. Not only is some of this not needed to create an AI, it would be counter-productive to include it.

  • A GAI could maintain its original goal system under self improvement even if it makes ontological breakthroughs.

When the AI comes to model human preferences and the elements of the universe which are most relevant to fulfilling them, those models need not interfere at all with the implementation of the AI itself. It's just a complex form of data and metadata to keep in mind. When it comes to things that would more fundamentally influence the direct operation of the AI's goals, it can ensure that any alterations do not contradict the old version, or do so only to resolve a discovered contradiction in whatever way is sanest.

  • I would never trust an AI to have a full human goal system, or the attached ontology necessary to represent it.

Humans suck compared to superintelligences. They even suck at knowing what they want. I'd rather tell a friendly superintelligence to do what I want it to do than try to program my goals into it. Did I mention that it is smarter than me? It can even emulate me and ask em-me my goals that way if it hasn't got a better option. There is no downside to getting the FAI to do it for me. If it isn't friendly then....

  • The ontology of a pre-foom FAI would not be like the general human one. It would be simpler and clearer, with a mechanism to create whatever further ontology necessary to represent the values from an appropriate reference.

Humans suck at creating ontologies. Less than any other species I know, but they still suck. I wouldn't include stupid parts in an FAI; that'd make it particularly hard to prove friendly. But it would naturally be able to look at humans and figure out any necessary stupid parts itself.

  • A super-intelligence can figure out ontological stuff better than a human. The 'only' problem (for the AI creators) is getting to a system that has a simplified (less incoherent) version of the goal system of the creators that can self-improve without goal change. Where 'without goal change' includes "don't destroy all of that gooey grey stuff from which you need to get a whole heap more of the detailed information about your values!!!"

That is rather dense, I'll admit. But the gist of the reasoning is there.

Hardcoding a knowledge ontology that would include e.g. all concepts humans have ever thought of is theoretically possible, since those concepts are made up of a finite amount of complexity. It's just that this would take so very long...

Anyway, I wouldn't rule out that a sufficient knowledge ontology for an FAI could be semi-manually constructed in a century or two, or perhaps a few millennia. It is also theoretically possible that all major players in the world come to an agreement that until then, very strong measures need to be taken to prevent anyone from building anything-that-could-go-UFAI.

I of course wouldn't claim this probability to be particularly high.

You might actually be able to do some back-of-the-envelope calculations on this. Humans are slow learners, and end up with reasonable ontologies in a finite number of years. By this old estimate, humans learn two bits' worth of long-term memory content per second. Assuming that people learn at this rate during 16 hours of waking time each day of their life, this would end up as something like 32 megabytes of accumulated permanent memory for a 13-year-old. 13-year-olds can have most of the basic world ontology fixed, and that's around the age where we stop treating people as children who can be expected to be confused about obvious elements of the world ontology as opposed to subtle ones.
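Spelling the arithmetic out (the retention rate is the shakiest input: a gross 2 bits/s over 13 years comes to roughly 68 MB, while the ~32 MB figure corresponds to a net retained rate of about one bit per second, i.e. allowing for forgetting):

```python
# Back-of-the-envelope version of the estimate above.
SECONDS_PER_WAKING_DAY = 16 * 3600
DAYS = 13 * 365

def accumulated_megabytes(bits_per_second):
    bits = bits_per_second * SECONDS_PER_WAKING_DAY * DAYS
    return bits / 8 / 1e6

print(accumulated_megabytes(2))   # ~68 MB at a gross 2 bits/s
print(accumulated_megabytes(1))   # ~34 MB, close to the quoted ~32 MB,
                                  # i.e. roughly one net retained bit per second
```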

Hand-crafting a concept kernel that compresses down to that order of magnitude doesn't seem like an impossible task, but it's possible there's something very wrong with the memory accumulation rate estimation.

The 32 megabytes in question should be added to any pre-programmed instincts.

Yes. Those would go into the complexity bound for the human genome, since the genome is pretty much the only information source for human ontogeny. The original post suggested 25 MB, which apparently turned out to be too low. If you make the very conservative assumption that all of the human genome is important, I think the limit is somewhere around 500 MB. The genes needed to build and run the brain are going to be just a fraction of the total genome, but I don't know enough biology to guess at the size of the fraction.
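For reference, the raw arithmetic behind that genome bound (approximate figures; the ~500 MB above presumably allows for the fact that much of the sequence is repetitive and compresses well):

```python
# Rough upper bound from the raw genome size (numbers are approximate).
base_pairs = 3.1e9            # approximate haploid human genome length
bits = base_pairs * 2         # 2 bits per base (A, C, G, T), no compression
print(bits / 8 / 1e6)         # ~775 MB raw; compression of repetitive sequence
                              # brings estimates down into the few-hundred-MB range
```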

Anyway, it looks like even in the worst case the code for an AGI that can do interesting stuff out of the box could fit on a single CD-ROM.

Also, by that time, people might have become sufficiently more complex that hand-coding all the concepts 21st-century people can hold will be an interesting historical project, but not enough for a useful FAI.