Abstract: Sophisticated autonomous AI may need to base its behavior on fuzzy concepts that cannot be rigorously defined, such as well-being or rights. Obtaining desired AI behavior requires a way to accurately specify these concepts. We review some evidence suggesting that the human brain generates its concepts using a relatively limited set of rules and mechanisms. This suggests that it might be feasible to build AI systems that use similar criteria and mechanisms for generating their own concepts, and could thus learn similar concepts as humans do. We discuss this possibility, and also consider possible complications arising from the embodied nature of human thought, possible evolutionary vestiges in cognition, the social nature of concepts, and the need to compare conceptual representations between humans and AI systems.

I just got word that this paper was accepted for the AAAI-15 Workshop on AI and Ethics: I've uploaded a preprint here. I'm hoping that this could help seed a possibly valuable new subfield of FAI research. Thanks to Steve Rayhawk for invaluable assistance while I was writing this paper: it probably wouldn't have gotten done without his feedback motivating me to work on this.

Comments welcome. 

I was immediately confused by the first two sentences in the abstract:

Sophisticated autonomous AI may need to base its behavior on fuzzy concepts that cannot be rigorously defined, such as well-being or rights. Obtaining desired AI behavior requires a way to accurately specify these concepts

We may need something that can't be done, but wait, we do require it, so I guess we better figure out how.

Are you making a distinction between defining and specifying?

If you just removed "that cannot be rigorously defined", the abstract reads perfectly sensibly and informatively.

I'm not sure what you're trying to add to the abstract with that phrase, but as is it mainly adds confusion for me.

I read this as distinguishing between (on the one hand) an externally defined set of parameters and (on the other hand) a locally emergent pattern that may be too complex to be readily understood but which nonetheless produces behavior that conforms to our expectations for the concept. Consider Google's surprising 2012 discovery of cats.

You can teach somebody about the moon by describing it very precisely, or you can teach them about the moon by pointing to the moon and saying 'that thing.' In the latter case, you have specified a concept without defining it.

That's a lot of meaning to be hanging on "defining" and "specifying".

Could entirely be what he meant. I guessed something similar, but I wouldn't want a reader having to guess at the meaning of an abstract.

Thanks for pointing that out, I didn't realize that the intended meaning was non-obvious! Toggle's interpretation is basically right: "rigorously defined" is referring to something like giving the system a set of necessary and sufficient criteria for when something should qualify as an instance of the concept. And "specifying" is intended to refer to something more general, such as building the system in such a way that it's capable of learning the concepts on its own, without needing an exhaustive (and impossible-to-produce) external definition of them. But now that you've pointed it out, it's quite true that the current choice of words doesn't really make that obvious: I'll clarify that for the final version of the paper.

Obtaining desired AI behavior

Looks like you're making a distinction between different ways of building something that has the desired behavior. That "how" would be the specification.

These concepts could be explicitly specified set-theoretically as concepts, or specified by defining boundaries in some conceptual space, or, more generally, specified algorithmically as the product of an information-processing system with a learning behavior and a learning environment, without initially creating an explicit conceptual representation.

It's not that one way is rigorous and one is not, but that they are different ways of creating something with the desired behavior, or in your particular case, different ways of creating the concepts you want to use in producing the desired behavior. The distinction between a conceptual specification and an algorithmic specification seems meaningful and useful to me.
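To make that distinction a bit more concrete, here's a rough toy sketch (nothing from the paper; the "conceptual space" features, thresholds, and example data are invented purely for illustration). One function is a hand-written conceptual specification; the other is an algorithmic specification whose boundary emerges from a learning procedure plus its training examples:

```python
# Toy sketch only: the features "comfort"/"harm", the thresholds, and the
# example points are all made up for illustration.
import numpy as np

# (1) Conceptual specification: an explicit, hand-written membership rule,
#     i.e. someone writes down necessary-and-sufficient criteria by hand.
def is_instance_explicit(features):
    comfort, harm = features
    return comfort > 0.5 and harm < 0.2

# (2) Algorithmic specification: the concept boundary is the product of a
#     learning algorithm and a learning environment (labelled examples).
def learn_concept(examples, labels):
    examples = np.asarray(examples, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos_centroid = examples[labels].mean(axis=0)   # mean of positive examples
    neg_centroid = examples[~labels].mean(axis=0)  # mean of negative examples

    def is_instance_learned(features):
        # Classify by which centroid the point is closer to in conceptual space.
        x = np.asarray(features, dtype=float)
        return np.linalg.norm(x - pos_centroid) < np.linalg.norm(x - neg_centroid)

    return is_instance_learned

# Usage: the same query, answered by two different kinds of specification.
examples = [(0.9, 0.1), (0.8, 0.0), (0.2, 0.7), (0.1, 0.9)]
labels = [True, True, False, False]
is_instance_learned = learn_concept(examples, labels)
print(is_instance_explicit((0.7, 0.1)), is_instance_learned((0.7, 0.1)))
```

In the second case nobody ever writes down a definition of the concept; the boundary simply falls out of the algorithm and its training data, which is the sense in which the concept is specified without being defined.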

I think this works as a drop in replacement for the first two sentences:

Sophisticated autonomous AI may need to base its behavior on fuzzy concepts such as well-being or rights in order to obtain desired AI behavior. These concepts are notoriously difficult to define explicitly, but we explore specifying them implicitly by generating them algorithmically.

I assumed that the type of AI design you're exploring is structurally committed to creating those concepts, instead of simply creating algorithms with the desired behavior, or I would have made more general statements about functionality.

Whatever you think of my proposed wording, and even if you don't like the distinctions I've made, the crucial word that I've added is but - an adversative conjunction. But, while, instead, ... a word to balance the things you're trying to make the distinction between, thereby identifying them. The meaning you intended in the first two sentences was a tension or conflict, but the grammar and sentence structure didn't reflect that.

Thanks. I ended up going with:

Sophisticated autonomous AI may need to base its behavior on fuzzy concepts such as well-being or rights. These concepts cannot be given an explicit formal definition, but obtaining desired behavior still requires a way to instill the concepts in an AI system. To solve the problem, we review evidence suggesting that the human brain generates its concepts using a relatively limited set of rules and mechanisms. This suggests that it might be feasible to build AI systems that use similar criteria for generating their own concepts, and could thus learn similar concepts as humans do. Major challenges to this approach include the embodied nature of human thought, evolutionary vestiges in cognition, the social nature of concepts, and the need to compare conceptual representations between humans and AI systems.

At least for me, this very clearly identifies the problem and your proposed approach to tackling it.

(Your first link is broken.)

I'm happy to see MIRI-associated papers being disseminated among wider academic communities!

(Fixed, thanks.)

I'll take a read! I'm excited someone is developing on this topic!

Congratulations Kaj! I didn't know Blai Bonet was involved with these things. He's a cool dude (also Judea's ex-student, but he works on planning now, not causality).

Nice paper.

The last sentence of the abstract calls embodied cognition, evolutionary vestiges, and social concepts "complications". The preprint, happily, treats them as core problems. I suggest changing that word in the abstract.

Good point, thanks. I'll do that.


I could hug you. I owe you a drink. This is precisely the direction I was thinking FAI research should be heading in!

Your preprint is inaccessible and I'm on the other side of the planet, so I can't actually do any of the things listed above, but they are firmly on my TODO list.

Thanks! I'll take you up on the drink offer if I ever end up on your side of the planet. :)

If you can't access the academia.edu copy, does this link work?