Programming Languages For AI

Donald Hobson

Firstly a chess analogy

Suppose that you were part of a team trying to build a chess playing program. Your team has not yet had the fundamental insight of min-max search with approximate evaluations. While you definitely want some people on the team thinking hard about the abstract nature of chess, this is a somewhat serial process, is there anything else that could usefully be done in parallel?

While we wouldn't have the insights to no exactly what a chess engine would look like, we can say some things about the code. The code will almost certainly want some sort of representation of chess pieces or chess boards. So implementing a virtual chessboard would not be a waste of time.

More speculatively, you might overhear the Thinking Hard About Chess department talk about how a good chess position was defined in terms of making the opponents move not good, and invent recursion.

Ideally, when the crucial insights have been had, all the building blocks needed to make it are ready to go. The first chess engine is half a page of chesslang code.

On to AI

So, can we think of any programmatic building blocks that are likely to be useful in building an AI. Yes, arithmatic, if statements, lists and so on.

These features have already been implemented in many programming languages, can we think of any features that might be useful in making AI that aren't easily available in any programming languages?

Much current AI is arithmetical with neural networks and propagation. However, there are already several fairly easy to use libraries for this, and I can't see how to make it substantially easier to program this sort of AI, other than stuff like hyperparameter optimization that lots of people are already working on.

Much theoretical research on AI is symbolic, even Godelian. AIXItl for example, executes every piece of code up to a certain length and runtime. That would suggest that the programming language should make it easy to generate syntactically correct code, and to run it for a bounded time without side effects. This is also what you would want if you were generating code with an evolutionary algorithm.

On the alignment forum, several designs of AI mentioned involve "search for a proof that this piece of code halts". So our programming language for AI should have powerful proof handling facilities.

Suggested Mechanics

Some inspiration for the potential programming language designer

For anyone thinking that lisp isn't abstract and self referential enough

First draft level.

Start with an "everything is a list" approach from lisp/scheme.

First order propositional logic can be defined in terms of the symbols $(,), ⟹, ⊥$ and the propositions $p_{1}, p_{2}, \dots$ . When you are trying to prove something about propositional logic, the fewer symbols to deal with the better. However, when you are trying to prove something in propositional logic, you want extra symbols like $\land$ . This can be managed by considering $(a \land b)$ as syntactic shorthand for $(a ⟹ (b ⟹ ⊥)) ⟹ ⊥$ . Lisp style macros are good for this.

We can consider programs as formulas in some first order theory.

Consider the program $(+ 5 (* 2 3))$ The interpreter can syntactically modify it into the simpler program $(+ 5 6)$ and then into $11$ .

A more complex example. $(sum (map '(1 2 3) (lambda (x) (* x x))))$ goes to $(sum '((* 1 1) (* 2 2) (* 3 3)))$ then to $(sum '(1 4 9))$ and finally to $14$ .

Considered like this, the interpreter is automatically generating a proof that the program is equal to some value. Its automatically simplifying the program until it gets its answer. Note that it doesn't harness the full power of first order arithmetic, its missing predicates like $(\forall n \in N)$ . It can also get stuck in infinite loops.

In the general view, there are a finite number of transformations that can be applied, and you want a sequence of transformations that leads from one tree to another. These transformations are the atomic tactics. (They are equivalent to axioms in some sense).

Example $(forall (x N) (< x 7))$ can be transformed by the general rule that for any $y \in N$ and function $p : N \to Bool$ , expressions of the form $(forall (x N) (p x))$ are equivalent to $(and (p y) (forall (x N) (p x)))$ . This gives (y=9) $(and (< 9 7) (forall (x N) (< x 7)))$ . This turns into $(and False (forall (x N) (< x 7)))$ and then $False$ .

Note that the only difference is that here the choice of local transformations is not uniquely determined. $forall$ makes the abstract symbol $x$ available in its interior in much the same way that lisps "let" does. Here $N$ the natural number type. This language would be typed at parse time, the type of an expression should depend only on the type of its sub-components. (A strict interpretation might have less_reals and less_ints distinct functions, one of type, but using the < symbol for both should work.) A type is just a variable of type type, so using a type that you haven't defined, and isn't builtin is just a special case of using an undefined variable. When parsing the code, use of any undefined variable fails syntactically. Types can be combined with union and struct as common in strongly typed programming languages. Standard category theory based type system.

This is where the idea of tactics comes in. (word definition) A tactic is a function that takes in an arbitrary formulae (and possibly some other stuff), and processes it and outputs a semantically identical formulae. (with runtime bounds on it doing so if need be, and the possibility of arbitrary programmer supplied hints)(it applies a series of syntactic transformations)

Here a function is any expression with free variables to substitute. $(lambda (x) (exists (y N) (= (* y 2) x)))$ is a perfectly good function, which returns $(exists (y N) (= (* y 2) 4))$ when called on $4$ .

One builtin tactic would be evaluate, which evaluates expressions and finite loops, and ignores any predicate. If you just wrapped the rest of the code in an evaluate, an never used any other tactics, you would have a side effect free version of lisp. (or something like it)

Another tactic might be called example, which removes $(\exists x)$ by having the programmer give a suitable $x$ .

A toy example would be this function, which transforms $\exists y : p (y)$ into $(p (x)) \lor (\exists y : p (y))$ . Ie $(exists (y N) (= (* y 2) 4))$ into (using x=2) $(or (= (* 2 2) 4) (exists (y N) (= (* y 2) 4)))$

$(lambda (f x) (if (= (car f) 'exists) (list 'or (cddr f) (replace f (caadr f) x)) f))$

Another could be minimize_metric_neighborhood this would take in some metric, eg number of sub-expressions, and try a bunch of other tactics to minimize it.

other tactics would be induction. deduction_thm ect.

The important point is that the programmer is free to design new tactics. The job of the compiler/interpreter is to ensure the tactics transform code in a syntactically valid manor, and to implement the built in tactics.

Note that you need to use a tactic to define new tactics.

Suppose you didn't have the deduction theorem and you wanted to define it

Suppose you provide a function that takes in a proof using the deduction theorem, and outputs a proof not using the deduction theorem, without proving that this function always outputs a valid proof. All you have is a macro, a convenient programmer shortcut. Whenever you use the deduction theorem macro, the proof is expanded out behind the scenes. Convenient, but not a new tactic.

Suppose you prove that "any theorem that can be proved using the deduction theorem can also be proved without it", not necessarily in a constructive manor. Then any time you want to use the deduction theorem, the program can use that tactic without expanding it out in this specific case.

To avoid Lobian obstacles to do with self trusting proof systems, all tactics must have a rank, which is an ordinal.

Suppose a tactic $A$ is defined as an arbitrary function from an expression $e$ and a hint $h$ to an output $s$ .

You validate $A$ by proving (using tactics $t_{1}, t_{2} \dots t_{n}$ ) that $\forall e : \exists i \in N : \exists r_{1}, r_{2}, \dots r_{i} \in Q : \exists h_{1}, h_{2}, \dots h_{i} \in hints : A (e, h) = r_{1} (r_{2} (\dots r_{i} (e, h_{i}), \dots, h_{2}), h_{1})$

(You needn't calculate $r_{i}$ or $h_{i}$ explicitly, just show that they must exist.)

Where $Q \subset tactics$ is a set such that ${sup}_{q \in Q} (rank (q) + 1) = α$ where $α$ is some ordinal. Then the rank of $A$ must be chosen such that $rank (A) \geq α$ and $\forall j \leq n : rank (A) \geq rank (t_{j})$ . Most of the time, these will be small finite ordinals that could be filled in automatically.

Note that any proof that uses tactics of rank at most $α$ is a valid proof in the language of $Z F C + α$ . The basic language would have a ZFC set type, (you need it for the ordinals), but if it wasn't there, you should be able to define it by declaring a bunch of atomic tactics ).

Questions to discuss in comments

1) Is it a good real world strategy to design new programming languages more suitable for AI?

2) Are there any features a good AI language might want other than an excessive amount of self reference and abstract mathematicallity.

3) Any more suggestions or ambiguities related to the programming language outlined above?

4) If a programing language like this already exists, let me know.

Hi, I'm a Ph. D. student at MIT in programming languages.

Your choice of names suggests that you're familiar with existing tactics languages. I don't see anything stopping you from implementing this as a library in Ltac, the tactics language associated with Coq.

I'm familiar with a lot of DSLs (here's umpteen of them: https://github.com/jeanqasaur/dsl-syllabus-fall-2016/blob/master/README.md ). I've never heard of one designed before they had an idea what the engine would be.

E.g.: you can write a language for creating variants of minimax algorithms, or a language for doing feature extraction for writing heuristic functions, but you wouldn't think to write either of those unless you knew how a chess AI worked. Without knowing that those are useful things, what's left? Abstract data types are sufficient for representing chess pieces cleanly. Maybe you'll decide to write concrete syntax for chess positions (e.g.: write a chess move as Nx6 and have the compiler parse it properly), but, putting aside how superficial that would be, you would do that in a language with extensible syntax (e.g.: Wyvern, Coq, Common Lisp), not a special "chess language."

The recent-ish development of probabilistic programming (hey, now there's a family of AI languages) is instructive: first was decades of people developing probabilistic models and inference/sampling algorithms, then came the idea to create a language for probabilistic models backed by an inference engine.

Lisp used to be a very popular language for AI programming. Not because it had features that were specific to AI, but because it was general. Lisp was based on more abstract abstractions, making it easy to choose whichever special cases were most useful to you. Lisp is also more mathematical than most programming languages.

A programming language that lets you define your own functions is more powerful than one that just gives you a fixed list of predefined functions. In a world where no programming language let you define your own functions, and a special purpose chess language has predefined chess functions. Trying to predefine AI related functions to make an "AI programming language" would be hard because you wouldn't know what to write. Noticing that on many new kinds of software project, being able to define your own functions might be useful, I would consider useful.

The goal isn't a language specialized to AI, its one that can easily be specialized in that direction. A language closer to "executable mathematics".

I've thought a fair bit about PLs for AI, mostly when I get pissed off by how bad current languages are for certain AI-related things.

My biggest complaint is: most languages make it a huge pain to write code which reasons about code written in the same language. For example: try writing a python function which takes in another python function, and tries to estimate its asymptotic runtime by analyzing the code. It doesn't need to always work, since the problem is undecidable in general, but it would be nice if it could handle one specific thing, like maybe estimating runtime of a dynamic programming algorithm. Problem is, even if you have a nice algorithm for calculating asymptotic runtime of DP, it will be a huge pain to implement it - at best you'll be working with an abstract syntax tree of the python function.

LISP makes this a bit more pleasant, since the code itself is already a data structure - the abstract syntax tree is the code. But abstract syntax trees still just aren't that great a representation for reasoning about code. We use trees, and write tree-shaped code, because trees can be represented neatly in linear text files (by nesting lots of parens () or similar delimiters {}). But the semantics of code, or the execution of code, is usually not tree-shaped. What we'd really like is a better data structure for representing programs, something which is inherently closer to the semantics rather than the syntax. Then, with that data structure in hand, we could work backwards to find a good language to represent it.

I also have some opinions about what that data structure should be, but at this point I think posing the question is more useful than arguing about a solution. If you're thinking about proofs and tactics, then I'd recommend thinking about a representation of tactics which makes it elegant and easy for them to operate on other tactics.

Disclaimer: I don't know anything about designing programming languages.

I don't think this programming language can neatly precede the AI design, like your example of chesslang. In fact, it might be interesting to look at history to see which tended to come first - support for features in programming languages, or applications that implemented those features in a more roundabout way.

Like the proof reasoning support, for example, might or might not be in any particular AI design.

Another feature is support for reasoning about probability distributions, which shows up in probabilistic programming languages. Maybe your AI is a big neural net and doesn't need to use your language to do its probabilistic reasoning.

Or maybe it's some trained model family that's not quite neural nets - in which case it's probably still in python, using some packages designed to make it easy to write these models.

Basically, I think there are a lot of possible ways to end up with AI, and if you try to shove all of them into one programming language, it'll end up bloated - your choice of what to do well has to be guided by what the AI design will need. Maybe just making good python packages is the answer.

I agree that if the AI is just big neural nets, python (or several other languages) are fine.

This language is designed for writing AI's that search for proofs about their own behavior, or about the behavior of arbitrary pieces of code.

This is something that you "can" do in any programming language, but this one is designed to make it easy.

We don't know for sure what AI's will look like, but we can guess enough to make a language that might well be useful.