"Pretty simple" algorithm playing games quite impressively.

http://www.youtube.com/watch?v=xOCurBYI_gY

First, this is awesome - enjoy!

Paper here http://www.cs.cmu.edu/~tom7/mario/mario.pdf

One interesting observation made by Tom Murphy is that the AI found and exploited playable bugs in the game that are not (commonly) known to human players. I think it's a good example to have on hand when suggesting what a really smart AI might look for in order to win.

Larks:

Also, the behaviour at the very end is very much "oh, we didn't expect that to happen forever when we wrote that utility function".

Wireheading :)

After taking part 1 of the Berkeley AI course, it feels like "getting stuck forever" is one of the standard game AI bugs. Makes you wonder whether some of the mental mechanisms we have evolved precisely to handle that.

Makes you wonder whether some of the mental mechanisms we have evolved precisely to handle that.

Boredom would seem like one obvious candidate.

It was published on April 1.

But he certainly expected the "increment any counters" objective function would not map to perfect play, at least by the time he got to Tetris. It was more of a case of "well, this algorithm is probably not going to beat any of these games, but I'm not sure of all the ways that it will fail."

I don't understand why the algorithm didn't pick up on his score in Tetris monotonically increasing - unless he was just such a bad player that the number of rows also monotonically increased?

[anonymous]:

It did pick up on his score increasing. But you get a few points just for putting a block on top of another block, and the search didn't look far enough ahead (or wasn't comprehensive enough) to spot that making lines would give even more points.
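To make the "didn't look far enough ahead" point concrete, here is a toy sketch (my own construction, not the paper's actual search; the move types and point values are made up): with a short horizon, the guaranteed few points per block placement outweigh a line-clear bonus the search never gets deep enough to see.

```python
from itertools import product

SETUP_STEPS_FOR_LINE = 6    # hypothetical: setup moves needed before a line clear
LINE_CLEAR_POINTS = 100     # big payoff, but only after the full setup
PLACEMENT_POINTS = 1        # small payoff for just stacking a block

def score(plan):
    """Points earned by a sequence of moves, each either 'setup' or 'drop'."""
    total, setup_progress = 0, 0
    for move in plan:
        if move == "setup":
            setup_progress += 1
            if setup_progress == SETUP_STEPS_FOR_LINE:
                total += LINE_CLEAR_POINTS
                setup_progress = 0
        else:  # 'drop': a few points just for putting a block on top of another block
            total += PLACEMENT_POINTS
    return total

def best_plan(horizon):
    """Exhaustive lookahead over every move sequence of length `horizon`."""
    return max(product(["setup", "drop"], repeat=horizon), key=score)

for horizon in (3, 6):
    plan = best_plan(horizon)
    print(horizon, plan, score(plan))
# horizon 3: all 'drop' moves win (3 points) - line clears are invisible to the search
# horizon 6: all 'setup' moves win (100 points) - deep enough to see the payoff
```

The real search is over NES controller inputs run against an emulator, with the objective learned from memory rather than hand-written, but the horizon limitation works out the same way.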

I was referring to the "pause just before dying" behavior, which would have persisted even with enough search depth to make lines.

Tom's Mario page.

Tom's blog.

He promises to post more videos when he gets back from Japan.

I LOLed, for real, at the end of the video.

Also, Tom's academic website. It's the coolest academic website I've ever seen.

Wow. First thought: who is this guy who submits a really cool scientific result to a thing like SIGBOVIK? He could have sent this thing to a real conference! It's a thing no one has ever tried!

Then I checked out his website. The academic one. And the others.

Well, short description: "Superhero of Productivity". The list of stuff he created doesn't fit on his site. Sites. Also, see this remark of his,

One of the best things about grad school was that if you get your work done then you get to do other stuff too.

(I'm also at CS grad school, am happy if I have time to sleep, and my only productive output is... LW comments... does that count?)

This thing looks more and more relevant as I think about it. What it does is not just optimizing an objective function in a weird and unexpected way, but actually learning it, in all its complexity, from observed human behavior.

Would it be an overstatement to call this an FAI research paper?

AI research paper? Maybe not.

What's friendly about this AI?

The point is that it's not, but making it so is a design goal of the paper.

Example: Mario immediately jumping into a pit at level 2. According to the learned utility function of the system, it's a good idea. According to ours, it's not.

Just as with optimizing smiling faces. But while that one was purely a thought experiment, this paper presents a practical, experimentally testable benchmark for utility function learning, and, by the way, shows a not-yet-perfect but working solution for it. (After all, Mario's Flying Goomba Kick of High Munchkinry definitely satisfies our utility functions.)

pjeby:

What's friendly about this AI?

Nothing. It's mostly useful to illustrate cognitive biases around AI, demonstrating how alien a simple "utility"-maximizing process is, compared to how humans think about things. It's an example answer to the standard, "But my AI wouldn't do a stupid thing like that" objection. Well, yes, actually, it would. And the simpler and more elegant your design is, the higher the probability that it will do things like that: things you don't even think about because to a human, they're obviously stupid. (At the same time, of course, it will also do things that seem utterly brilliant to a human, for the exact same reason: finding that brilliant move first required doing something stupid, like jumping at an enemy.)

It also illustrates some decision theory concepts, like looking into the future to see how your actions fare, and the importance of matching the machine's "utility" with a human's utility. (In each game, the actual game utility differs in certain ways from the simple utility function derived from scoring, and it's these differences that create the bad-weird moves.)

I would like to see more discussion of this sort of "AI boxing".

Typically the implicit assumption I've seen is that "The AI is in a box where a limited communication channel enables it to learn about the real universe and have conversations with its designers", which I'm convinced is unacceptably fragile.

I think a version like "The AI is in a box where the laws of physics are little more than a video game, it doesn't get any direct information about the real world, and we just watch its behavior in the game world to make inferences about it" might be more interesting. Occam's Razor isn't designed to be proof against overwhelming deception like this, and so it might not be too hard to push any credence the AI gives to "this world is an illusion generated within a much more complex outer world" down to negligible values.

I think this is more AI kickboxing.

I thought that the most interesting observation was in the paper and not in the video, namely the use of pause in Bubble Bobble. As in Tetris, it often found that if it paused, it was happier about the results than if it didn't; but each step spent paused let it search again for good moves, and since it wasn't trapped as it was in Tetris, it eventually found them.

I would be interested in seeing the learning process for Pac-Man taken further; I guess the algorithm only ran for a couple of iterations. Also, could we run an experiment with more complicated games, like Doom? There is also an obvious thing to count there, namely the number of enemies killed. Chess? Maybe even some poker?

I will have to read that paper.

Also, could we run an experiment with more complicated games, like Doom? There is also an obvious thing to count there, namely the number of enemies killed.

There is an obvious thing to count, yes, but can it be easily located in the raw RAM of a running Doom instance? That's the point here: automatically inferring a measure of progress from somewhere in the raw binary blob.

If you have to explicitly define a reward counter, then you're just doing normal reinforcement-learning or AI kinda stuff, and you might as well go use a non-joke agent like AIXI-MC (already plays Pacman) or something with a more straightforward and less hilariously awesomely bizarre design.

Plus, from the paper it sounds like there are serious performance and scalability issues even on the small NES games he was using, so it may not be feasible to run on a program as big as Doom without real work, like using a cluster.
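To give a flavor of what "automatically inferring a measure of progress from somewhere in the raw binary blob" could look like, here is a minimal sketch, assuming we have recorded RAM snapshots from a human playthrough. It only captures the crudest version of the idea (the paper's learnfun builds weighted lexicographic orderings over memory locations, which is considerably more subtle), and the function names and thresholds here are my own.

```python
def find_progress_counters(snapshots, min_increases=5):
    """Return RAM addresses whose byte value never decreases and goes up
    at least `min_increases` times over the recording - candidate
    'score-like' counters to maximize during play.

    `snapshots` is a list of equal-length bytes objects, one per frame
    of the human playthrough."""
    size = len(snapshots[0])
    candidates = []
    for addr in range(size):
        values = [frame[addr] for frame in snapshots]
        never_decreases = all(b >= a for a, b in zip(values, values[1:]))
        increases = sum(b > a for a, b in zip(values, values[1:]))
        if never_decreases and increases >= min_increases:
            candidates.append(addr)
    return candidates

def progress(frame, counters):
    """Crude objective: sum of the candidate counters in one RAM frame."""
    return sum(frame[addr] for addr in counters)
```

An emulator-driven search would then try short input sequences and keep whichever one leaves RAM with the highest `progress` value, which is roughly the division of labor between the learnfun and playfun halves of the paper.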