DeepMind has published a new paper in Nature detailing "AlphaGo Zero", a Go AI that was trained using only self-play. AlphaGo Zero equalled the ability of a system trained with supervised learning on a corpus of professional Go games in 24 hours (using the same computing resources), and exceeded the ability of the version of AlphaGo that defeated Lee Sedol in 48 hours.

This surprised me. I remember Demis Hassabis saying during the coverage of the Lee Sedol games that he'd like to look at learning Go from scratch, using only self-play and no supervised learning from human games. I thought that sounded much harder, and guessed that if supervised training times were on the order of a month, then training from scratch would take anywhere from three months to years to recapitulate what was learned from human games.

Other things I noted from the paper were that AlphaGo Master (which played and won 50 games against professional players in 2016-2017) used a different architecture to the previous versions, and is about 12 times more computationally efficient as well as being a significantly better player. Zero is based on that architecture (I think), and took about 35 days of training (i.e. self-play) to equal and then exceed Master's ability. I don't know how long Master took to train, but going by the results of this paper I'm guessing that the supervised training would have provided maybe only a day or so's worth of head start compared to Zero.


To quote Eliezer,

"So let’s say you have an Artificial Intelligence that thinks enormously faster than a human. How does that affect our world? Well, hypothetically, the AI solves the protein folding problem. And then emails a DNA string to an online service that sequences the DNA, synthesizes the protein, and fedexes the protein back. The proteins self-assemble into a biological machine that builds a machine that builds a machine and then a few days later the AI has full-blown molecular nanotechnology".

Anyone else think Hassabis has an ulterior motive here?

Presumably finding profitable new technology is a sufficient motive.

The announcement that DeepMind is moving on to protein folding is interesting. Solving protein folding would really matter.

Given that we can print DNA, solving protein folding means that we can design new proteins to do what we want and have powerful nanotech.

We might also get algorithms that can predict binding between different proteins.

Yes, although it is of course possible that the protein folding search space has a low maximal speedup from software, and could turn out to be hardware-bottlenecked.

I don't see a reason why this should be the case. It might be that deep learning is not helpful for the problem, but the idea that software can't get better seems very unlikely.

When reading about AlphaGo, I'm really curious about how strong it is. I would like the team to change the scoring function in a way that makes it value winning by more points a bit more. Afterwards it could play online and give handicaps until the winning percentage becomes 50-50.