jkrause — LessWrong

"Human-level control through deep reinforcement learning" - computer learns 49 different games

That was indeed one of the hypotheses about why it was difficult to train the networks: the vanishing gradient problem. In retrospect, one of the main reasons this happened was the use of saturating nonlinearities in the network, that is, nonlinearities like the logistic function or tanh, which level off at fixed values (0 and 1 for the logistic function, -1 and 1 for tanh). Because they level off, their derivatives become very small once a unit saturates, and the deeper your network, the more this effect compounds. The first large-scale network that fixed this was by Krizhevsky et al., which used a Rectified Linear Unit (ReLU) for...
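To make the compounding effect concrete, here is a minimal sketch, not taken from the comment itself. It assumes every layer sees the same pre-activation value and ignores the weights entirely (treating them as 1), so it only isolates how the local derivative of the nonlinearity multiplies up with depth. The function names and the value 2.0 are illustrative choices.

```python
import math


def sigmoid_grad(x):
    """Derivative of the logistic function; never exceeds 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh; equals 1 at x == 0 and shrinks as |x| grows."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    """Derivative of ReLU; exactly 1 for any positive input."""
    return 1.0 if x > 0 else 0.0


def chained_grad(grad_fn, pre_activation, depth):
    """Product of `depth` identical local derivatives.

    Simplifying assumption: same pre-activation at every layer, weights ignored.
    """
    return grad_fn(pre_activation) ** depth


if __name__ == "__main__":
    # Columns: depth, logistic, tanh, ReLU (all at a mildly saturated input of 2.0)
    for depth in (1, 5, 10, 20):
        print(
            depth,
            chained_grad(sigmoid_grad, 2.0, depth),
            chained_grad(tanh_grad, 2.0, depth),
            chained_grad(relu_grad, 2.0, depth),
        )
```

Under these assumptions the logistic and tanh columns shrink geometrically with depth, while the ReLU column stays at 1 for active units. A real network also multiplies by weight matrices at every layer, so this shows only the contribution of the nonlinearity.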

"Human-level control through deep reinforcement learning" - computer learns 49 different games

jkrause, 11y

Can confirm that hardware (and data!) are the two main culprits here. The actual learning algorithms haven't changed much since the mid-1980s, but computers have gotten many times faster, GPUs are 30-100x faster still, and the amount of data has similarly increased by several orders of magnitude.

"Human-level control through deep reinforcement learning" - computer learns 49 different games

jkrause, 11y

My layperson's understanding is that this is the first time human accuracy has been exceeded on the Imagenet benchmarking challenge, and represents an advance on Chinese giant Baidu's progress reported last month, which I understood to be significant in its own right. http://arxiv.org/abs/1501.02876

One thing to note about the number for human accuracy for ImageNet that's been going around a lot recently is that it was really a relatively informal experiment done by a couple of members of the Stanford vision lab (see section 6.4 of the paper for details)...

"Human-level control through deep reinforcement learning" - computer learns 49 different games

jkrause, 11y

Training networks layer by layer was the trend from the mid-to-late 2000s up until early 2012, but that changed in mid-2012 when Alex Krizhevsky and Geoff Hinton finally got neural nets to work for large-scale tasks in computer vision. They simply trained the whole network jointly with stochastic gradient descent, and that has remained the standard approach for most neural nets in vision since then.

Open thread, 18-24 March 2014

jkrause, 12y

Yes, this happens to me in Windows, but not Ubuntu (both Chrome).

Open Thread for February 11 - 17

jkrause, 12y

Here's one interesting way of viewing it that I once read:

Suppose that the option you chose, rather than being a single trial, were actually 1,000 trials. Then, risk-averse or not, Option 5 is clearly the best approach. The only difficulty, then, is that we're considering a single trial in isolation. However, when you consider all such risks you might encounter over a long period of time (e.g. your life), the situation becomes much closer to the 1,000-trial case, and so you should always take the option with the highest expected value (unless the amounts involved are absolutely huge, as others have pointed out).
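A quick simulation can illustrate the argument. The payoffs below are hypothetical and are not the options from the original thread: a sure 200 is compared against a 25% chance of 1,000 (expected value 250). The sketch estimates how often the risky option ends up ahead, first over a single trial and then over 1,000 trials.

```python
import random

# Hypothetical payoffs, chosen only for illustration.
SURE_PAYOFF = 200      # guaranteed amount per trial
RISKY_PAYOFF = 1000    # amount won when the gamble pays off
RISKY_PROB = 0.25      # chance the gamble pays off (expected value 250)


def risky_total(n_trials, rng):
    """Total winnings from taking the gamble n_trials times."""
    return sum(RISKY_PAYOFF for _ in range(n_trials) if rng.random() < RISKY_PROB)


def fraction_risky_wins(n_trials, n_runs=2000, seed=0):
    """Fraction of simulated runs in which the gamble beats the sure payoff."""
    rng = random.Random(seed)
    sure_total = SURE_PAYOFF * n_trials
    wins = sum(risky_total(n_trials, rng) > sure_total for _ in range(n_runs))
    return wins / n_runs


if __name__ == "__main__":
    print("single trial:", fraction_risky_wins(1))
    print("1,000 trials:", fraction_risky_wins(1000))
```

With these made-up numbers, the gamble wins only about a quarter of the time in a single trial but almost always over 1,000 trials, which is the intuition behind treating a lifetime of small risks as one long series. The sketch says nothing about the case where the amounts are large enough that a single loss is ruinous.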