Yesterday I ran into an “interesting” problem.

The problem is that sometimes, when I try to train my network to pick hyperparameters, it just completely fails to learn.

This is a pretty frustrating problem!

I don’t really know what could be causing it. I could come up with guesses—too many negative initial weights that ReLUs prevent me from training out of, for example—but these don’t really give me useful predictions or interventions so I don’t want to pretend like I understand it.

What concerns me about this is that it may result in a bad choice of hyperparameters for my model going forward.

I’d really like to get reasonable answers out of my model as I zoom in to more specific values, but if my model is going to behave in completely wacky ways when I start a new loop to do the exact same thing, it really brings down my level of trust.

In theory I could run several trials, set random seeds, and check my weight initializations by digging into the Keras documentation. Some of those things might help, but none of them address the more philosophical problem: why should I expect that sometimes I’ll initialize my model and it will consistently fail to learn anything for a dozen sets of parameters in a row, and then when I reset the model ONCE (which I thought I was already doing between every choice of parameters) it starts working for every set of parameters, including the same sets that weren’t working before?
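For what it's worth, here is a minimal sketch of the "several trials, fixed seeds, fresh model every time" pattern. It uses a toy one-weight model in plain Python rather than my actual Keras network, and `train_once` and `sweep` are hypothetical names made up for illustration. The point is only that every trial builds its own weights and its own random number generator inside the function, so nothing can leak from one hyperparameter setting into the next.

```python
import random
import statistics

def train_once(learning_rate, seed, steps=200):
    """Train a toy 1-D linear model y = w * x from scratch; return final loss."""
    rng = random.Random(seed)      # private RNG: no hidden global state
    w = rng.uniform(-1.0, 1.0)     # fresh weight on every call
    data = [(x / 10.0, 3.0 * x / 10.0) for x in range(-10, 11)]  # true w is 3
    for _ in range(steps):
        x, y = rng.choice(data)
        grad = 2.0 * (w * x - y) * x   # derivative of squared error w.r.t. w
        w -= learning_rate * grad
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def sweep(learning_rates, seeds):
    """Return {learning_rate: (median loss, worst loss)} across seeds."""
    results = {}
    for lr in learning_rates:
        # One brand-new model per (learning rate, seed) pair.
        losses = [train_once(lr, seed) for seed in seeds]
        results[lr] = (statistics.median(losses), max(losses))
    return results

if __name__ == "__main__":
    for lr, (median, worst) in sweep([0.001, 0.1], seeds=range(5)).items():
        print(f"lr={lr}: median loss {median:.4f}, worst {worst:.4f}")
```

Reporting the worst seed alongside the median is a deliberate choice: a setting that only fails on some initializations shows up in the worst case even when the median looks fine.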

Anyway WTF.

Having thought about other things for five minutes, I’m realizing that one of the most mysterious things in my code base is, somewhat ironically, the generator function I wrote to pull mini-batches of data from the dataset on the hard disk. The function opens the file, runs through it, and loops around. It could be that (despite my shuffling the data) there’s a section with a bunch of really difficult examples, and reaching that section is what stops the algorithm from learning. Then it will just work its way out of this section on its own, and I don’t have the diagnostic tools available to figure out what to do about it.
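One way to rule out the "bad stretch of the file" theory would be a generator that visits the examples in a different order on every pass. The sketch below is not my actual generator. It assumes one example per line of a text file, and `index_lines` and `minibatches` are hypothetical names. It records where each line starts, reshuffles those offsets at the start of every epoch, and yields the epoch number with each batch so that a stall can be matched to a position in the data.

```python
import random

def index_lines(path):
    """Record the byte offset at which each line of the file starts."""
    offsets = []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()
            if not f.readline():
                break
            offsets.append(pos)
    return offsets

def minibatches(path, batch_size, seed=0):
    """Yield (epoch, batch) forever, visiting lines in a new order each epoch."""
    offsets = index_lines(path)
    if not offsets:
        raise ValueError("no examples found in " + path)
    rng = random.Random(seed)
    epoch = 0
    with open(path, "rb") as f:
        while True:
            rng.shuffle(offsets)  # new visiting order for this pass
            for start in range(0, len(offsets), batch_size):
                batch = []
                for pos in offsets[start:start + batch_size]:
                    f.seek(pos)
                    batch.append(f.readline().decode("utf-8").rstrip("\n"))
                yield epoch, batch
            epoch += 1
```

Only the list of offsets lives in memory, so this should still work when the dataset itself is too big to load, at the cost of random seeks on disk.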

That’s sort of depressing but I think it offers its own somewhat silly and obvious lesson:

Don’t assume someone else’s code is giving you problems.

So I worked things out enough to have a model that was definitely learning, and I left it running overnight so that it could train a reasonable amount. The plan was to come in the next morning, 100 epochs later (or fewer if early stopping kicked in), to a model with 95%+ accuracy that I could work from.

Unfortunately, my computer’s sleep mode stopped it about three quarters of the way through the first epoch. It did get up to almost 86% train accuracy, so I’m not complaining too much (and it does take almost half an hour per epoch, so it wasn’t a total waste of time).

But I’m left with two problems that I wish I’d seen resolutions to in the first courses I took on deep learning:

  1. How can I run a large amount of training so that if I need to interrupt that training my progress will be saved?

  2. What sort of setup should I do with my OS so that it doesn’t fall asleep in the middle of training a model?
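For the first question, the general pattern seems to be periodic checkpointing plus resume-on-start. As far as I can tell, Keras covers the saving half with its `ModelCheckpoint` callback. The sketch below shows the whole idea in plain Python instead, with a toy one-number "model". `save_checkpoint`, `load_checkpoint`, `train` and the `stop_after` argument are all hypothetical names for illustration, not anything from Keras.

```python
import json
import os

def save_checkpoint(path, state):
    """Write state to a temp file, then atomically swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # a crash mid-write never corrupts the old file

def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "weight": 0.0}

def train(path, total_steps, save_every=100, stop_after=None):
    """Toy loop: nudge `weight` toward 1.0, checkpointing every `save_every` steps.

    `stop_after` simulates an interruption after that many steps in this call.
    """
    state = load_checkpoint(path)  # resume from wherever the last run saved
    steps_this_call = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_call >= stop_after:
            break  # simulated interruption (sleep, crash, Ctrl-C)
        state["weight"] += 0.01 * (1.0 - state["weight"])  # stand-in update
        state["step"] += 1
        steps_this_call += 1
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    if state["step"] >= total_steps:
        save_checkpoint(path, state)  # always persist the finished run
    return state
```

An interrupted run loses at most `save_every` steps of work, so that number is a trade-off between wasted training time and time spent writing to disk. With half-hour epochs, saving more often than once per epoch looks worthwhile.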

I wouldn’t be surprised if both of these had a simple answer. I bet Keras has the ability to stop gracefully after a minibatch, for example, and I bet Python has a module that lets me set a process priority so that the computer thinks what I’m doing is important and doesn’t fall asleep (which I know it can do, because it does that when I watch videos). But neither question really occurred to me in those first courses, and so they went unanswered.
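On the sleep question, my understanding is that the usual mechanism is a power-management request rather than process priority: macOS ships a `caffeinate` command and systemd-based Linux ships `systemd-inhibit`, and both can wrap another command and hold off idle sleep while it runs. Here is a rough sketch of launching training that way. `keep_awake_command`, `run_training` and the `train.py` script name are hypothetical, and Windows (where I believe the equivalent is the Win32 `SetThreadExecutionState` call) is left out.

```python
import subprocess
import sys

def keep_awake_command(command, platform=None):
    """Wrap `command` (a list of strings) in an OS tool that blocks idle sleep."""
    platform = platform or sys.platform
    if platform == "darwin":
        # macOS: `caffeinate -i` prevents idle sleep while the child runs.
        return ["caffeinate", "-i"] + list(command)
    if platform.startswith("linux"):
        # systemd-based Linux: hold a sleep/idle inhibitor lock for the child.
        return ["systemd-inhibit", "--what=sleep:idle",
                "--why=model training"] + list(command)
    # Other platforms: no wrapper known to this sketch, run the command as-is.
    return list(command)

def run_training(script="train.py"):
    """Launch a (hypothetical) training script so the machine stays awake."""
    return subprocess.call(keep_awake_command([sys.executable, script]))
```

I'd expect this to cover idle sleep only. Closing a laptop lid or running out of battery would presumably still stop the run, which is one more reason to checkpoint as well.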

So I guess I’ll add a second trite little takeaway:

When teaching a subject, defining the scope and the environment early on is just as important as the subject itself.
