I spent most of the day yesterday building and debugging my minibatch generator for notMNIST. It was a day well spent, in that I expect it to make future such tasks much easier since I’m more familiar with the outline, and now I can run experiments on the notMNIST dataset.
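
For readers unfamiliar with the term, here is a minimal sketch of a minibatch generator in NumPy. It is an illustration under assumptions, not the code from this project: the name `minibatches`, the array shapes, and the random stand-in data are all hypothetical.

```python
import numpy as np

def minibatches(images, labels, batch_size, rng):
    """Yield shuffled (images, labels) batches covering one epoch."""
    order = rng.permutation(len(images))  # fresh shuffle on every call
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]  # the last batch may be smaller

# Random stand-in data shaped like notMNIST (28x28 grayscale, 10 classes).
rng = np.random.default_rng(0)
images = rng.random((1000, 28, 28)).astype(np.float32)
labels = rng.integers(0, 10, size=1000)

batches = list(minibatches(images, labels, batch_size=128, rng=rng))
```

Calling the generator once per epoch gives a different shuffle each time, and every example shows up exactly once per epoch.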

But that said, I have a lot of uncertainty about this process.

For example, training for a couple of epochs (once I got it working so that I was reducing loss AT ALL) got me up to about 60% accuracy. I’m continuing training and seeing some decent results, but every epoch takes 10 minutes or so, and the run failed to keep going overnight when my computer went to sleep. I don’t expect to hit diminishing returns from further training until at least a dozen epochs in, which makes this whole process a pain in the ass.

And while I literally just took three courses on what to do when you’re first training a deep learning model, I actually DON’T have a clear answer as to what the best set of first actions here is.

I do have a space of actions to sort through, so I guess I’ll write this up and maybe today I can experiment with the results of different courses of action, and maybe some day someone will be in the same situation and be able to work more efficiently. Or maybe you’ll all stop reading my posts because of the decline in quality. But writing has been helping me stay working on my end so I’m going to the end of the Umeshism.

Some possible ways to improve performance on a model that is progressing, but slowly:

  • Train the model longer

    • Easy, fire and forget, high chance of success

    • Takes a long time (10 min per epoch), hard to tell how long it will keep working

  • Change learning parameters

    • Can often see differences within first several mini-batches, somewhat amenable to grid search

    • Unclear how short term advantages in learning will translate into final model performance, may result in worse performance or breaking the model completely

  • Add model complexity

    • Can increase overall cap on performance even if other methods are unable to do so

    • Requires a lot of smaller decisions, likely to make initial learning slower, may not pay off in terms of performance until later in the process
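
The "change learning parameters" option can be sketched as a short sweep: train from the same starting point for a handful of mini-batches at each learning rate and compare the recent loss. The example below is a minimal sketch under assumptions, not my notMNIST setup. It uses a plain NumPy softmax regression on synthetic data, and `short_run_loss`, the data shapes, and the candidate learning rates are all hypothetical choices for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def short_run_loss(lr, X, y, n_classes, n_batches=20, batch_size=64, seed=0):
    """Train softmax regression for a few mini-batches; return mean loss over the last 5."""
    rng = np.random.default_rng(seed)  # same seed, so every lr sees the same batches
    W = np.zeros((X.shape[1], n_classes))
    rows = np.arange(batch_size)
    losses = []
    for _ in range(n_batches):
        idx = rng.integers(0, len(X), size=batch_size)  # sample a batch with replacement
        xb, yb = X[idx], y[idx]
        p = softmax(xb @ W)
        losses.append(-np.log(p[rows, yb] + 1e-12).mean())  # cross-entropy before the update
        p[rows, yb] -= 1.0  # p is now the gradient of the loss w.r.t. the logits
        W -= lr * (xb.T @ p) / batch_size
    return float(np.mean(losses[-5:]))

# Synthetic, linearly separable stand-in data (hypothetical, not notMNIST).
data_rng = np.random.default_rng(1)
n_classes = 10
true_W = data_rng.normal(size=(50, n_classes))
X = data_rng.normal(size=(2000, 50))
y = (X @ true_W).argmax(axis=1)

results = {lr: short_run_loss(lr, X, y, n_classes) for lr in (0.001, 0.01, 0.1, 1.0)}
for lr, loss in sorted(results.items()):
    print(f"lr={lr:<6} mean loss over last 5 batches: {loss:.3f}")
```

Fixing the seed keeps the comparison fair, since each learning rate sees identical batches. The caveat from the list still applies: a setting that wins over 20 batches is not guaranteed to win after a dozen epochs.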

Once I’ve written this out, it becomes pretty clear to me that training my model more instead of tuning the hyperparameters is probably just a way for me to feel like I’m doing something while being lazy.

Rationality success, I guess!

2 comments

As a sidenote: I am really enjoying these posts, but usually don't have anything particularly interesting to add. I don't think it would make sense to promote these to the frontpage atm, but I would still like to see more of these.

Thanks! Yeah a lot of the "content" that I have right now is on the order of "I spent all day writing a function that could have been a single line library call :(" because it makes me keep working the next day even if I have to spend all day on another function that could have been a single line library call. Hopefully I'll "get past" some of that at some point and then be able to conduct some experiments that are interesting in and of themselves and/or provide some notebooks alongside things, which could move in the direction of front page stuff instead of personal life blogging.