Beginning Machine Learning

by crybx, 30th Apr 2018



I recently finished the Machine Learning course on Coursera that is recommended by MIRI's research guide for developing a practical familiarity with machine learning.

This post contains my thoughts about the course and tries to convey the updates my mental models went through as the eleven week course progressed.

I started the course with the perspective of an experienced, employed software developer with an engineering degree who had never focused on machine learning before, so I'm sure there's some background knowledge I take for granted, as well as some things I should have realized long before taking this course.

What is machine learning?

A definition introduced early in the lectures describes machine learning as the field of study that gives computers the ability to learn without being explicitly programmed.

I knew that much before this class.

What I didn't know were the nuts, bolts, and gears of how to start writing actual code that uses machine learning algorithms.

Prior to the course, I briefly tried to imagine how an application of machine learning might work (e.g. a program learning to play a game). What I came up with was a complex set of rules and loops where other complex rules somehow tweaked the original rules based on whatever results were used to represent desired outcomes.

What those rules should even be was vague, and I thought the whole system I imagined would be a fragile, error-prone maintenance nightmare.

That didn't square with the robust uses of machine learning by tons of profitable companies, or with the fact that I knew neural nets are increasingly popular and useful. I had a rough conceptual sketch of neural nets being nodes connected in layers with weights and outputs but not how to represent that in code.

I was eager for this class to tell me what I was missing.

Machine learning is math.

March 6, 2018 - As of today, I'm in the middle of week 5, and my initial reaction to the course has been: I'm basically writing programs to do statistics.

Before this course, calculating a linear regression or using the normal equation wouldn't have struck me as tasks of "machine learning." I may have thought of these as tools data scientists use, but they seemed too basic for the hype around machine learning.
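As a rough illustration of how small that kind of task is in code, here is a minimal sketch of fitting a line with the normal equation, theta = (X^T X)^-1 X^T y, written out by hand for a single feature. It is not taken from the course materials; the function name `fit_line` and the sample data are made up for the example.

```python
# Hypothetical helper: fit y ~ theta0 + theta1 * x with the normal equation,
# theta = (X^T X)^-1 X^T y, expanded by hand for one feature plus an intercept.
def fit_line(xs, ys):
    n = len(xs)
    # Entries of X^T X when X has a column of ones and a column of xs.
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    # Entries of X^T y.
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Invert the 2x2 matrix X^T X directly via its determinant.
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("xs must contain at least two distinct values")
    theta0 = (sum_xx * sum_y - sum_x * sum_xy) / det
    theta1 = (n * sum_xy - sum_x * sum_y) / det
    return theta0, theta1


# Made-up data that lies exactly on y = 1 + 2x.
theta0, theta1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(theta0, theta1)
```

With more than one feature, the same formula is normally handed to a matrix library rather than expanded by hand like this.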

In reality, the lines between artificial intelligence, data science, and machine learning are blurry and ill defined. I think in many common uses, the phrases are interchangeable.

I should have expected some degree of buzzword bingo to be going on. This is technology after all.

I will say that a linear regression is entry level stuff, and machine learning gets way more complex, but everything still boils down to math! Even neural networks amount to feeding numerical data through a series of equations.
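To make "a series of equations" concrete, here is a minimal sketch of a forward pass through a tiny two-layer network. It assumes sigmoid activations, and the weights and biases are made-up numbers rather than anything learned; the names `layer`, `forward`, and `network` are hypothetical.

```python
import math


def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    # Each output unit is sigmoid(weighted sum of the inputs + bias).
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    # Feed the numbers through each layer's equations in turn.
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations


# Made-up parameters: 2 inputs -> 2 hidden units -> 1 output unit.
network = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 1.0], network)
print(output)
```

Training would adjust the numbers in `network`; the forward pass itself is only multiplication, addition, and one nonlinear function.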

I imagine this is true even for AlphaZero.

I'm hammering this obvious-in-hindsight point home because this is the piece I was seriously missing prior to this class. It feels embarrassing to have gone through so much higher level math for my degree and never needed any of it for my programming jobs. I'm not sure I ever gave much thought to what anyone expected we would do with the math they taught us in school.

And I'm very chagrined I didn't think of using probabilities in my pre-course wild guessing at a possible machine learning system.

A light bulb has come on for me that makes machine learning less mysterious. It's math!

Okay, machine learning requires more than just math.

There are many subskills and lots of knowledge that comes from experience that would make a person a better developer of machine learning algorithms.

If you don't have good data, you've nothing to use your fancy math equations on. (Collecting and cleaning data sounds like an art.)

There's also deciding which algorithms to use for which tasks.

There's knowing how to verify your algorithms are working correctly or telling you what you think they're telling you.

There's knowing how to string together a pipeline of many single machine learning tasks that together accomplish more elaborate goals.

There's knowing pitfalls and mistakes that are common when implementing machine learning algorithms.

There's knowing techniques for dealing with overfitting or underfitting data.

Understanding computational complexity helps.

And there's even knowing how to program in the first place.

Reality is Data

While doing this course, I also happened to read the book Algorithms to Live By: The Computer Science of Human Decisions.

It's a clever book that shows many ways that computer algorithms can provide optimal solutions to everyday questions and situations, without ever getting a computer involved.

Algorithms in computer science do things with data. This book gave me the aha moment of understanding that reality IS data, and that's why algorithms are applicable outside of the domain of computer science.

Meanwhile, the machine learning course reinforced for me that machine learning is how computer science has learned to make predictions using data.

It's handy to wrap the making of predictions up in programs in order to extend our prediction making to data we haven't already calculated everything for. Or data where we're not sure what matters. Or making all the efforts we've gone through fast and reusable.

But when we do this, we are using hardware to facilitate a process we could, in theory, do without computers. Because it's real. These concepts don't only exist inside computers, they're levers for the real world.

Machine learning is how computer science makes predictions using data. And reality is made up of data.

Feeling Potential on a Gut Level

I've written so many programs, but they were always driven by the steps a programmer could enumerate and understand. The clients and servers I could imagine talking to each other. The threads I could conceive of coordinating.

I thought I understood the potential of machine learning before, but it feels like it's clicked on another level.

Drop some data into a neural network (it's math!) and transformations happen that I can't follow.

Machine learning algorithms aren't anything truly intelligent yet, but man, the potential is huge.

Software is eating the world and machine learning is following on its heels.

Course Difficulty

Each of the eleven weeks of the course includes a couple hours of lectures, a couple quizzes, and a programming assignment estimated to take up to three hours to complete. The last two weeks don't have programming assignments, which I think is intentional to give people playing catch-up a chance of passing the course before the session ends.

The vast majority of my effort on the programming assignments went into trying not to mess up vector/matrix math.

After the first assignment where it became clear that well vectorized code runs much faster than using loops to add and multiply, I went straight for the vectorized implementations when doing assignments from then on.

The programming assignments all involved implementing math equations in reusable functions. (Machine learning algorithms are math! (Sorry, but I'm still giddy about this.))

The actual equations are provided in the lectures and assignment instructions, which made me feel vaguely like I was cheating.

It would certainly be too much to ask students just getting familiar with machine learning to come up with the equations for algorithms themselves, but I didn't expect this class to be as easy for me as it turned out to be. Maybe it would have been harder if I wasn't a programmer. I wouldn't want to take on the course if I'd never coded.

As far as math requirements go, linear algebra and calculus are involved, but understanding anything more complicated than matrix operations isn't necessary to complete the course. (And the professor includes an optional review section on linear algebra with explanations of the necessary matrix operations.)

The professor says things like, "This value requires calculating the partial derivative," but then also gives you what the partial derivative works out to be.
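As a sketch of what using a handed-over partial derivative looks like, here is gradient descent for a one-feature linear regression. It assumes the usual squared-error cost J = (1/2m) * sum((h(x) - y)^2), whose partial derivatives are the averages computed in the loop. The function name, learning rate, step count, and data are assumptions made for the example, not values from the course.

```python
def gradient_descent(xs, ys, alpha=0.1, steps=2000):
    # Hypothetical helper: fit y ~ theta0 + theta1 * x by repeatedly
    # stepping against the gradient of the squared-error cost.
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        # Prediction error for every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # The partial derivatives, as given rather than derived:
        # dJ/dtheta0 = mean(error), dJ/dtheta1 = mean(error * x).
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Made-up data that lies exactly on y = 1 + 2x.
theta0, theta1 = gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(theta0, theta1)
```

The calculus only shows up in the two `grad` lines; everything else is bookkeeping, which is why being given the derivative is enough to complete an implementation.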

I also felt vaguely guilty that I didn't remember enough to do the derivatives myself, but reviewing the calculus I haven't touched since college is already part of my AI background learning plans, so I took the professor's answers and moved on.

Even though the course was easier than I expected, I also expected to move through the material faster than I did. It wasn't easy to get myself to sit down and listen to the lectures as promptly as I would have preferred. I finished each week on time, but part of me actually expected to finish the entire course several weeks ahead of schedule.

In the end, I finished all quizzes and programming assignments a week early and ended with a grade of 100%. Coursera will bug you 50+ times to purchase the certificate for the class, but it is completely unnecessary to do so to access all of the materials or even receive a grade.

Course Age

This course was originally created in 2011, so it may be a bit dated now. I still think it holds up well and is a worthwhile intro to machine learning.

I did all the programming assignments using the latest version of Octave. The course does not remotely attempt to introduce you to all of the machine learning frameworks or development environments out there. It is focused on foundational concepts and explaining some of the most common machine learning algorithms.

This course isn't going to get one up to the bleeding edge of machine learning, but I doubt any eleven week introductory course could do such a thing anyway.

The professor does say things along the lines of, "Now you know more than many of the so-called experts making big money on this stuff in Silicon Valley."

However, the age of the course is why I can't accept this claim at face value.

Imagining a Machine Learning Cookbook

March 7, 2018 - As I think about what I want to remember from this machine learning course, I don't feel scared about forgetting the concepts and intuitions. I feel scared about forgetting the equations that make up the different machine learning algorithms.

I found myself briefly tempted to create a cheat sheet of machine learning equations, with minimal other explanation.

A follow up thought to that was: does a corpus of machine learning algorithms (with equations to implement them) already exist? I haven't been able to find such a thing. My impression is that this information is scattered throughout many papers, textbooks, and websites.

I have a Machine Learning textbook I haven't cracked open yet, but I doubt it contains the cookbook style approach I'm imagining.

I don't see a reason why, in principle, cookbook style development wouldn't be good deliberate practice for building the skill of creating code implementations of AI papers.

Maybe AlphaZero would be an advanced recipe.

How much further study should I do?

The course is a good start for the topic of machine learning, but one could always do more. My main goal is to flesh out the background knowledge I need to understand leading developments in the field of Artificial Intelligence.

I could work through a textbook next.

I could spend time coming up with my own applications, practicing without the training wheels of the course directing my coding efforts.

I could try to shoehorn applications of machine learning into my day job.

I could try assembling a cookbook like the one I imagined.

I could jump into trying to understand and implement code from machine learning papers.

Or perhaps I should set machine learning aside and focus on other topics, because this course was enough of an overview to move on.


4 comments

Great overview!

I can give a few words of advice on where to continue from here, if you're interested. My own background is as a software dev for many years (13 years professionally plus a few years as a kid). I'd been involved in many different fields, from embedded systems to web development, and recently ran a team of algorithms researchers in 3d printing, so was mildly exposed to computer vision and 3d concepts, but had no serious machine learning. Then a few years ago, I started to get much more seriously interested in ML/DL/Data Science, and have since been working in the field (running a dev shop).

So, my take: first of all, I personally didn't much enjoy Andrew Ng's course, both because it was much too theoretical for my taste, and (in retrospect) because I didn't remember enough maths from my CS degree to work with the concepts as easily as I should have.

I'd recommend a few things for you as next steps:

1. Coursera teaches more "classical ML" (not deep learning), and without many applications. The absolute BEST followup in my mind is the Fast.ai course (free).

It focuses on Deep Learning, and teaches with a completely practical-minded approach, rather than theoretical. The idea is to, within one lesson, actually *write software*, like a simple program to tell whether a picture is of a cat or a dog. In this course, you will literally be coding practically world-class Deep Learning code within a few hours.

They're supposedly working on a more classical-ML course, but unfortunately it isn't out yet.

Seriously, this is my #1 recommendation for anyone trying to learn machine learning, especially with a background in software development. You won't be sorry.

2. If you're at all interested in actually implementing ML as opposed to further reviewing the concepts, then you should try a Kaggle competition or two. If you don't know it, it's basically a site that allows companies to upload data, then pay prize money to people writing an algorithm that does something specific with the data. E.g. predict how many page views a certain subset of pages on Wikipedia will receive.

Kaggle does a lot of good things for learning ML: It abstracts away all the data-gathering and a lot of the data-cleaning work, which is the heart of a lot of data science/ML jobs, but is not what you want to actually practice, especially if you are a developer and already know how to deal with this. It also gives specific questions and answers that need to be answered, and has a large collection of existing answers.

In short, Kaggle is the place to go practice your ML skills.

3. Learn more math, especially if you enjoy it! I've personally been self-learning the equivalent of a math undergrad, partially for my work in ML, partially for fun.

Specifically, as you correctly understood, ML is mostly statistics, calculus, and linear algebra. Based on my own background, I can tell you that my calculus was perfectly adequate for ML. However, statistics and linear algebra I had studied in much less depth, and they're both incredibly important, and fascinating subjects. Linear algebra, especially, is amazing, and depending on where you study it, you actually learn a good amount of practical implementations, including linear regression and other ML algorithms. And not only do you learn these applications, you understand them from an entirely new perspective.

For studying linear algebra, I *highly* recommend Gilbert Strang's video lectures on the subject. He is an excellent teacher, not only engaging, but very practical-minded. You should also follow along with his Linear Algebra textbook, and I highly recommend doing the videos and textbook at the same time - the textbook is not a great resource without the videos, IMO.

The major "problem" with Strang's videos is that he *really* focuses on the practical, engineering, matrix approach of teaching Linear Algebra, to the almost-complete neglect of the more mathematical approach. E.g. he teaches like half the course before he mentions linear transformations, which is *incredible*.

I still think these videos are the best approach for a software developer looking to *use* linear algebra, but I highly recommend following this up with a more mathematically-oriented textbook. Like many others, I like Axler's "Linear Algebra Done Right", which ironically takes the exact opposite approach - it takes him 3 chapters to get to explaining what a Matrix is :)

Other than that, I have recommendations for statistics (both Harvard and MIT have great courses on probability), and there's lots of other great math to study, though not all of it is directly relevant to ML.

OK, hope this post was worthwhile for you/someone. Feel free to ask if you have any questions :)

Your comment was definitely worthwhile for me. Thanks to your very strong recommendation (and the fact that it doesn't look like it'll take much time), I'm going to check out the fast.ai course very soon. I'll be referencing back to this comment to check out your other recommendations in the future too. Thank you.

"This book gave me the aha moment of understanding that reality IS data, and that's why algorithms are applicable outside of the domain of computer science."

Great framing. I had read that book, but I hadn't made that connection (even though I already thought of reality as being pure information - whatever that means).

"I feel scared about forgetting the equations that make up the different machine learning algorithms."

In my opinion, this isn't important - if you can grok the concepts on a gears level, you're pretty close to having the equations anyways. In real life, no one is stopping you from just refreshing yourself on the equations.

I haven't taken this course in particular (and Understanding Machine Learning will be among the next three books I review), but I imagine UML would be a good follow-up to firm up the theoretical side. Also, it may be useful to understand how more recent developments work - for example, representing high-dimensional data in low-dimensional latent spaces. One of the most fashionable ways to do this right now is via autoencoders, but you could also get the same effect in other ways.

Edit: apparently downvoting your own comment by mistake doesn't let you get the full points back after you reupvote, and only lets you get to neutral.

It's always a bit amazing to me how much I don't have to remember to be able to work on big software projects. It's like as long as I know what's possible, and when it's applicable, it takes only moments to search for and zero in on specific implementation details.

And yet in this situation, some anxious voice in my head cries, "But do you really know what you're doing if you can't remember every detail?!"

So thank you for reassurance on that. Also, thank you for the recommendations!