Learn (and Maybe Get a Credential in) Data Science

Jayson_Virissimo

Coursera is now offering a sequence of online courses on data science. They include:

1. The Data Scientist's Toolbox

Upon completion of this course you will be able to identify and classify data science problems. You will also have created your GitHub account, created your first repository, and pushed your first markdown file to your account.

2. R Programming

In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment and discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, which include programming in R, reading data into R, accessing R packages, writing R functions, debugging, and organizing and commenting R code. Topics in statistical data analysis and optimization will provide working examples.

3. Getting and Cleaning Data

Upon completion of this course you will be able to obtain data from a variety of sources. You will know the principles of tidy data and data sharing. Finally, you will understand and be able to apply the basic tools for data cleaning and manipulation.

4. Exploratory Data Analysis

After successfully completing this course you will be able to make visual representations of data using the base, lattice, and ggplot2 plotting systems in R, apply basic principles of data graphics to create rich analytic graphics from different types of datasets, construct exploratory summaries of data in support of a specific question, and create visualizations of multidimensional data using exploratory multivariate statistical techniques.

5. Reproducible Research

In this course you will learn to write a document using R markdown, integrate live R code into a literate statistical program, compile R markdown documents using knitr and related tools, and organize a data analysis so that it is reproducible and accessible to others.

6. Statistical Inference

In this class students will learn the fundamentals of statistical inference. Students will receive a broad overview of the goals, assumptions and modes of performing statistical inference. Students will be able to perform inferential tasks in highly targeted settings and will be able to use the skills developed as a roadmap for more complex inferential challenges.

7. Regression Models

In this course students will learn how to fit regression models, how to interpret coefficients, and how to investigate residuals and variability. Students will further learn special cases of regression models, including use of dummy variables and multivariable adjustment. Extensions to generalized linear models, especially Poisson and logistic regression, will be reviewed.

8. Practical Machine Learning

Upon completion of this course you will understand the components of a machine learning algorithm. You will also know how to apply multiple basic machine learning tools. You will also learn to apply these tools to build and evaluate predictors on real data.

9. Developing Data Products

Students will learn how communicate using statistics and statistical products. Emphasis will be paid to communicating uncertainty in statistical results. Students will learn how to create simple Shiny web applications and R packages for their data products.

You can take the entire sequence for free or pay $49 for each course in order to (upon completion) receive a Specialization Certificate from Johns Hopkins University.

The very popular blog Simply Statistics discusses the program here.

I enrolled in the "Data Analysis" Coursera class this fall, on recommendations from online acquaintances, to see what Coursera was like and to start getting the hang of R.

As far as I can tell, it's closely similar to module 4 here. (One notable difference is that it relied on plot exclusively rather than ggplot2, for some unfathomable reason.)

The experience was a mixed bag. I did learn a bunch of things I didn't know; however, only a small fraction of the course covered theoretical concepts, and as the course progressed it focused more and more on memorizing R commands without any deep understanding of the math they are based on.

I'm not at all convinced that the video format is more effective than a book with the same information. I watched videos at 1.5x speedup, sometimes read the transcripts and skipped segments of video entirely. What value I found there was in doing the exercises, and there were relatively few of those. A couple of the "correct" answers were in fact incorrect, generating controversy in the student forums, but the instructors/administrators didn't seem to care much or even notice.

MOOCs are sometimes advertised as "self-paced" learning; in the case of Coursera, this is a lie. Because I enrolled late, I wasn't able to get official credit for any of the exercise sets I completed (I also skipped the problems entirely). On the one hand it's nice that you can enroll out of phase, but on the other hand it's sort of silly that an online course is still tied to an artificial schedule. For instance, if you are going faster than the class, you will still have to wait for each segment to be opened on a weekly basis.

Finally, I've just discovered that because the class I took is now closed, I am no longer allowed access to any of the videos, quizzes and assignments - nor to my own results. I don't think I need to harp on how broken this is.

When I first saw this and realized that it was 9 courses long plus a capstone project, I was really excited! Then looking a bit more deeply, I see that they are 3-4 hours/week each and each is 4 weeks long. Though they recommend taking them in order, if you look deeper, it's pretty clear that you can take the first 3 in one batch, the second 3 in another, and the third 3 in another batch. That's 12 weeks at 9-12 hours/week... or, basically, the equivalent of one university class... for $490, which is about the price of paying per credit-hour for a 4 unit course through an average extension program at a public university. It will probably be great quality, but don't expect to have "credential-level" knowledge after taking it.
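For anyone who wants to check that arithmetic, here is a quick sketch in Python. The figures are the ones quoted in the post and in this comment; the one assumption is that the capstone also costs $49, which is what makes the total come out to $490. The variable names are just for illustration.

```python
# Back-of-the-envelope check of the workload and cost figures quoted above.
COURSES = 9                # courses in the sequence, excluding the capstone
WEEKS_PER_COURSE = 4
HOURS_PER_WEEK = (3, 4)    # low and high estimate for a single course
PRICE_PER_COURSE = 49      # USD, for the certificate track
BATCH_SIZE = 3             # courses taken in parallel

batches = COURSES // BATCH_SIZE
total_weeks = batches * WEEKS_PER_COURSE
weekly_load = tuple(h * BATCH_SIZE for h in HOURS_PER_WEEK)
total_hours = tuple(h * WEEKS_PER_COURSE * COURSES for h in HOURS_PER_WEEK)

# Assumption: the capstone is priced like a regular course.
total_cost = PRICE_PER_COURSE * (COURSES + 1)

print(f"{total_weeks} weeks at {weekly_load[0]}-{weekly_load[1]} hours/week")
print(f"{total_hours[0]}-{total_hours[1]} hours in total, ${total_cost}")
```

Taking the courses one at a time instead would stretch the same total hours over 36 weeks at 3-4 hours/week.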

Cool.

For those who don't want to wait until April 7th, Udacity is scheduled to launch their own Intro to Data Science on Feb 5th (this Wednesday). I expect it to be easier/shallower than the Hopkins/Coursera sequence, but it wins on actionability.

Students will learn how communicate

Didn't spot that on the first read; a member of a rare species: the self-refuting prophecy.

If only they used Python.

You pay $490 and get what exactly? AFAICT the business model is: pay $490 to get exactly the same level of treatment as someone who does not pay a cent, but with a piece of paper you can print out confirming you are not a liar?... If it offered credit, that would be one thing, but the fact that it doesn't is quite telling. (I guess it is impressive that Coursera has found a way to print out "certificates" with the Johns Hopkins logo on them even though no one from Johns Hopkins will ever see or interact with any of the students, though.) You don't get any feedback or assessment, you just get some added signalling value. Still, the courses look interesting, if pretty basic.

You don't get any feedback or assessment, you just get some added signalling value.

Yes, you pay money for signalling. What's wrong with that?

Even with my current studies, if I need help with a specific issue I don't walk to the TA but put the problem up on stackoverflow or another stackexchange website. I don't need university staff to learn something but I need the university for signaling.

Including signaling "thanks" to the university. :-)