An update on Signal Data Science (an intensive data science training program)

by JonahS · 1 min read · 9th Apr 2016 · 33 comments

In December 2015, Robert Cordwell and I cofounded Signal Data Science (website), which we announced on Less Wrong.

Our first cohort has just concluded, and overall went very well. We're planning another one in Berkeley from May 2nd – June 24th. The program is a good fit for people who are both excited to learn how to extract insights from data sets and looking to prepare for industry data science jobs. If you're interested in attending the next cohort, we would love to hear from you. You can apply here, or contact us at signaldatascience@gmail.com.

We offer inquiry-based learning and an unusually intellectually curious peer group. Unlike typical college classes, Signal Data Science focuses on learning by doing. You’ll learn from a combination of lectures, short knowledge-reinforcement problems, and longer, more open-ended assignments focusing on analyzing real datasets. (That’s your chance to discover something new!) Don’t worry if that sounds daunting: our instructors will be there to support you every step of the way.

You’ll learn both the theory and the application of a wide array of data science techniques. We offer a pair programming-focused curriculum, allowing students to learn from each other’s strengths. We cover everything from basic linear regression to advanced, industry-relevant methods like support vector machines and dimensionality reduction. You’ll do an advanced, self-directed project at the end of the course. Curious? Check out our showcase of past students’ final projects. Whatever your interests are—from doing something with real-world, industry-relevant applicability to applying cutting-edge neural nets—we’ll work with you to find a project to match your interests and help you showcase it to prospective employers.

Less Wrong readers might be especially interested in Olivia Schaefer's project, which describes results of doing some natural language processing on the Less Wrong comment corpus, explaining how the words pictured in different colors below are at opposite ends of an axis.

 

Avoid this program.

Jonah and Robert have good intentions, and I was actually happy with the weekly interview sessions taught by Robert. However, I had a poor experience with this program overall. I'll list some observations from my experience as a member of the first cohort below.

First, this program is effectively self-directed; most of the time, neither the TA nor the instructor was available. When they were, asking them questions was incredibly difficult due to their lack of familiarity with the material they were supposed to be teaching. To be sure, both the instructor and the TA were intelligent people--the problem was just that they knew lots of math, but not very much data science.

Second, there were lots of communication issues between the instructors and the students. I really do not want to give specific examples, since I don't want to say something that would reflect so poorly on the LessWrong community. However, I assure you that this was an incredibly large issue.

Lastly, everything about this program was disorganized. Several of us paid for housing through the program, which ended up not being available as soon as we'd been told that it would be. The furniture in the office space we used was set up by participants because Signal was too disorganized to have it set up before we were supposed to start using it. The fact that only two out of twelve students pair programmed together on an average day was also due to a lack of organization on the part of the instructors.

Jonah and Robert clearly worked very hard to make this program what it was, but attending was still a bad experience for me. If you already have a background in software engineering and want to pay $8,000 to teach yourself data science alongside other students who are doing the same, this program is a good fit for you. Otherwise, consider attending a longer, more established program, like Zipfian Academy, that actually uses pair programming and has instructors available to answer questions.

[anonymous] 5y

I'm sorry that you had such a negative experience at the bootcamp. It isn't for everyone, and I don't think I would recommend Signal to people who are looking for what you wanted out of the bootcamp. I wish that it had been otherwise; nevertheless, I want to thank you for sharing your thoughts in such an honest and frank manner.

However, I think it's important to separate out your own experience from the experiences of other students. In many cases, including my own, they were radically different.

I'm not personally comfortable with your comment insofar as it seems to implicitly speak for all the students in the bootcamp. I know that my life improved greatly because I was able to come down here, but if I were a prospective student now, your comment might have dissuaded me from coming. For that reason, I believe it's useful to be more specific in your epistemic claims here--it may very well be true that the program is unsuitable for people in your reference class, but I think it would be bad if that fact ended up discouraging applicants for whom the program would be a great fit.

I'm surprised that you think the instructors don't know very much data science. On top of having a strong command of the underlying mathematics, Jonah and Sam were able to teach me things that aren't explained in textbooks, like the intuitive explanation of why the sum of squared error is minimized in linear regression and the fundamental importance of dimensionality reduction techniques. The numerous discussions I've had with Jonah have shaped my intellectual growth generally and made clear to me many of the more obscure aspects of data science specifically--for instance, I had been reading a couple papers on boosting out of personal interest and offhandedly made a remark to Jonah about something I found fascinating, and he was able to immediately understand and rectify a minor point of confusion I had been having.

Again, your perception of the instructors' competencies may have been the result of a mismatch between the sort of environment the program was trying to offer and the sort of environment you were hoping for. I wish that your experience could have been as positive as mine and hope you're able to find what you're looking for in the future. Based on your feedback, Signal is giving higher priority to giving prospective students a clear sense for the program's environment so that they are well equipped to make informed decisions.

Again, your perception of the instructors' competencies may have been the result of a mismatch between the sort of environment the program was trying to offer and the sort of environment you were hoping for.

This actually sounds about right.

I think that I care more about job-preparedness, potential for impact, and preparing people for being able to earn-to-give or do direct EA work. I think that Robert also cares about those things, which is why I liked his weekly interview sessions, as I mentioned above.

However, I didn't get the sense that Jonah, the instructor for the first cohort, really cared about these things quite as much. Jonah strikes me as an intelligent individual whose heart is in academia, rather than in data science or industry. This was quite problematic, because, among other reasons, it meant that even his explanations of grittier things were too focused on the big picture, and too spare on details for some people to figure out how to actually do the thing at all. It also skewed the distribution of topics taught away from things relevant to industry.

[anonymous] 5y

Could you please elaborate with specific examples of times when Jonah's explanations were too abstract and not sufficiently practical?

This will be useful information for us, because we certainly want to identify areas in which our curriculum needs further improvement. My personal recollection of Jonah's lectures is that they involved a lot of example code, visualization, back-and-forth Q&A, and interactive exploration of real datasets in lieu of presenting, say, abstract mathematical proofs.

It also skewed the distribution of topics taught away from things relevant to industry.

Along similar lines, what are some specific topics that you think were neglected in favor of more abstract but less applicable material?

I'm particularly interested in what material you thought was overemphasized in the curriculum--my impression is that all of the topics covered were very fundamental to data science as a whole. While one can express a valid preference for certain fundamental topics over others, I would be hard-pressed to say that any of the topics covered in the Signal curriculum weren't extremely industry-relevant.

I've already had versions of this conversation with Robert and Jonah in person, but I'll reiterate a few things I shared with them here, since you asked politely. Also, this conversation is becoming aversive to me, so it will become increasingly difficult for me to respond to your comments as we get farther and farther down this comment chain.

specific examples of times when Jonah's explanations were too abstract and not sufficiently practical?

There were actually multiple times during the first couple weeks when I (or my partner and I) would spend 4+ hours trying to fix one particular line of code, and Jonah would give big-picture answers about e.g. how linear regression worked in theory, when what I'd asked for were specific suggestions on how to fix that line of code. This led me to giving up on asking Jonah for help after long enough.

what are some specific topics that you think were neglected in favor of more abstract but less applicable material?

Intermediate and advanced SQL, practice of certain social skills (e.g. handshakes, being interested in your interviewer, and other interview-relevant social skills), and possibly nonlinear models.

Thanks for the written feedback (which adds to what I had gleaned in person).

There were actually multiple times during the first couple weeks when I (or my partner and I) would spend 4+ hours trying to fix one particular line of code, and Jonah would give big-picture answers about e.g. how linear regression worked in theory, when what I'd asked for were specific suggestions on how to fix that line of code. This led me to giving up on asking Jonah for help after long enough.

I think that what happened here is me having misunderstood what you were asking for, rather than any disinclination on my part to help you with individual lines of code. I will take this feedback into account.

Intermediate and advanced SQL, practice of certain social skills (e.g. handshakes, being interested in your interviewer, and other interview-relevant social skills), and possibly nonlinear models.

This is helpful detail regarding what you were looking for. Which topics would you have preferred to have been dropped in favor of these?

I (or my partner and I) would spend 4+ hours trying to fix one particular line of code, and Jonah would give big-picture answers about e.g. how linear regression worked in theory

For context, what was your programming ability before you started the course? It seems strange to spend 4 hours getting (one line of) linear regression to work, but it also seems strange for an instructor to give a vague answer to something so basic, unless he was using the "Socratic Method"?

That's a funny comment. It does exactly the same thing twice: Please tell us where we didn't do too well, oh, and you are COMPLETELY WRONG because we did everything very well.

In context, it makes a lot of sense for him to do that. He's working for Signal now, so presumably is interested in how to improve the program, and he was a participant at the same time as Fluttershy, so he got an impression of the program as a participant.

In context, it makes a lot of sense for him to do that.

No, it doesn't. Continuing with the charitable interpretation, wearing these two hats at the same time is... difficult. Either he, as an employee of Signal, is genuinely interested in feedback, or he as a participant thinks Fluttershy is all wrong and making shit up because it was perfect for andrewjho (here he, of course, commits the typical mind fallacy, but that's a minor issue at this point).

I'm not personally comfortable with your comment

That's OK, this is not a requirement :-/ Fluttershy is clearly speaking from his/her personal point of view. If your experience was different, that's fine but that does not devalue the experience of other people.

it would be bad if that fact ended up discouraging applicants for whom the program would be a great fit.

The situation is symmetric: it would also be bad if some fact ended up encouraging applicants for whom the program would be a bad fit.

I think it is better to assess personal fit for the bootcamp. There are a lot of advantages I think you can get from the program that would be difficult to acquire quickly on your own.

Aside from lectures, a lot of the program was self study, including a lot of my most productive time at the bootcamp, but there was normally the option to get help, and it was this help, advice, and strategy that I think made the program far more productive than what I would have done on my own, or in another bootcamp for that matter (I am under the impression longer bootcamps may develop specific skills at using the software better, but they don't convey nearly the same level of conceptual understanding of statistics in data science, and likewise there are many types of mistakes graduates of other programs will make that graduates of Signal's cohort have been taught not to). When there was not the option to get help, I usually shifted my work schedule and it wasn't much of a problem: there are so many projects to work on, that there was almost always something productive to work on where I wouldn't get stuck (optional exercises on prior projects or making prior projects better). I can see this being very frustrating for some people though, as getting stuck and not having immediate feedback interrupts flow.

Many of the organizational problems didn't seem to really be problems, and seemed more like differences which are good for some and not for others. Pair programming was not always optimal due to the large degree of differences between students. It wouldn't have made sense for everyone to pair program since it would have been holding back some of the faster students. A more rigid structure would have helped people who were less naturally self directed/focused though. Organizational problems that happened with respect to the first cohort in terms of setting up (furniture, internet, whiteboards, etc.) are unlikely to be problems for future cohorts now that the instructors have learned from experience and have a place set up. The first cohort took the risks and costs of such things, which later cohorts probably won't have to worry about.

This is not like other bootcamps: it is less expensive, more individually focused rather than having the entire group doing all the same curriculum, and there are a bunch of rationalists iteratively helping you decide which jobs are best to apply to, who can network you into what position, and which skills actually matter most for the specific jobs you are aiming at. I don't expect you to be able to have the same opportunities at a normal bootcamp, but a normal bootcamp is probably also lower risk if you don't trust yourself to make things work out (other programs may have quizzes where they throw you out if you fail, and essentially force you to remain focused; with Signal you are more in control yourself, and can take time off to apply to jobs).

I think it is better to assess personal fit for the bootcamp.

Yes, this is correct.

Pair programming was not always optimal due to the large degree of differences between students.

You're good at socializing and very pleasant to be around, and didn't generally have problems finding pair programming partners when you wanted to work with someone. I'm shy, and couldn't even find anyone who wanted to pair program with me most days, even though I was generally interested in working with others, and often asked Jonah or other students if anyone wanted to work together.

I don't intend this as a demand, but you may wish to edit your top comment.

As it stands, the first line of the first comment on this post is "Avoid this program." Based on the comments in this thread it sounds like you think the program might be a good fit for some people.

Yet another student reporting in with a highly positive experience!

I personally felt Jonah knew data science really well. In addition to solid theoretical understanding of the mathematics, he was extremely proficient with using R and statistics to dissect and analyze complex real world data sets. At the beginning, he provided virtually step by step guidance on analysis and interpretation of several data sets using a variety of techniques and packages in R. The program only became more self-directed over time because the students, with diverse backgrounds and experiences, focused on different areas and progressed at different rates. Even then, I felt Jonah was very actively providing individually tailored guidance for the students on their learning and projects.

If you have strong fundamentals and are capable of getting up to speed on R quickly, then you can get a lot out of this program as I did. It provided me the basic knowledge and practice on using programming/statistics/machine learning to find patterns in real world data and make useful, meaningful statistical predictions from them. After this program, I know more or less how to approach data science, work independently, and fill whatever gaps I have. I would highly recommend this program to a self-motivated, mathematically minded person looking for a job in data science.

A lot of us came in with very different levels of knowledge, and a big factor that determined success was whether or not you had experience with programming beforehand. To be fair, a lot of non-programmers ended up being stars, like the student who made the word cloud, but they had to work a lot harder.

Just commenting as I have a new review up that disagrees with this comment.

While I understand Fluttershy's concerns, overall I was quite happy with the program. I think many people could benefit from it, and I would expect things to go better in a number of ways for the second cohort.

I think one of the problems we did have was that the pace of the program was simply too quick for some of the participants. This shouldn't be an issue for the second cohort, since it is intended to last 12 weeks.

Similarly, it was somewhat unpleasant to have the program taking place in the same house where many of the participants were living. I don't blame Jonah and Robert for this, since at the beginning I thought it would be more convenient this way myself, but it didn't turn out that way. However, my understanding is that there will be separate office space for the second cohort, which will be a big improvement.

I think it's going too far to say that the program is "effectively self-directed," but it's true that there was less structure imposed than some would expect, and that a good part of the benefit came from personal study. In this sense, a more established bootcamp might be better for people who have somewhat less self discipline, but as Gentzel points out, there are also specific advantages to this one.

Similarly, it was somewhat unpleasant to have the program taking place in the same house where many of the participants were living.

What was the problem with that decision?

At times the Signal house was densely populated and a bunch of people got sick. These problems went away over time as some moved out, and we standardized better health practices (hand sanitizer freely available, people spreading out or working from their rooms if sick, etc).

I think that your point that future cohorts could be different is a good one. If, a year from now, you're reading my above review and Signal is still around, I bet that some of the negative things I mentioned in my above comment will have changed for the better.

[anonymous] 5y

I was one of the students in the first cohort of Signal Data Science. I had a very positive experience with the program--so much so that I decided to turn down looking for $100k+ data science jobs in the Bay Area to join Signal as an assistant instructor, because I believe in what Jonah and Robert are doing and want others to have the same extraordinary experience that I did. Now that I'm working for Signal, what I say about the program should be taken with a grain of salt, but I feel that it's important to share my experience.

To put my reflections in context: Coming into this program, I was very disillusioned with structured education, having had extremely negative experiences in university. (I found the coursework overly structured, onerous, and intellectually unstimulating.) However, prior to the beginning of Signal, I had positive opinions of both Jonah and Robert (they were both friends of friends), so I decided that it might be worth giving the program a try. Within the very first week of instruction, I was already amazed by the speed at which I was learning new material, the ease with which I was picking up R, and how effortless and enjoyable the whole experience was--in stark contrast to all of my previous experiences with structured education.

I was very impressed by both the breadth and the depth of the knowledge of the instructor and the TA (Jonah Sinick and Sam Eisenstat). Although I had done multiple research internships in college, all of which contained at least some tangential relation to data science and statistical techniques and three of which were highly quantitative in nature, they were able to offer me new insights even in areas where I had domain-specific knowledge and training (e.g. in the field of cognitive genomics). They were always available to help me when I got stuck and always acted congenially and professionally. Also, they were able to point me in a number of very fruitful directions for projects and further study, which I pursued to great benefit.

I also benefited greatly from the voluntary pair-programming structure of the program. At the beginning, I wasn't confident in my knowledge of R, but I was paired up with someone who had substantial past experience programming in R. In that single day, I learned more R than I did spending hours working through the R labs in Introduction to Statistical Learning. Afterward, I was similarly able to transmit my newfound expertise to other students, and finally, when I got started on my own projects and on days when I preferred to work independently, I was free to work by myself.

Lastly, there were interview and resume prep sessions every weekend with Robert, which helped me immensely. I still recall how, on the very first Saturday, we started out doing nontechnical interview prep: I stammered in my responses to the example questions and gave uncertain, incomplete responses. I was a total mess. And I still recall how, after an entire intense afternoon of practice (both with other students and with instructors), I was able to answer nontechnical interview questions confidently and fluently, as if I had been practicing these replies for years. I was astounded by the progress I had made in just five or six hours--and I can say the same about all of the Saturday interview prep sessions. It's very clear to me that Robert is an expert in interviewing and the complex dynamics of the job search.

As for the logistics of the program, I was generally happy with how Signal handled things. In particular, the bootcamp was willing to find housing to accommodate students, which is a service that no other bootcamp provides for its students--meaning that I didn't have to worry at all about trying to figure out where I would be living in Berkeley even though I was flying in from Seattle. To be sure, there were certain fixed costs that were unavoidable on account of the cohort being the first cohort, ever, of Signal Data Science, e.g. having to put together furniture for the house like beds and tables--but it was pretty clear to me going in that there was going to be stuff like that which just had to be taken care of. On balance, after considering the opportunity cost of finding and managing housing for students as well as the time saved on my own end, I definitely got an amazing deal on the rent. Of course, living in the same house with other motivated and intelligent students came with its own social and intellectual benefits as well. :-)

Overall, I would enthusiastically recommend the program to a friend. In fact, I have already been doing so to many people I know--because I want my friends to have the same amazing time here that I had!

I know you are now an employee of Signal but, well, this comment reads so much like an advert that I had trouble focusing on what you say.

You were "disillusioned with traditional education" and once you tried out this new program, you were "amazed at the speed with which you were learning" and "astounded by progress you had made in just a few hours", in boldface, no less! And that's before we got to the redundancy and hyperboles.

I know I should be trying to be #LessWrongMoreNice, but your review really should be taken with a grain of salt.

I decided to turn down looking for $100k+ data science jobs in the Bay Area to join Signal as an assistant instructor

I think looking for jobs as a data scientist would be a valuable experience for you, and "I turned down an offer from ____" and "I decided to not look" send very different signals. There's almost a month between now and the start of your next cohort; that should be plenty of time to see how far you'd get through the funnel.

[anonymous] 5y

Thanks a lot for your reply--I really appreciate it. I agree completely that it would be a very valuable experience for me, and I'd like to get experience with interviewing. I was hoping to do it in this one-month interstitial period, but it's been quite busy for me: there's a lot of stuff that has to be done like building the website, recruiting for future cohorts, curriculum design, and so on and so forth. "I didn't look" might be a worse signal than "I turned down an offer from __", but at the moment I don't have much of a choice in which one I get to send.

Well, there is the case of a genetics data-science-ish startup extending me all but a formal offer last summer, and it would be possible for me to get back in touch with them, I suppose, but that was mostly due to the technical strength I demonstrated in an internship and not due to success in the traditional Bay Area tech company interview -> offer pipeline.

Since I've talked to several of the people involved (Jonah and two students), and I'm a data scientist, I figured I should weigh in.

It seems to me like data science, like programming, is mostly about thinking clearly, and so it's in some sense unsurprising that people could be trained to do it quickly, assuming that they start off thinking clearly and just need to learn specific techniques. But while this is somewhat true for programming, it seems less true for math (and statistics as a branch of math)--I would not expect, say, actuarial boot camps to be a reasonable thing.

My standard bootcamp advice also applies--the value of a programming bootcamp, for example, is that it gets you from 0 job offers to 1 job offer, and so it may be worthwhile to do the interview process first, and ensure that you actually need a bootcamp to get hired. (This is somewhat dangerous since most companies will only look at candidates once per year, and so if you don't get hired, do a three month bootcamp, and then try to get hired again, you may need to wait ~9 months.)

It's also worth pointing out that the better employers are mostly looking for quantitative PhD types; I was recently approached on LinkedIn by someone who is about to graduate from a one-year masters program focused on data science and I was reluctant to recommend that he apply to where I work because it was unclear he would pass either our HR filters or our technical interview process. (And if I'm reluctant to recommend him, I'm also reluctant to recommend anyone from Signal.)

But that said, there are a bunch of companies out there who need some sort of data science work, and there are multiple gradations of it. Lots of companies have many more 'analyst' roles than they do 'scientist' roles, and I would expect people who can make it through Signal to have a good shot at those. (Similar bootcamps boast high placement rates, but if you look at the sort of roles they mention, many of them are 'data analyst' or 'software engineer' roles that I would be reluctant to call "data science" but that may be because I take a fairly narrow view of the term, as a job description. If a software engineer goes to a bootcamp and gets a software engineer position afterwards, it's not clear whether or not that reflects a success.)

Be aware that the instruction will be fairly self-driven, and that you'll mostly get instruction on the level of "this is how you would find that answer" instead of "this is the answer," and if you're looking for the latter you should probably look elsewhere. (For example, instead of fixing a particular error, expect to be taught how to read error messages and Google them, which is overall a more useful answer.)

Also be aware, as mentioned earlier, that the value of a bootcamp is getting people from 0 offers to 1 offer, and until the first cohort has finished their job search, how effective Signal is at that remains unknown.


TL;DR: I can't feel comfortable recommending Signal until they have a proven track record, but talking with people involved has put my worries to rest. If you think you'd be a good fit for the environment at Signal (and the best way to figure this out is probably talking with Jonah) and you estimate the improvement in career prospects is worth the cost, go for it.

How many students have found work in data science (so far), what problems are they solving now, and what are the associated companies/cities/salaries?

Hi Toggle,

Thanks for your question!

Most of our students have just started looking for jobs over the past ~2 weeks, and the job search process in the tech sector typically takes ~2 months, from sending out resumes to accepting offers (see, e.g. "Managing your time" in Alexei's post Maximizing Your Donations via a Job).

The feedback loop here is correspondingly longer than we'd like. We expect to have an answer to your question by the time we advertise our third cohort.

Following up!

I'd love to see some results as well, and I'm assuming as soon as you have them they'd be posted. I looked under 'projects' and looked at the available LinkedIn profiles, and it looks like three of the students got jobs (well, more specifically 2 jobs and an internship). Those students already had impressive resumes going into the program, but this is quite encouraging to see.

Understood, sounds like that information won't be in for a while. I look forward to hearing about your results in a few months!

[This comment is no longer endorsed by its author]

"We're planning another one in Berkeley from May 2nd – July 24th."

Is that June 24th?

Yes, that was supposed to be June 24th! We have a third one from July 5th – August 24th. There are still spaces in the program if you're interested in attending.