Chris Volinsky

Dec 2011

I loved this class and this professor. He brings a fascinating perspective, combining industry experience (he works for AT&T) and academic training, which gives him a lot of cool stories to tell as well as knowledge about current trends in the wider data mining community. He also won the Netflix Prize (Google it if you don't know what it is) and he spends a lot of time telling us about that, which was very cool. He's also a big sports fan, so get ready for lots of sports examples (though he won't test you on them). He's generally a big data/stats nerd and often brings in cool examples of research or visualizations.

He tends to go for breadth rather than depth and generally doesn't get too technical. I liked this approach -- I'd rather get exposed to more subjects and dive into them more on my own if I'm interested -- but I can understand if someone disagrees. He goes through a general introduction to data mining and then discusses: data visualization, cross-validation techniques, regression, classification, clustering, text mining, web mining, neural nets and support vector machines, ensemble methods, Bayesian methods, recommender systems, and social networks. I would imagine that some of the later topics could be changed if he teaches the class again.

In terms of programming background, it's really very important to know R (or be willing to spend time learning it). You can also get by with SAS or Stata (or a similar statistical package), but he often gives tips on how to do things in R specifically, so you will be at a disadvantage if you use something else. Knowledge of Java/Perl/Python is pretty irrelevant, though I guess it could be useful for your project. The difficulty of the homework assignments depends heavily on your ability in R (or SAS/Stata). If you're proficient, each one will take no more than a few hours. If you're learning R on the fly, they could be time-consuming.

The term project takes a lot of time.
He wants you to try many different data mining techniques and, depending on your data set, cleaning/preparing the data could involve hours and hours of work. On top of that, he expects a 10-20 page report including numerous visualizations.

I found Chris to be very approachable, although it is a big class and you will have to make an effort to speak to him if you so choose. He's an adjunct, so he sometimes shows a lack of polish in his presentations, but he's very passionate about the field and I think that comes through. He's also very active in getting feedback about how to improve: I imagine he will be even better if he teaches again.

About half the students are master's students in stat, with the other half spread across various other master's programs (including finance, journalism, social science, econ). So don't be intimidated if your stat background isn't the strongest, but you definitely need to have taken several stat classes already (I'd say at least through 3107, and 4315 would be very helpful as well).