Friday, 19 September 2014

More MOOC offers for data scientists

As promised in the previous post, here are more of the many MOOCs on Data Science, including some suggested in the feedback I received.

It is worth mentioning that Data Analysis and Statistical Inference offers its labs through another valuable platform: DataCamp. Using this platform is like having an instructor at your side, explaining the exercise and giving you feedback. Alternatively, you can complete the exercises using just RStudio and submit your scripts.

The Data Science Specialization team, for its part, has developed an innovative tool for learning R interactively: the swirl package. The idea is learning by doing, and it makes getting started with R fairly simple. From RStudio, all you need to do is install the package by typing:

install.packages("swirl")

Then load the package and start swirl:

library("swirl")
swirl()

You will then be in the learning environment. You have to select the course that you want to follow, and the rest is really self-explanatory. The learning sessions are conveniently short.

The University of Washington also offers a course, Introduction to Data Science, which covers the basic techniques of data science as well as databases, MapReduce, Hadoop, SQL and NoSQL. It also covers elements of statistical modelling and machine learning, as well as communication of results and graph analysis. The recommended (but not compulsory) textbook is "Mining of Massive Datasets". The programming assignments involve Python and SQL besides the usual R. According to a colleague who has reviewed it, the word "Introduction" in the title is misleading, since the course covers a whole lot more than an introduction.

Stanford University offers a course called Machine Learning, which is also on my watch list; I plan to review it in detail once I am further along in my studies.

Another course on Machine Learning is Learning From Data, offered by Caltech through edX. This course is currently closed, but the material from the 2014 course can be consulted and studied at your own pace. It ranges from basic theory to algorithms and applications. The video lectures are also on YouTube.

Finally, a couple more courses, both offered by Stanford, can surely complement those already mentioned: Statistical Learning and Introduction to Databases.

I think that with such an outstanding selection, I have no excuses left for procrastination, even when I'm not at home - did you know? Coursera has an app for Android!

Thursday, 18 September 2014

In the loop of the MOOC!

Education is changing, and the change is affecting our lives and the way we spend our time. I can hardly believe that anyone has never heard of MOOCs, which stands for Massive Open Online Courses. Browsing the huge range of courses available online for free is something like discovering a new world... or several new worlds, actually.

Among the variety of courses offered by Coursera, I found the Data Science Specialization offered by Johns Hopkins University particularly interesting. It consists of 9 courses and a final Capstone Project. I appreciate this offer especially because it brings together a great deal of essential information that you would otherwise have to collect by reading tons of books, web pages and software documentation, probably without immediately seeing the connections among them.

With the recent advances in technology, trans-disciplinary concepts such as exploratory data analysis, reproducible research, regression models and machine learning are progressively gaining importance in several fields and are shaping the "profession" of the "data scientist", a professional with a strong background in statistics as well as cutting-edge expertise in technology.

So far I have successfully completed the first two courses, namely The Data Scientist's Toolbox and R Programming. Since I'm a lazy person, I need to be motivated, otherwise I'll use the excuse that "I don't have time, I'll do it later". That's mainly why I enrolled in the Signature Track: to have deadlines. In the end I got certificates and shareable permanent links to course record pages, that look like this and this.

I paused in August and the first week of September and missed the beginning of the other courses, so I'm starting again in October.

Meanwhile, I've found another relevant course that partly overlaps with some of the concepts of the specialization, namely Data Analysis and Statistical Inference, offered by Duke University. I'm currently enrolled in the latter, and I have found several advantages: it is strongly oriented towards applied statistics and offers tons of practical examples. It also comes with an excellent book (free to download, but I bought it because of the ridiculously low price - and because I love paper books).

A bonus is that it doesn't require previous knowledge of statistics, which allowed me to brush up on my statistics while proceeding quickly through the first weeks of the course (I did the first week in one afternoon - OK, I admit it, it took me until 1:30 am).

I'm also keeping an eye on another course that starts at the end of September: Mining Massive Datasets, offered by Stanford University. Yes, Big Data. I know it is probably too much for my schedule, but hey, it's better than mindlessly surfing Facebook in my spare time...

Next time I'll talk about more MOOC platforms and their offers...