BerkeleyX - CS100.1x
The title of the course is Intro to Big Data with Apache Spark.
This course is a collaboration between UC Berkeley and Databricks to formally introduce continuous learners to Apache Spark.
I have a minor (very minor) advantage, in that I have actually done a few small research projects around Apache Spark, including a writeup on setting up Spark 1.4 to use SparkR on Windows 7.
Week 1 - Data Science Background and Course Setup.
Week 2 - Introduction to Apache Spark.
Week 3 - Data Management.
The setup portion is, to me, very valuable. Creating a locally running Spark environment can be a little tedious. Working with Databricks Cloud is a dream. I haven't tried running things on EC2, but it does take a couple of steps to get things going.
The labs are straightforward, with a few challenges to make you think about what you are doing. Regular expressions are incredibly useful, not only for this class but in a variety of other settings. I would strongly recommend reviewing regular expressions and Python before getting started.
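If you want a quick refresher before starting, here is a minimal sketch of the kind of regular expression work I mean, in plain Python. The log line format, the pattern, and the parse_line helper are my own made-up illustration, not something taken from the course labs.

```python
import re

# Hypothetical line format, invented for this example:
#   host [timestamp] "METHOD path" status
LOG_PATTERN = re.compile(r'^(\S+) \[([^\]]+)\] "(\S+) (\S+)" (\d{3})$')


def parse_line(line):
    """Return a dict of the captured fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    host, timestamp, method, path, status = match.groups()
    return {
        "host": host,
        "timestamp": timestamp,
        "method": method,
        "path": path,
        "status": int(status),  # convert the three digits to an integer
    }


sample = 'example.org [01/Jul/2015:00:00:01] "GET /index.html" 200'
print(parse_line(sample))
print(parse_line("not a log line"))
```

Returning None for lines that do not match, rather than raising an error, makes it easy to count or inspect the bad lines separately.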
The instructor is very active on the discussion forums and has done a phenomenal job of working with some very frustrated students.
I think the frustration of many of the students is a result of being new to functional programming. There have been discussions around this, and I think the next iteration of the class may spend more time emphasizing it.
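For anyone new to the functional style, here is a minimal sketch in plain Python, with no Spark required. Instead of a loop that updates variables, you pass small functions to map, filter, and reduce. The word list and variable names are made up for illustration. Spark's RDD API has methods with these same three names, but this sketch only uses Python's built-ins and functools.

```python
from functools import reduce

# Made-up sample data for illustration.
words = ["spark", "big", "data", "rdd", "lambda"]

# map: transform each word into its length.
lengths = map(lambda w: len(w), words)

# filter: keep only the lengths greater than 3.
long_enough = filter(lambda n: n > 3, lengths)

# reduce: combine the remaining values into a single sum, starting from 0.
total = reduce(lambda a, b: a + b, long_enough, 0)

print(total)  # 15
```

Each step takes a function as an argument and produces a new sequence or value without modifying the original list, which is the habit that takes some getting used to.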
Overall, each lab has demonstrated some new capability. The later labs do make reference to the earlier ones.
The class is still going on, and I believe students can still enroll.
This class can be taken by itself or in conjunction with the follow-up course, Scalable Machine Learning.
If the follow-on course is as exciting as this course is, it will be well worth the time invested in learning Spark Machine Learning.
Now if they would come up with a full edX course on GraphX, that would be awesome!