SparkR on Windows 7

Apache SparkR on Windows 7

Like many people, I have been looking forward to working with SparkR. It was released just yesterday, and the full download is available here: Apache Spark Download

I have set up SparkR on a couple of machines to run locally for testing. I hit a few hiccups getting it working, so I am documenting what I did here. In short, the setup involves:

1. JDK 1.8.0_45
2. R 3.1.3
3. Apache Spark prebuilt with Hadoop 2.6
4. hadoop-common-2.2.0-bin-master
5. Fix logging
6. Paths
7. Run as Administrator

1. Install JDK 1.8.0_45, available here: jdk 1.8
2. Install R 3.1.3, available here: R 3.1.3

The first two items are standard Windows installers, so they do not have to go in any particular location. For the following downloads, I recommend creating a directory called Spark under your My Documents folder (C:\Users\<yourname>\Documents\Spark).

3. Download Spark 1.4 prebuilt with Hadoop 2.6, available here: Spark 1.4, and unzip it into the Spark directory just mentioned.

4. This link describes the winutils problem pretty well. Unzip hadoop-common-2.2.0-bin-master and set up a HADOOP_HOME environment variable that points to the unzipped directory. I put that directory under the Spark directory mentioned above.
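
A quick way to confirm step 4 took effect is to check the variable from a script. The sketch below is only an illustration: check_hadoop_home is a made-up helper name, and it assumes the unzipped hadoop-common directory keeps winutils.exe in a bin subfolder, which is where Hadoop normally looks for it on Windows.

```python
import os
from pathlib import Path


def check_hadoop_home(env=None):
    """Return a list of problems with HADOOP_HOME; an empty list means it looks fine."""
    env = os.environ if env is None else env
    home = env.get("HADOOP_HOME")
    if not home:
        return ["HADOOP_HOME is not set"]
    problems = []
    home_dir = Path(home)
    if not home_dir.is_dir():
        problems.append(f"HADOOP_HOME does not point to a directory: {home}")
    # Assumption: winutils.exe sits in the bin subfolder of the unzipped archive.
    elif not (home_dir / "bin" / "winutils.exe").is_file():
        problems.append(f"winutils.exe not found under {home_dir / 'bin'}")
    return problems
```

Calling check_hadoop_home() with no arguments inspects the current environment. Remember that a command prompt opened before the variable was set will not see it.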

5. Copy the log4j properties template in spark-1.4.0-bin-hadoop2.6\conf\ to a working properties file in spark-1.4.0-bin-hadoop2.6\conf\
Edit the copy, changing the line
log4j.rootCategory=INFO, console
to
log4j.rootCategory=ERROR, console
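
If you set up more than one machine, the copy-and-edit in step 5 can be scripted. This is a rough sketch, not an official Spark tool: quiet_spark_logging is a hypothetical helper, and the file names log4j.properties.template and log4j.properties are my assumption about what the conf directory contains, so check them against your download first.

```python
import re
import shutil
from pathlib import Path


def quiet_spark_logging(conf_dir, level="ERROR"):
    """Create the log4j properties file from its template and set the root log level."""
    conf = Path(conf_dir)
    template = conf / "log4j.properties.template"  # assumed template file name
    target = conf / "log4j.properties"             # assumed working file name
    if not target.exists():
        shutil.copyfile(template, target)
    text = target.read_text()
    # Replace only the level on the rootCategory line; keep the appender list.
    new_text = re.sub(
        r"^(log4j\.rootCategory=)\w+",
        r"\g<1>" + level,
        text,
        flags=re.MULTILINE,
    )
    target.write_text(new_text)
    return target
```

You would call it with the full path to the conf directory, for example the spark-1.4.0-bin-hadoop2.6\conf folder under your Spark directory. Other logger lines in the file are left untouched.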

6. Make sure your PATH environment variable includes all the tools you just set up. A portion of mine is:
C:\Program Files\Java\jdk1.8.0_45\bin;C:\Users\<yourname>\Documents\Spark\spark-1.4.0-bin-hadoop2.6\bin;C:\Users\<yourname>\Documents\Spark\spark-1.4.0-bin-hadoop2.6\sbin;C:\Users\<yourname>\Documents\R\R-3.1.3\bin\x64
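
To double-check step 6, you can scan the PATH value for the launchers. The snippet below is a minimal sketch under a few assumptions: missing_tools is a hypothetical helper, and the tool names java, sparkR and R are my guess at the executables that matter here (java.exe from the JDK, sparkR.cmd from Spark's bin folder, R.exe from R's bin\x64 folder).

```python
import os
from pathlib import Path


def missing_tools(path_value, tools=("java", "sparkR", "R"), sep=os.pathsep):
    """Return the tools that no directory listed in path_value provides."""
    found = set()
    for entry in path_value.split(sep):
        entry = entry.strip()
        directory = Path(entry)
        # Skip empty entries and entries that are not existing directories.
        if not entry or not directory.is_dir():
            continue
        # Compare file stems case-insensitively so java.exe or sparkR.cmd match.
        stems = {p.stem.lower() for p in directory.iterdir() if p.is_file()}
        found.update(t for t in tools if t.lower() in stems)
    return [t for t in tools if t not in found]
```

Running missing_tools(os.environ["PATH"]) should return an empty list once all three directories are on the path. Anything it returns is a tool whose folder still needs to be added.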

7. Run a command prompt as Administrator.

Whenever I skipped any of these steps, I ran into a number of issues getting things to work properly.

Now you can run sparkR.

Have fun with Apache SparkR. I know I have much to try!

Good luck.

1 comment:

  1. Works, even for a knucklehead like me!

    On step 4, when following the link provided by Doug, the easiest solution to follow is the one titled "Simple Solution" provided by Prasad D.