Apache SparkR on Windows 7
Like many people, I have been looking forward to working with SparkR. It was released just yesterday; you can find the full Apache Spark download here: Apache Spark Download. I have set up SparkR on a couple of machines to run locally for testing. I did have a few hiccups getting it working, so I will document what I did here.
1. JDK 1.8.0_45
2. R 3.1.3
3. Apache Spark prebuilt with Hadoop-2.6
4. hadoop-common-2.2.0-bin-master
5. Fix logging.
6. Paths
7. Run as Administrator.
1. Install JDK 1.8.0_45, available here: jdk 1.8
2. Install R 3.1.3, available here: R 3.1.3
The first two items are standard Windows-based installs, so they don't have to be put in any particular location. For the following downloads, I recommend creating a directory called Spark under your My Documents folder (C:\users\
3. Download Spark 1.4 prebuilt with hadoop-2.6, available here: Spark 1.4, and unzip it to the Spark directory just referenced.
4. This link describes the winutils problem pretty well. Unzip hadoop-common-2.2.0-bin-master and set up a HADOOP_HOME environment variable that points to the unzipped directory. I created this under the Spark directory mentioned above.
5. Copy spark-1.4.0-bin-hadoop2.6\conf\log4j.properties.template to spark-1.4.0-bin-hadoop2.6\conf\log4j.properties.
Edit the log4j.properties file.
Change:
log4j.rootCategory=INFO, console
to:
log4j.rootCategory=ERROR, console
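If you are setting up more than one machine, the copy-and-edit in step 5 can be scripted. The following is only a minimal sketch in Python (not something I used for the original setup); the function name quiet_spark_logging is made up for illustration, and it assumes the conf directory still contains the stock log4j.properties.template.

```python
import re
import shutil
from pathlib import Path


def quiet_spark_logging(conf_dir, level="ERROR"):
    """Copy log4j.properties.template to log4j.properties and set the root level.

    quiet_spark_logging is a hypothetical helper name, not part of Spark.
    """
    conf = Path(conf_dir)
    template = conf / "log4j.properties.template"
    target = conf / "log4j.properties"

    # First half of step 5: copy the template that ships with Spark.
    shutil.copyfile(template, target)

    # Second half of step 5: rewrite only the log4j.rootCategory line,
    # keeping whatever follows the level (e.g. ", console").
    text = target.read_text()
    new_text, count = re.subn(
        r"(?m)^(log4j\.rootCategory=)\w+(.*)$",
        lambda m: m.group(1) + level + m.group(2),
        text,
    )
    if count == 0:
        raise ValueError("no log4j.rootCategory line found in " + str(template))
    target.write_text(new_text)
    return target
```

Call it with the path to your own conf directory, for example the spark-1.4.0-bin-hadoop2.6\conf folder under the Spark directory described above. The template itself is left untouched, so you can always start over.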
6. Make sure your path includes all the tools you just set up. A portion of my path is:
C:\Program Files\Java\jdk1.8.0_45\bin;C:\Users\<yourname>\Documents\Spark\spark-1.4.0-bin-hadoop2.6\bin;C:\Users\<yourname>\Documents\Spark\spark-1.4.0-bin-hadoop2.6\sbin;C:\Users\<yourname>\Documents\R\R-3.1.3\bin\x64
7. Run a command prompt as Administrator.
Whenever I skipped any of these steps, I ran into a number of issues getting things to work properly.
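Since a missed step is easy to overlook, a quick check of steps 4 and 6 can save some head-scratching. Below is a rough Python sketch, not a definitive tool: the helper names missing_path_entries and winutils_exe are invented for this example, the directories in the main block are the ones from my path above (substitute your own), and it assumes winutils.exe sits in the bin folder under HADOOP_HOME, which is where Hadoop looks for it on Windows.

```python
import ntpath
import os


def missing_path_entries(required_dirs, path_value):
    """Return the required directories that do not appear in a Windows PATH string."""
    def norm(p):
        # Windows paths are case-insensitive; also ignore surrounding
        # quotes and a trailing backslash when comparing.
        return ntpath.normcase(p.strip().strip('"')).rstrip("\\")

    present = {norm(p) for p in path_value.split(";") if p.strip()}
    return [d for d in required_dirs if norm(d) not in present]


def winutils_exe(hadoop_home):
    """Location of winutils.exe under a given HADOOP_HOME (assumed to be bin\\)."""
    return ntpath.join(hadoop_home, "bin", "winutils.exe")


if __name__ == "__main__":
    # Example locations taken from the path shown in step 6;
    # replace <yourname> and adjust to your own install directories.
    spark = r"C:\Users\<yourname>\Documents\Spark\spark-1.4.0-bin-hadoop2.6"
    required = [
        r"C:\Program Files\Java\jdk1.8.0_45\bin",
        spark + r"\bin",
        spark + r"\sbin",
        r"C:\Users\<yourname>\Documents\R\R-3.1.3\bin\x64",
    ]

    for d in missing_path_entries(required, os.environ.get("PATH", "")):
        print("Not on PATH:", d)

    hadoop_home = os.environ.get("HADOOP_HOME")
    if not hadoop_home:
        print("HADOOP_HOME is not set")
    elif not os.path.isfile(winutils_exe(hadoop_home)):
        print("winutils.exe not found at", winutils_exe(hadoop_home))
```

The script only reports what looks wrong; it does not change any environment variables. It also cannot tell whether the command prompt was started as Administrator, so step 7 is still on you.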
Now you can run sparkR.
Have fun with Apache SparkR. I know I have much to try!
Good luck.
Works, even for a knucklehead like me!
On step 4, when following the link provided by Doug, the easiest solution to follow is the one titled "Simple Solution" provided by Prasad D.