In the previous blog, we looked at how to install Apache Spark. In this blog, we will look at how to run Spark Python programs in an interactive way using IPython. For those who are curious, here is a screencast on the same.
The IPython notebook is a web-based interactive environment for executing code snippets, plotting graphs, and collaborating with others. IPython can be installed standalone or as part of Anaconda, which bundles IPython with libraries such as Pandas and NumPy that we will explore later. Here are the steps.
1) Download and install Anaconda. Anaconda is not part of the Ubuntu repositories, so it has to be installed manually. For the same reason it is not updated along with the system packages and has to be updated manually with conda.
conda update conda;conda update anaconda
2) Edit the .bashrc file to add Anaconda to the PATH and to set the IPython options that pyspark should use. The PATH entry depends on where Anaconda was installed; the IPython options look like this.
export IPYTHON_OPTS="notebook --notebook-dir=/home/praveen/Code/ipython-notebook --pylab inline"
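3) Go to the Spark installation folder and start pyspark (bin/pyspark). Because IPYTHON_OPTS is set, pyspark starts the IPython notebook server instead of the plain Python shell.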
4) A browser window opens, in which a notebook can be created and a Spark wordcount program can be run interactively, as shown in this screencast.
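For readers who prefer text to video, here is a minimal sketch of what such a wordcount notebook cell might look like. It is not the exact code from the screencast. It assumes that the notebook was launched through pyspark, so that the SparkContext is already available as `sc`, and it reads `README.md`, which is only a placeholder for any text file that Spark can reach. The helper names `tokenize` and `word_count` are made up for this illustration.

```python
from operator import add


def tokenize(line):
    """Split one line of text into lowercase, whitespace-separated words."""
    return line.lower().split()


def word_count(sc, path, top=10):
    """Return the `top` most frequent (word, count) pairs in the file at `path`."""
    counts = (
        sc.textFile(path)                 # RDD with one element per line
        .flatMap(tokenize)                # RDD with one element per word
        .map(lambda word: (word, 1))      # pair every word with a count of 1
        .reduceByKey(add)                 # sum the counts for each word
    )
    # Sort by descending count and bring only the top entries back to the driver.
    return counts.takeOrdered(top, key=lambda pair: -pair[1])


# The pyspark launcher defines `sc` in the notebook (an assumption here); the
# guard keeps this cell harmless when it is run anywhere else.
if "sc" in globals():
    for word, count in word_count(sc, "README.md"):  # placeholder input file
        print("%s: %d" % (word, count))
```

Each transformation can also be typed into its own cell, which makes it easy to inspect the intermediate results with calls such as take(5) before running the full count.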