Sparkmagic lets the user connect to a remote Spark cluster from a local Jupyter Notebook and interact with it through Livy, a Spark REST server, with the help of magics, a set of commands for interactively running Spark code in multiple languages.
Here is a quick step-by-step guide to using Sparkmagic to interact with a Spark cluster running on Qarnot from your local computer.
If you are interested in another version, please send us an email at qlab@qarnot.com.
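To make the role of Livy more concrete, here is a minimal sketch of the kind of REST requests a client such as Sparkmagic sends on your behalf. It only builds the requests and does not send them. The address `http://localhost:8998` is an assumption (8998 is Livy's default port), and the helper names are hypothetical; they are not part of Sparkmagic.

```python
import json

# Assumption: Livy is reachable at this address (8998 is Livy's default port).
LIVY_URL = "http://localhost:8998"


def create_session_request(kind="pyspark"):
    """Build the request that asks Livy to start an interactive session."""
    return "POST", f"{LIVY_URL}/sessions", json.dumps({"kind": kind})


def run_statement_request(session_id, code):
    """Build the request that submits a code snippet to an existing session."""
    url = f"{LIVY_URL}/sessions/{session_id}/statements"
    return "POST", url, json.dumps({"code": code})


# Example: the request that would run one line of PySpark in session 0.
method, url, body = run_statement_request(0, "sc.parallelize(range(10)).sum()")
print(method, url, body)
```

In practice you never write these requests yourself: the magics in the notebook take care of creating the session and submitting each cell.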
Before starting a calculation with the Python SDK, a few steps are required:
Note: in addition to the Python SDK, Qarnot provides C# and Node.js SDKs and a Command Line.
This tutorial will showcase how to count the number of words in the Iliad in a distributed way using Sparkmagic. The workflow is as follows:
Before moving forward, you should set up your working environment to contain the following files:
input
    iliad.txt: text file containing the Iliad, whose words will be counted on Qarnot.
spark-magic.py: script for starting the cluster on Qarnot (see below).
wordcount.ipynb: Jupyter notebook to connect to the Spark cluster.

Both input and wordcount.ipynb can be downloaded from the following link.
Before moving on, make sure to install Sparkmagic by following these simple steps. Note that it is preferable to install it in a Python virtual environment.
pip install sparkmagic
Note that the compilation of pykerberos, a Sparkmagic dependency, can fail if you do not have the required library installed. If you encounter an error like error: command 'x86_64-linux-gnu-gcc' failed with exit status 1 and you are using Ubuntu, try running the following command:
sudo apt install krb5-multidev
jupyter nbextension enable --py --sys-prefix widgetsnbextension
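If you want to confirm that the installation landed in the environment you expect, a short check like the one below can help. It is only a sketch: it verifies that the packages can be found by the current Python interpreter, not that they are configured correctly. The helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Both names come from the installation commands above.
for name in ("sparkmagic", "widgetsnbextension"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Run it with the same interpreter (for example, inside the same virtual environment) that you will use to start Jupyter.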
Once everything is set up, use the following script to launch the cluster on Qarnot. To do so, copy the following code into a Python script named spark-magic.py, at the same level as input and wordcount.ipynb.
In the script, replace the following placeholders with your own values:
<<<MY_SECRET_TOKEN>>> in line 10.
<<<PUBLIC_SSH_KEY>>> in line 27.

To launch this script, simply execute python3 spark-magic.py in your terminal.
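Before launching, it can be useful to check that no placeholder was left in the script. The sketch below scans a file for text of the form `<<<NAME>>>` and reports where it appears. The helper `find_placeholders` is hypothetical and not part of the Qarnot SDK; it assumes the script is saved as spark-magic.py in the current directory.

```python
import re
from pathlib import Path

# Placeholders are written as <<<NAME>>> in the tutorial's script.
PLACEHOLDER = re.compile(r"<<<([A-Za-z0-9_-]+)>>>")


def find_placeholders(source):
    """Return (line_number, name) for every placeholder still in the text."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((number, match.group(1)))
    return found


# Assumption: the script sits in the current working directory.
script = Path("spark-magic.py")
if script.exists():
    for number, name in find_placeholders(script.read_text()):
        print(f"line {number}: replace <<<{name}>>>")
```

Note that the check also flags placeholders that appear in help strings or comments, so a reported line is a hint to look at, not necessarily an error.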
By default, it will connect you to Qarnot via ssh in a gnome-terminal. If you do not have this terminal app installed or wish to use another one, you can run python3 spark-magic.py --terminal=<<<unix-terminal-app>>>. Additionally, if you want to disable this feature and only print out the command so that you can run it in your terminal on your own, you can set --terminal=off.
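The behaviour of the --terminal option can be pictured with the sketch below. This is not the code of spark-magic.py, only an illustration of the idea under stated assumptions: gnome-terminal receives the command after `--`, other terminal apps are assumed to accept `-e` (as xterm does), and the ssh command shown is a made-up example.

```python
import argparse
import shlex


def parse_args(argv=None):
    """Parse the --terminal option, defaulting to gnome-terminal."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--terminal", default="gnome-terminal")
    return parser.parse_args(argv)


def terminal_command(terminal, ssh_command):
    """Return the argv that opens ssh_command in a new terminal, or None if off."""
    if terminal == "off":
        return None
    # Assumption: gnome-terminal takes the command after "--";
    # "-e" is assumed for other terminal apps (xterm accepts it).
    separator = "--" if terminal == "gnome-terminal" else "-e"
    return [terminal, separator, *ssh_command]


# Hypothetical ssh command; the real host and port come from the running task.
ssh_command = ["ssh", "-p", "2222", "root@example.invalid"]
args = parse_args(["--terminal=off"])
command = terminal_command(args.terminal, ssh_command)
if command is None:
    print("Run this yourself:", shlex.join(ssh_command))
```

With --terminal=off nothing is spawned and the command is only printed, which is handy on machines without a graphical terminal.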
Once a new terminal spawns on your end, the ssh connection with the cluster has been established. You can then launch the provided notebook by running jupyter notebook wordcount.ipynb in your local terminal.
This notebook contains easy-to-follow steps to connect to the Spark cluster and complete this use case. The screenshot below shows what the notebook should look like once you have completed the use case.
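To give an idea of what a distributed word count computes, here is a local, single-machine sketch of the same logic in plain Python. It is not the notebook's code: it splits the lines into chunks, counts each chunk independently, then merges the partial counts, which mirrors what a Spark job typically expresses with `flatMap`, `map` and `reduceByKey`. The function names and the sample lines are illustrative assumptions.

```python
import re
from collections import Counter
from functools import reduce


def tokenize(line):
    """Split a line into lowercase words (the 'flatMap' step)."""
    return re.findall(r"[a-z']+", line.lower())


def count_partition(lines):
    """Count the words of one chunk of lines (map plus local reduce)."""
    return Counter(word for line in lines for word in tokenize(line))


def word_count(lines, partitions=2):
    """Split lines into chunks, count each chunk, then merge the counts."""
    chunks = [lines[i::partitions] for i in range(partitions)]
    partials = [count_partition(chunk) for chunk in chunks]
    return reduce(lambda a, b: a + b, partials, Counter())


# Illustrative sample; on Qarnot the input is input/iliad.txt.
sample = [
    "Sing, O goddess, the anger of Achilles",
    "the anger of Achilles son of Peleus",
]
counts = word_count(sample)
print(sum(counts.values()), "words in total")
print(counts.most_common(3))
```

On the cluster, each chunk would live on a different executor, so the per-chunk counting runs in parallel and only the small partial counts are merged.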
You also get access to the following forwarded dashboards by typing localhost:<port> in your browser.
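If a dashboard does not load, a quick way to tell whether the tunnel is forwarding a given port is to try a TCP connection to it. The sketch below does that with the standard library. The port 8998 is only an example (it is Livy's default port); use the ports of your own forwarded dashboards. The helper name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(port, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example port list; replace with the ports of your forwarded dashboards.
for port in (8998,):
    state = "reachable" if is_port_open(port) else "not reachable"
    print(f"localhost:{port} is {state}")
```

A port that is not reachable usually means the ssh session was closed or the task is not running yet.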
At any given time, you can monitor the status of your task on our platform.
Once you are done with the task, just type exit in the ssh terminal to close the tunnel, and make sure to abort the task from our platform. If you do not abort the task manually, it will continue running and use your credits.
That’s it! If you have any questions, please contact qlab@qarnot.com and we will be happy to help.