AutoML is a technology that automates the most time-consuming tasks of a machine learning project, so that data scientists can spend more time on the business problems at hand. It also opens up machine learning to everyone, not just a small group of specialists.
Here is a quick step-by-step walkthrough of how to train Auto-sklearn, an AutoML framework, on Qarnot. Follow along!
If you are interested in another version, please send us an email at qlab@qarnot.com.
Before starting a calculation with the Python SDK, a few steps are required:
Note: in addition to the Python SDK, Qarnot provides C# and Node.js SDKs and a command-line interface.
The data showcased in this tutorial is the electricity dataset. It was collected from the Australian New South Wales Electricity Market, where electricity prices are set every five minutes based on supply and demand. Given this historical data, we want to predict whether the electricity price will go up or down. This is a binary classification problem, and our class labels are UP and DOWN.
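In practice, the binary classification setup means each row of the CSV is split into numeric features and one label taken from the target column (the TARGET_COL constant, set to 'class' in the launch script). The sketch below shows that split with the standard csv module. The column names are an assumption based on the public Elec2 dataset, and the three rows are made-up placeholder values, not real market data.

```python
import csv
import io

TARGET_COL = "class"  # same target column as the task constant in the launch script

# Hypothetical stand-in for electricity-normalized.csv: column names are assumed
# (taken from the public Elec2 dataset) and the values are made up.
SAMPLE_CSV = """date,day,period,nswprice,nswdemand,class
0.0,2,0.00,0.10,0.40,UP
0.0,2,0.02,0.20,0.30,DOWN
0.0,2,0.04,0.15,0.35,UP
"""


def split_features_target(csv_text, target_col):
    """Return (feature rows, labels) for a binary classification CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(target_col))  # label is UP or DOWN
        # every remaining column is treated as a numeric feature
        features.append({name: float(value) for name, value in row.items()})
    return features, labels


X, y = split_features_target(SAMPLE_CSV, TARGET_COL)
print(len(X), sorted(set(y)))  # → 3 ['DOWN', 'UP']
```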
We want to build the best possible model with Auto-sklearn within a given time frame. A good way to do so is to train multiple models in parallel, which increases the chances of finding strong ones. This is well suited to Qarnot’s HPC service, which we will use to parallelize Auto-sklearn’s computation across multiple nodes of a cluster. This test case shows how to train Auto-sklearn for 15 minutes on a 4-node Qarnot cluster.
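To get a feel for how the time budget limits the search, here is a rough back-of-the-envelope sketch using the 15-minute total budget and the 5-minute per-run limit from the launch script. It assumes, as a simplification, that each worker fits one model at a time and that there is one worker per node; how many workers actually run on each node is not stated in this tutorial. Overhead such as data loading and ensemble building is ignored, so the result is only an upper bound on full-length runs.

```python
def max_full_length_runs(total_minutes, per_run_minutes, workers):
    """Upper bound on model fits that each use their whole per-run limit.

    Assumes each worker trains one model at a time and ignores start-up,
    data loading and ensemble-building overhead. Fits that finish early
    leave room for more candidates.
    """
    runs_per_worker = total_minutes // per_run_minutes
    return runs_per_worker * workers


# 15-minute budget, 5-minute per-run limit, one worker on each of 4 nodes (assumed)
print(max_full_length_runs(15, 5, 4))  # → 12
```

Running in parallel is what makes the difference here: with a single worker, the same budget would allow at most 3 such runs.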
The input data needed for this tutorial can be downloaded here.
Before moving forward, you should set up your working environment so that it contains the following files:
input/
    electricity-normalized.csv: training data
auto-sklearn.py: script to start the task (found below)
Once your working environment is set up correctly, you are almost ready to start. Be sure to copy your authentication token into the script (in place of <<<MY_SECRET_TOKEN>>>) so that you can launch the task on Qarnot.
#!/usr/bin/env python
# Import the Qarnot SDK
import qarnot
# Connect to the Qarnot platform
conn = qarnot.connection.Connection(client_token='<<<MY_SECRET_TOKEN>>>')
# Create a task using the 'auto-sklearn-cluster' profile on 4 nodes
task = conn.create_task('Hello World - Auto-Sklearn', 'auto-sklearn-cluster', 4)
# Create a resource bucket and add input files
input_bucket = conn.create_bucket('auto-sklearn-in')
input_bucket.sync_directory('input/')
# Attach the bucket to the task
task.resources.append(input_bucket)
# Create a result bucket and attach it to the task
task.results = conn.create_bucket('auto-sklearn-out')
# Basic Constants
task.constants['TARGET_COL'] = 'class' # Target column for classification
task.constants['TOTAL_TIME_LIMIT'] = '15' # Total training time limit in minutes
task.constants['PER_RUN_TIME_LIMIT'] = '5' # Per iteration training time limit in minutes
# Optional Constants
task.constants['N_CV_FOLDS'] = '3' # Number of cross validation folds
task.constants['ENSEMBLE_SIZE'] = '50' # Maximum number of models added to the ensemble
# If None, all possible estimators are used.
# Otherwise, specifies a set of estimators to include/exclude.
task.constants['INCLUDE_ESTIMATORS'] = 'None'
task.constants['EXCLUDE_ESTIMATORS'] = 'None'
# If None, all possible preprocessors are used.
# Otherwise, specifies a set of preprocessors to include/exclude.
task.constants['INCLUDE_PREPROCESSORS'] = 'None'
task.constants['EXCLUDE_PREPROCESSORS'] = 'None'
# Keep only the contents of outputs/ in the output bucket
task.results_whitelist = 'outputs'
# Submit the task to the API, which launches it on the cluster,
# waits for it to finish and downloads the results into the current directory
task.run(output_dir=".")
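All task constants are passed as strings, with time limits in minutes, while Auto-sklearn expects its time limits in seconds. The sketch below shows one hypothetical way such constants could be turned into keyword arguments; it is not the code that runs inside the Qarnot container. The keyword names follow older AutoSklearnClassifier releases (newer releases group the include/exclude options into dictionaries), and the comma-separated format for estimator lists is an assumption, so check both against the version you use.

```python
# Values exactly as set in the launch script (task constants are strings)
CONSTANTS = {
    "TARGET_COL": "class",
    "TOTAL_TIME_LIMIT": "15",
    "PER_RUN_TIME_LIMIT": "5",
    "N_CV_FOLDS": "3",
    "ENSEMBLE_SIZE": "50",
    "INCLUDE_ESTIMATORS": "None",
    "EXCLUDE_ESTIMATORS": "None",
    "INCLUDE_PREPROCESSORS": "None",
    "EXCLUDE_PREPROCESSORS": "None",
}


def parse_optional_list(value):
    """'None' means no restriction; otherwise assume a comma-separated list."""
    if value.strip() == "None":
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def constants_to_automl_kwargs(constants):
    """Hypothetical mapping from task constants to Auto-sklearn style kwargs.

    TARGET_COL only selects the label column of the data, so it is not
    turned into an Auto-sklearn argument.
    """
    kwargs = {
        # Auto-sklearn time limits are in seconds; the constants are in minutes
        "time_left_for_this_task": int(constants["TOTAL_TIME_LIMIT"]) * 60,
        "per_run_time_limit": int(constants["PER_RUN_TIME_LIMIT"]) * 60,
        # k-fold cross-validation; fallback values are copied from the script
        "resampling_strategy": "cv",
        "resampling_strategy_arguments": {"folds": int(constants.get("N_CV_FOLDS", "3"))},
        "ensemble_size": int(constants.get("ENSEMBLE_SIZE", "50")),
    }
    for name in ("INCLUDE_ESTIMATORS", "EXCLUDE_ESTIMATORS",
                 "INCLUDE_PREPROCESSORS", "EXCLUDE_PREPROCESSORS"):
        # e.g. INCLUDE_ESTIMATORS -> include_estimators
        kwargs[name.lower()] = parse_optional_list(constants.get(name, "None"))
    return kwargs


kwargs = constants_to_automl_kwargs(CONSTANTS)
print(kwargs["time_left_for_this_task"], kwargs["per_run_time_limit"])  # → 900 300
```

Keeping the constants as plain strings in the launch script means the parsing, and any validation, happens in one place on the compute side.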
To launch the task, copy the code above into a Python script named auto-sklearn.py and run the following command in your terminal:
python3 auto-sklearn.py &
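Before running the command above, a quick local sanity check can save a failed submission. The hypothetical helper below, a sketch that is not part of the Qarnot SDK, verifies that the expected files are in place and that the token placeholder has been replaced in auto-sklearn.py.

```python
from pathlib import Path

PLACEHOLDER = "<<<MY_SECRET_TOKEN>>>"


def check_setup(root="."):
    """Return a list of problems with the working directory (empty if ready)."""
    base = Path(root)
    problems = []
    data_file = base / "input" / "electricity-normalized.csv"  # training data
    script = base / "auto-sklearn.py"                          # launch script
    if not data_file.is_file():
        problems.append(f"missing {data_file}")
    if not script.is_file():
        problems.append(f"missing {script}")
    elif PLACEHOLDER in script.read_text():
        # the authentication token has not been pasted in yet
        problems.append("replace the token placeholder in auto-sklearn.py")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print(problem)
```

An empty output means the directory matches the layout described earlier.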
At any given time, you can monitor the status of your task.
Once the training is done, the task state turns green. You can then check the task’s output bucket, auto-sklearn-out, where you will find several files, including a training log, the saved model, the prediction confusion matrix and a graph of accuracy over time. Because the script calls task.run with output_dir set to ".", the results are also downloaded to your current directory.
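The confusion matrix and the accuracy figures are directly related: accuracy is the sum of the diagonal (correct predictions) divided by the total number of predictions. The sketch below computes it for a 2x2 UP/DOWN matrix. The layout (rows are true classes, columns are predicted classes) is an assumption, and the counts are made up for illustration; they are not results from this tutorial.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correctly classified samples / all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal entries
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical counts; rows are true classes (UP, DOWN), columns are predictions
example = [[40, 10],
           [5, 45]]
print(accuracy_from_confusion(example))  # → 0.85
```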
That’s it! If you have any questions, please contact qlab@qarnot.com and we will be happy to help.