Auto-sklearn on Qarnot Cloud

AutoML is a technology that automates the most time-consuming tasks of a Machine Learning project, so that data scientists can spend more time on business problems in practical scenarios. It also makes machine learning accessible to everyone rather than to a small group of specialists.

Here is a quick step-by-step walkthrough showing how to train Auto-sklearn, an AutoML framework, on Qarnot. Follow along!

Version

Version: v0.12.5
Release year: 2021

If you are interested in another version, please send us an email at qlab@qarnot.com.

Prerequisites

Before starting a calculation with the Python SDK, a few steps are required:

Retrieve the authentication token
Install Qarnot’s Python SDK

Note: in addition to the Python SDK, Qarnot provides C# and Node.js SDKs and a command-line interface.

Test Case

The data showcased in this tutorial is the electricity data set. This data was collected from the Australian New South Wales Electricity Market where electricity prices are set every five minutes based on supply and demand. Given this historical data, we have to predict whether the electricity prices will go up or down. This is called a binary classification problem and our class labels are UP and DOWN.
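Before training, it can help to check how the two class labels are distributed. The snippet below is a minimal sketch using only the Python standard library. It assumes the target column is named class (as set by TARGET_COL in the launch script further down); the count_labels helper and the tiny in-memory sample, including its feature column, are illustrative and not part of the real data set.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, target_col="class"):
    """Count how often each label appears in target_col of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[target_col] for row in reader)


# Illustrative in-memory sample; the "feature" column is hypothetical.
sample = io.StringIO("feature,class\n0.1,UP\n0.2,DOWN\n0.3,UP\n")
print(count_labels(sample))  # Counter({'UP': 2, 'DOWN': 1})
```

To run it on the real data, open input/electricity-normalized.csv with open(path, newline="") and pass the file object to count_labels.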

We want to build the best possible model using Auto-sklearn in a given time frame. The best way to do so is to train multiple models in parallel, which increases the chances of building strong models. This is well suited to Qarnot’s HPC service, which we will use to parallelize Auto-sklearn’s computation across multiple nodes in a cluster. This test case shows how to train Auto-sklearn for 15 minutes on a 4-node Qarnot cluster.

The input data needed for this tutorial can be downloaded here.

Before moving forward, you should set up your working environment to contain the following files:

input/
    electricity-normalized.csv: training data
auto-sklearn.py: script to start the task (found below)
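A quick way to confirm this layout before launching is a small pre-flight check. The following is a sketch only, using the file names listed above; the missing_files helper is a hypothetical name introduced for illustration.

```python
from pathlib import Path

# Files this tutorial expects, relative to the working directory.
EXPECTED_FILES = ("input/electricity-normalized.csv", "auto-sklearn.py")


def missing_files(workdir="."):
    """Return the expected files that are absent from workdir."""
    root = Path(workdir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Working environment looks complete.")
```

Run it from the folder that holds the script and the input directory; an empty result means both files are in place.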

Launching the test case

Once your working environment is set up correctly, you are almost ready to start. Be sure to paste your authentication token into the script (in place of <<<MY_SECRET_TOKEN>>>) so that the task can be launched on Qarnot.

#!/usr/bin/env python

# Import the Qarnot SDK
import qarnot

# Connect to the Qarnot platform
conn = qarnot.connection.Connection(client_token='<<<MY_SECRET_TOKEN>>>')

# Create a task
task = conn.create_task('Hello World - Auto-Sklearn', 'auto-sklearn-cluster', 4)

# Create a resource bucket and add input files
input_bucket = conn.create_bucket('auto-sklearn-in')
input_bucket.sync_directory('input/')

# Attach the bucket to the task
task.resources.append(input_bucket)

# Create a result bucket and attach it to the task
task.results = conn.create_bucket('auto-sklearn-out')

# Basic Constants
task.constants['TARGET_COL'] = 'class' # Target column for classification
task.constants['TOTAL_TIME_LIMIT'] = '15' # Total training time limit in minutes
task.constants['PER_RUN_TIME_LIMIT'] = '5' # Per iteration training time limit in minutes

# Optional Constants
task.constants['N_CV_FOLDS'] = '3' # Number of cross validation folds
task.constants['ENSEMBLE_SIZE'] = '50' # Maximum number of models added to the ensemble
# If None, all possible estimators are used.
# Otherwise, specifies a set of estimators to include/exclude.
task.constants['INCLUDE_ESTIMATORS'] = 'None'
task.constants['EXCLUDE_ESTIMATORS'] = 'None'
# If None, all possible preprocessors are used.
# Otherwise, specifies a set of preprocessors to include/exclude.
task.constants['INCLUDE_PREPROCESSORS'] = 'None'
task.constants['EXCLUDE_PREPROCESSORS'] = 'None'

# Keep only the contents of outputs/ in the output bucket
task.results_whitelist = 'outputs'

# Submit the task to the API, which will launch it on the cluster
task.run(output_dir=".")

To launch this script, simply copy the code above into a file named auto-sklearn.py and run the following command in your terminal: python3 auto-sklearn.py &. The trailing & runs the script in the background.
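Note that every constant in the script is a string, with time limits given in minutes and 'None' used as a placeholder. The tutorial does not show how the cluster side reads these values, so the following is only a sketch of how such strings could be converted into typed settings. The parse_constants and parse_optional_list helpers are hypothetical, and the comma-separated format for estimator and preprocessor lists is an assumption. Times are converted to seconds because auto-sklearn's own time-limit parameters are expressed in seconds.

```python
def parse_optional_list(value):
    """Map 'None' (or an empty string) to None, else split a comma-separated string."""
    if value is None or value.strip() in ("", "None"):
        return None
    return [name.strip() for name in value.split(",") if name.strip()]


def parse_constants(constants):
    """Convert the string constants from the launch script into typed settings."""
    total_s = int(constants["TOTAL_TIME_LIMIT"]) * 60      # minutes -> seconds
    per_run_s = int(constants["PER_RUN_TIME_LIMIT"]) * 60  # minutes -> seconds
    if per_run_s > total_s:
        raise ValueError("PER_RUN_TIME_LIMIT must not exceed TOTAL_TIME_LIMIT")
    return {
        "target_col": constants["TARGET_COL"],
        "total_time_s": total_s,
        "per_run_time_s": per_run_s,
        "n_cv_folds": int(constants["N_CV_FOLDS"]),
        "ensemble_size": int(constants["ENSEMBLE_SIZE"]),
        # Assumption: lists would be passed as comma-separated names.
        "include_estimators": parse_optional_list(constants.get("INCLUDE_ESTIMATORS")),
        "exclude_estimators": parse_optional_list(constants.get("EXCLUDE_ESTIMATORS")),
    }


# Same values as in the launch script above.
example = {
    "TARGET_COL": "class",
    "TOTAL_TIME_LIMIT": "15",
    "PER_RUN_TIME_LIMIT": "5",
    "N_CV_FOLDS": "3",
    "ENSEMBLE_SIZE": "50",
    "INCLUDE_ESTIMATORS": "None",
    "EXCLUDE_ESTIMATORS": "None",
}
settings = parse_constants(example)
print(settings["total_time_s"], settings["per_run_time_s"])  # 900 300
```

With the tutorial's values, the 15-minute budget leaves room for at least three sequential 5-minute runs per node.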

Results

At any given time, you can monitor the status of your task.

Once the training is done, the task state will turn green. You can then browse the task’s output bucket, auto-sklearn-out. It contains files such as a training log, the saved model, the prediction confusion matrix and an accuracy-over-time graph.
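Because the launch script calls task.run(output_dir="."), the results are also downloaded next to the script once the task finishes. The sketch below lists whatever files arrived, assuming they land in a local outputs folder (matching the results whitelist in the script). The list_outputs helper is a hypothetical name, and no particular file names are assumed.

```python
from pathlib import Path


def list_outputs(output_dir="outputs"):
    """Return the sorted relative paths of all files found under output_dir."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


for name in list_outputs():
    print(name)
```

An empty listing simply means the task has not finished yet or the results were downloaded elsewhere.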

Wrapping up

That’s it! If you have any questions, please contact qlab@qarnot.com and we will be happy to help!
