Keeping track of your computations with an SDK

Once your computations have started, keeping track of them is an important part of the process. If you want to do this manually, the web UI is the most convenient option. To automate it, you can use the SDKs instead.

List your tasks

To get a quick summary of your tasks:

import qarnot

conn = qarnot.Connection(client_token='<<<MY_SECRET_TOKEN>>>')
for task in conn.all_tasks(summary=True):
    print(task)
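
When the list is long, a per-state count can be easier to read than one line per task. The sketch below assumes only that each listed task exposes a state attribute, as the other examples in this article use. count_by_state is a hypothetical helper, not part of the SDK, and the stand-in objects replace a real connection so that the snippet runs offline.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_state(tasks):
    """Return a Counter mapping each task state to the number of tasks in it."""
    return Counter(task.state for task in tasks)


# Stand-in objects for illustration only; with the SDK, the iterable would be
# conn.all_tasks(summary=True).
demo_tasks = [
    SimpleNamespace(name='render-1', state='FullyExecuting'),
    SimpleNamespace(name='render-2', state='Success'),
    SimpleNamespace(name='render-3', state='Success'),
]

for state, count in sorted(count_by_state(demo_tasks).items()):
    print(f'{state}: {count}')
```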

You can also omit the summary option to retrieve the full task objects and act on them. Tasks can also be filtered by tags.

Below is an example that lists the name, state and wall time of every task carrying at least one of the to-follow or important tags. It also takes a snapshot (to save results) and then stops any task that was submitted more than 24 hours ago.

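import datetime

import qarnot

conn = qarnot.Connection(client_token='<<<MY_SECRET_TOKEN>>>')
max_age = datetime.timedelta(hours=24)
# Assumption: task.creation_date is a naive datetime expressed in UTC.
now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)

for task in conn.all_tasks(tags=['to-follow', 'important']):
    print(f'"{task.name}" is {task.state}, wall time: {task.wall_time}')
    # A datetime.time value can never exceed 24 hours, so compare the task's age instead
    if now - task.creation_date > max_age:
        print(f'stopping task {task.name}')
        task.instant()  # take a snapshot to save the results
        task.abort()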
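
The age check can be isolated in a small function so that it can be tested without contacting the API. This is a minimal sketch: is_overdue is a hypothetical helper, and it assumes the submission time is available as a naive UTC datetime (for instance from a creation_date attribute on the task, which is an assumption about the SDK here).

```python
import datetime


def is_overdue(submitted_at, now, max_age=datetime.timedelta(hours=24)):
    """Return True when more than max_age has elapsed since submitted_at."""
    return now - submitted_at > max_age


# Fixed timestamps for illustration, so the result does not depend on the clock.
now = datetime.datetime(2024, 1, 2, 12, 0, 0)
print(is_overdue(datetime.datetime(2024, 1, 1, 11, 0, 0), now))  # submitted 25 hours earlier
print(is_overdue(datetime.datetime(2024, 1, 2, 9, 0, 0), now))   # submitted 3 hours earlier
```

Passing the current time as an argument keeps the function deterministic; a monitoring script would compute it once per run and reuse it for every task.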

Retrieve a specific task

A single task can also be retrieved by its UUID, which gives access to all of the task's information. However, once a task has started, most of its parameters can no longer be changed.

import qarnot

conn = qarnot.Connection(client_token='<<<MY_SECRET_TOKEN>>>')
task = conn.retrieve_task('<<<TASK_UUID>>>')

# All the task's information is stored in task.status
for i, info in enumerate(task.status.running_instances_info.per_running_instance_info):
    print(f'Running instance nb {i} is running on a {info.cpu_model}@{info.max_frequency_ghz}GHz')

# You can also access the buckets it uses
for bucket in task.resources:
    print(f'{bucket.description} is a resource of the task')
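
For tasks with many instances, the per-instance information can be grouped rather than printed line by line. The sketch below groups instance indices by CPU model. instances_by_cpu_model is a hypothetical helper that assumes only the cpu_model attribute used above, and the stand-in objects take the place of task.status.running_instances_info.per_running_instance_info.

```python
from collections import defaultdict
from types import SimpleNamespace


def instances_by_cpu_model(instance_infos):
    """Map each CPU model to the list of instance indices running on it."""
    groups = defaultdict(list)
    for index, info in enumerate(instance_infos):
        groups[info.cpu_model].append(index)
    return dict(groups)


# Stand-in data for illustration only; the CPU model names are made up.
demo_infos = [
    SimpleNamespace(cpu_model='cpu-model-a', max_frequency_ghz=3.0),
    SimpleNamespace(cpu_model='cpu-model-a', max_frequency_ghz=3.0),
    SimpleNamespace(cpu_model='cpu-model-b', max_frequency_ghz=2.4),
]

for model, indices in instances_by_cpu_model(demo_infos).items():
    print(f'{model}: instances {indices}')
```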

Buckets

It is also possible to list buckets or retrieve specific buckets.

import qarnot

conn = qarnot.Connection(client_token='<<<MY_SECRET_TOKEN>>>')

# List all buckets and print their names (a bucket's uuid is its name)
for bucket in conn.buckets():
    print(bucket.uuid)

# Retrieve a single bucket with its name
bucket = conn.retrieve_bucket('<<<BUCKET_NAME>>>')

# Retrieve or create a bucket with its name
bucket = conn.create_bucket('<<<BUCKET_NAME>>>')

Jobs and pools

Listing and retrieving jobs or pools follows the same logic as for tasks.

import qarnot

conn = qarnot.Connection(client_token='<<<MY_SECRET_TOKEN>>>')

print('listing jobs:')
for job in conn.all_jobs():
    job: qarnot.job.Job
    print(f'  + {job.name} is {job.state}')

print('\nlisting pools:')
for pool in conn.all_pools():
    pool: qarnot.pool.Pool
    print(f'  + {pool.name} is {pool.state}')

For more information on monitoring and debugging, please consult the following articles:

Core concept Tasks: describes how tasks behave and their status
Monitoring resources
Fetching logs
Troubleshooting
Error codes
