Once your computations have started, it is important to keep track of them. If you want to do so manually, the most convenient way is the web UI. If you want to automate that part, you can use the SDKs instead.
To get a quick summary of your tasks:
import qarnot

conn = qarnot.Connection(client_token='<<<MY_SECRET_TOKEN>>>')
for task in conn.all_tasks(summary=True):
    print(task)
You can also omit the summary option to get the full tasks and act on them. It is also possible to filter tasks by tags.
Below is an example that lists the name, state and wall time of all the tasks that have either the to-follow or the important tag. The code also automatically takes a snapshot (to save results) and stops tasks that were submitted more than 24 hours ago.
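As a small illustration of acting on the listed tasks, the sketch below counts tasks per state. `count_by_state` is a hypothetical helper, not part of the SDK, and the sample objects and state names are stand-ins chosen so the snippet runs without an account. With the SDK you would pass the iterable returned by `conn.all_tasks()` instead, assuming each task exposes a `state` attribute as in the examples in this article.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_state(tasks):
    """Return a Counter mapping each task state to the number of tasks in it."""
    return Counter(task.state for task in tasks)


# Stand-in objects (illustrative names and states only); replace with
# conn.all_tasks() when using the real SDK.
sample_tasks = [
    SimpleNamespace(name='render-1', state='FullyExecuting'),
    SimpleNamespace(name='render-2', state='FullyExecuting'),
    SimpleNamespace(name='export', state='Success'),
]

# Print the states from most to least frequent.
for state, count in count_by_state(sample_tasks).most_common():
    print(f'{state}: {count}')
```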
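import qarnot

conn = qarnot.Connection(client_token='<<<MY_SECRET_TOKEN>>>')
for task in conn.all_tasks(tags=['to-follow', 'important']):
    print(f'"{task.name}" is {task.state}, submitted {task.wall_time} ago')
    # Assumption: wall_time is a string shaped like '[days.]hh:mm:ss[.fraction]'.
    # A time-of-day parser can never report more than 23 hours, so look at the
    # optional leading day count instead: 1 or more means at least 24 hours.
    # Check the format returned by your SDK version before relying on this.
    first_field = str(task.wall_time).split(':')[0]
    days = int(first_field.split('.')[0]) if '.' in first_field else 0
    if days >= 1:
        print(f'stopping task {task.name}')
        task.instant()  # take a snapshot to save the results
        task.abort()    # then stop the task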
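The 24-hour check depends on interpreting the wall time as a duration. Below is a minimal sketch of one way to do that, assuming the wall time is a string shaped like `[days.]hh:mm:ss[.fraction]`. That format is an assumption, and `parse_wall_time` and `exceeds` are hypothetical helpers, not part of the SDK, so check what your SDK version actually returns.

```python
import datetime
import re

# Assumed format: optional "<days>." prefix, hh:mm:ss, optional ".<fraction>".
_WALL_TIME_RE = re.compile(
    r'^(?:(?P<days>\d+)\.)?'
    r'(?P<hours>\d{1,2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})'
    r'(?:\.(?P<fraction>\d+))?$'
)


def parse_wall_time(text):
    """Convert a '[days.]hh:mm:ss[.fraction]' string into a timedelta."""
    match = _WALL_TIME_RE.match(text)
    if match is None:
        raise ValueError(f'unrecognised wall time: {text!r}')
    fraction = match.group('fraction')
    return datetime.timedelta(
        days=int(match.group('days') or 0),
        hours=int(match.group('hours')),
        minutes=int(match.group('minutes')),
        seconds=int(match.group('seconds')),
        microseconds=round(float('0.' + fraction) * 1_000_000) if fraction else 0,
    )


def exceeds(text, limit=datetime.timedelta(hours=24)):
    """Return True when the parsed wall time is strictly longer than limit."""
    return parse_wall_time(text) > limit
```

With these helpers, the condition in the loop could be written as `if exceeds(task.wall_time):`, which keeps the parsing in one place and makes the threshold easy to change.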
A single task can also be retrieved by its UUID. All of the task's information is available from the retrieved object. However, once a task has been started, most of its parameters can no longer be changed.
import qarnot

conn = qarnot.Connection(client_token='<<<MY_SECRET_TOKEN>>>')
task = conn.retrieve_task('<<<TASK_UUID>>>')
# All the task's information is stored in task.status
for i, info in enumerate(task.status.running_instances_info.per_running_instance_info):
    print(f'Running instance nb {i} is running on a {info.cpu_model}@{info.max_frequency_ghz}GHz')
# You can also access the buckets it uses
for bucket in task.resources:
    print(f'{bucket.description} is a resource of the task')
It is also possible to list buckets or retrieve specific buckets.
import qarnot

conn = qarnot.Connection(client_token='<<<MY_SECRET_TOKEN>>>')
# List all buckets and print their name
for bucket in conn.buckets():
    print(f'{bucket.uuid}')
# Retrieve a single bucket with its name
bucket = conn.retrieve_bucket('<<<BUCKET_NAME>>>')
# Retrieve or create a bucket with its name
bucket = conn.create_bucket('<<<BUCKET_NAME>>>')
Listing and retrieving jobs or pools follows the same logic as for tasks.
import qarnot

conn = qarnot.Connection(client_token='<<<MY_SECRET_TOKEN>>>')
print('listing jobs:')
for job in conn.all_jobs():
    job: qarnot.job.Job
    print(f' + {job.name} is {job.state}')
print('\nlisting pools:')
for pool in conn.all_pools():
    pool: qarnot.pool.Pool
    print(f' + {pool.name} is {pool.state}')
For more information on monitoring and debugging, please consult the following articles: