Date: <2024-12-11 Wed>

Email Notification for PBS Queue

I had to run some simulation code on a cluster that uses the PBS system for queuing jobs. Since the jobs take a long time to finish, I wanted an email notification when all of them had completed. Here are two approaches to do that.

1. Approach 1 - PBS system

If you can add flags to the PBS command for submitting jobs (qsub), add -m ae -M email@address.com to receive an email notification when the job is aborted or completes. [Source: purdue.edu]

The arguments to -m are:

  • a : mail is sent when the job is aborted by the batch system.
  • b : mail is sent when the job begins execution.
  • e : mail is sent when the job terminates.
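
The same mail options can also be written into the job script as #PBS directives instead of being typed on the command line. Below is a minimal sketch; job.sh, the job name my_simulation and the echo payload are placeholders for illustration, not part of the original setup.

```shell
#!/bin/sh
# Write a minimal PBS job script with the mail options embedded as directives.
# The file name, job name and payload are placeholders.
job_script="${TMPDIR:-/tmp}/job.sh"
cat > "$job_script" <<'EOF'
#!/bin/sh
#PBS -N my_simulation
#PBS -m ae
#PBS -M email@address.com
echo "simulation goes here"
EOF

# Submit only where qsub is actually installed; otherwise show the command.
if command -v qsub >/dev/null 2>&1; then
    qsub "$job_script"
else
    echo "qsub not found; would run: qsub $job_script"
fi
```

With the directives in the file, a plain qsub job.sh should behave like qsub -m ae -M email@address.com job.sh.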

This approach is useful when you have only one job to keep track of. If you have scheduled multiple jobs and want to be notified when all of them complete, the next approach is more useful.

2. Approach 2 - Cronjob

If you can edit cronjobs (cron is a system for running scripts periodically), you can add a cronjob that checks the PBS queue and notifies you when the queue is empty.

To add a cronjob, run crontab -e in the shell and add the following line:

* * * * * /home/username/check_queue.sh
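
If you would rather not open an editor, the entry can probably be added non-interactively as well. The sketch below uses a hypothetical helper, add_entry, that reads an existing crontab on standard input and prints it with the line appended, unless that exact line is already there.

```shell
#!/bin/sh
# The cron line from above: run check_queue.sh every minute.
entry='* * * * * /home/username/check_queue.sh'

# add_entry (hypothetical helper): read an existing crontab on stdin and
# print it with the entry appended, unless that exact line is already present.
add_entry() {
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}
```

To apply it to the real crontab, pipe the current table through the helper: crontab -l 2>/dev/null | add_entry | crontab -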

This asks cron to run the script check_queue.sh every minute. In that script, we check the queue using the qstat command and, if it is empty, send a mail. But if we did just that, an email would be sent every minute once the queue is empty. So the script creates a file named nocheck after the first email is sent, and it looks for that file when it starts: if the file is found, the script exits immediately. This way we get only one email.

The next time you want a notification, just delete the nocheck file (rm /home/username/nocheck).
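
Forgetting to delete the file is easy, so one option is to let job submission do it. This is a sketch of a hypothetical wrapper, submit, which is not part of the original setup: it passes its arguments to qsub and then removes the marker, so the next empty queue triggers a mail again.

```shell
#!/bin/sh
# Marker file used by check_queue.sh.
marker=/home/username/nocheck

# submit (hypothetical wrapper): submit a job, then re-arm the notification.
# The marker is removed after qsub so that cron cannot see an empty queue
# in between and send a premature mail.
submit() {
    if command -v qsub >/dev/null 2>&1; then
        qsub "$@" || return 1
    else
        echo "qsub not found; would run: qsub $*"
    fi
    rm -f "$marker"
}
```

Usage would then be submit job.sh in place of qsub job.sh.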

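#!/bin/sh

# A notification was already sent; do nothing until nocheck is deleted.
if [ -f /home/username/nocheck ]; then
    exit 1
fi

# qstat prints a two-line header above the job list, so two lines or
# fewer means there are no jobs left in the queue.
if [ "$(/opt/pbs/bin/qstat | wc -l)" -le 2 ]; then
    mail -s "Queue complete" email@address.edu < /dev/null
    touch /home/username/nocheck
    exit 0
fi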
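
Before trusting the script with real jobs, the logic can be tried without a cluster. The sketch below is a dry run under stated assumptions: fake_qstat and fake_mail are made-up stand-ins for qstat and mail, the queue contents are dummy lines rather than real qstat output, and everything lives in a scratch directory.

```shell
#!/bin/sh
# Dry run of the check with stand-ins for qstat and mail.
workdir=$(mktemp -d)
marker="$workdir/nocheck"
sent="$workdir/sent.log"

# Hypothetical stubs: print a canned queue listing, log mails to a file.
fake_qstat() { cat "$workdir/queue.txt"; }
fake_mail()  { echo "mail: $*" >> "$sent"; }

# Same logic as check_queue.sh, wrapped in a function.
check_queue() {
    if [ -f "$marker" ]; then
        return 1
    fi
    if [ "$(fake_qstat | wc -l)" -le 2 ]; then
        fake_mail "Queue complete"
        touch "$marker"
    fi
    return 0
}

# Three lines stand for a two-line header plus one job: no mail yet.
printf 'header\n------\njob 1\n' > "$workdir/queue.txt"
check_queue

# Empty output stands for an empty queue: one mail, then the marker blocks.
: > "$workdir/queue.txt"
check_queue
check_queue || echo "skipped: marker present"

echo "mails sent: $(wc -l < "$sent")"
```

Deleting the marker file in the scratch directory and calling check_queue again should log a second mail, which mirrors the rm step described earlier.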
