How do I use PBS?

PBS at TPAC

The Portable Batch System (PBS) used at TPAC varies between compute systems.

PBS Pro User Guide
PBS Pro Reference Guide

The following is intended as generic information; for more detailed information refer to the “man” pages or the user documentation above.

For kunanyi, example qsub scripts are available in /share/apps/pbs_script_examples.

Quick Syntax guide

qstat

Standard queue status command supplied by PBS. See man qstat for details of options.

Some common uses are:

  • List available queues: qstat -Q

qdel jobid

Delete your unwanted jobs from the queues. The jobid is returned by qsub at job
submission time, and is also displayed in the nqstat output.
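
For example (123456 is a placeholder job id – use the id reported for your own job –
and -u is one of the standard qstat options described in man qstat):

$ qstat -Q            # list the available queues
$ qstat -u $USER      # list only your own jobs
$ qstat 123456        # show the status of one particular job
$ qdel 123456         # remove that job from the queue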
qsub

Submit jobs to the queues. The simplest use of the qsub command is typified
by the following example (note that there is a carriage-return after ./a.out):

$ qsub -P a99 -l select=1:ncpus=28 -l walltime=20:00:00
./a.out
^D     (that is control-D)

or simply

$ qsub jobscript

where jobscript is an ASCII file containing the shell script to run your commands
(not the compiled executable, which is a binary file).
The qsub options are then placed within the script to avoid typing them for each
job, e.g.:

#!/bin/bash
#PBS -P a99
#PBS -l select=1:ncpus=28
#PBS -l walltime=20:00:00
./a.out

You may need to supply input data to the program, and you may be used to doing this
interactively when prompted by the program.

There are two ways of doing this in batch jobs.

Suppose, for example, that the program requires the numbers 1000 and then 50 to be
entered when prompted. You can either create a file called, say, input containing
these values:

$ cat input
1000
50

then run the program as

./a.out < input
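
Putting this first approach together, a complete job script might look like the
following sketch (it assumes the input file sits in the directory the job was
submitted from, which is why the -l wd directive described below is included):

#!/bin/bash
#PBS -P a99
#PBS -l select=1:ncpus=28
#PBS -l walltime=20:00:00
#PBS -l wd
./a.out < input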

or the data can be included in the batch job script as follows:

#!/bin/bash
#PBS -P a99
#PBS -l select=1:ncpus=28
#PBS -l walltime=20:00:00
#PBS -l wd
./a.out << EOF
1000
50
EOF

Notice that the PBS directives are all at the start of the script, that there are
no blank lines between them, and that there are no other non-PBS commands
until after all the PBS directives.
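
For example, in the following sketch the select directive would not be seen by PBS
because it appears after the first shell command (the directory name is just a
placeholder):

#!/bin/bash
#PBS -P a99
#PBS -l walltime=20:00:00
cd /path/to/workdir        # a non-PBS command, so PBS stops reading directives here
#PBS -l select=1:ncpus=28  # this directive will be ignored
./a.out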

qsub options of note (a combined example using several of these options follows this list):

-l select=? The number of nodes to be allocated to the job.
-q queue Select the queue to run the job in. The queues you can use are
listed by running nqstat. By default the routeq will be used, which will automatically
determine the queue based on the resources requested.
-l walltime=??:??:?? The wall clock time limit for the job. Time is expressed in seconds as
an integer, or in the form:
[[hours:]minutes:]seconds[.milliseconds]
System scheduling decisions depend heavily on the walltime request, so it is always
best to make as accurate a request as possible.
-l mem=???MB The total memory limit across all nodes for the job – can be specified
with units of “MB” or “GB”, but only integer values can be given. There
is a small default value.
Your job will only run if there is sufficient free memory, so making a
sensible memory request will allow your jobs to run sooner. A little trial
and error may be required to find how much memory your jobs are
using – nqstat lists jobs’ actual usage.
-l ncpus=? The number of cpus required for the job. The default is 1.

If the number of cpus requested, N, is small (currently
16 or less on NF systems), the job will run within a single shared-memory node.

If the number of cpus specified is greater, the job will be distributed
over multiple nodes. Currently on NF systems, these larger requests
are restricted to multiples of 16 cpus.

-l jobfs=???GB The requested job scratch space. This will reserve disk space, making it
unavailable for other jobs, so please do not overestimate your needs. Any files
created in the $PBS_JOBFS directory are automatically
removed at the end of the job. Ensure that you use integers, and
units of mb, MB, gb, or GB.
-l software=??? Specifies licensed software the job requires to run. See the software
documentation for the string to use for specific software. The string should be a
colon-separated list (no spaces) if more than one software product is used. If your
job uses licensed software and you do not specify this option (or mis-spell the
software), you will probably receive an automatically generated email from the
license shadowing daemon (lsd), and the job may be terminated. You can check the lsd
status and find out more by looking at the license status website.
-l other=??? Specifies other requirements or attributes of the job. The string should be
a colon separated list (no spaces) if more than one attribute is required.
Generally supported attributes are:

  • iobound – the job should not share a node with other IO bound jobs
  • mdss – the job requires access to the MDSS (usually via the mdss
    command). If MDSS is down, the job will not be started.
  • gdata1 – the job requires access to the /g/data1. If /g/data1 filesystem
    is down, the job will not be started.
  • pernodejobfs – the job’s jobfs resource request should be treated as a
    per node request.
    Normally the jobfs request is for total jobfs summed over all nodes allocated
    to the job (like mem). Only relevant to distributed parallel jobs using jobfs.

You may be asked to specify other options at times to support particular needs
or circumstances.

-r y Specifies your job is restartable, and if the job is executing on a node when it
crashes, the job will be requeued. Both resources used by and resource limits set for
the original job will carry over to the requeued job.
Hence a restartable job must be checkpointing such that it will still be
able to complete in the remaining walltime should it suffer a node crash. The default
is that jobs are assumed to not be restartable.
Note that regardless of the restartable status of a job, time used by
jobs on crashed nodes is charged against the project they are running under,
since the onus is on users to ensure minimum waste of resources via a
checkpointing mechanism which they must build into any particularly long-running codes.
-l wd Start the job in the directory from which it was submitted. Normally jobs
are started in the user’s home directory.
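
As an illustrative sketch only (the project code a99, the queue name, the resource
amounts and the software string are placeholders – adjust them to suit your project,
and run nqstat to see the queues actually available), a script header combining
several of the options above might look like:

#!/bin/bash
#PBS -P a99
#PBS -q normal
#PBS -l select=2:ncpus=28
#PBS -l walltime=10:00:00
#PBS -l mem=32GB
#PBS -l jobfs=10GB
#PBS -l software=matlab
#PBS -l wd
#PBS -r y
./a.out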
qps jobid – show the processes of a running job
qls jobid – list the files in a job’s jobfs directory
qcat jobid – show a running job’s stdout, stderr or script
qcp jobid – copy a file from a running job’s jobfs directory

The man pages for these commands on the system detail the various options you will probably need to use.