TTIC compute cluster (a.k.a. gouda) guidelines

July, 2014


This document describes current cluster policies and provides instructions for using our scheduler, Sun Grid Engine (SGE). It is not intended as a complete tutorial, and it assumes that the reader is already familiar with the main SGE commands (qsub, qlogin, qstat, qmon, qalter, qdel, etc.). For more information about any of the options described below, please refer to the man pages or the SGE user's manual, or ask Adam, Greg, or Karen.

General policy:
Job priority determination:

The scheduler does not operate in a first-in-first-out mode but rather allocates resources according to priority. This enables fair use of the resources, and avoids users having to "negotiate" cluster division amongst themselves. The priority of a job is determined by a weighted combination of:
The relative priorities of all of the jobs in the queue are recalculated at each scheduling interval (currently set to 15 seconds).

Some details on the priority components and job submission:
Multiple queues:

There are four queues for jobs of different lengths. The following table gives the specs of the queues. In most cases, there should be no need to keep track of the nodes; they are included here for completeness. Note that the queues are set up as a cascade, so that any node in the very-long queue is also in the long, medium, and short queues.

Queue name     CPU time limit    Wall clock time limit    Nodes in queue                 Total # cores in queue
very-long.q    1 week            1.5 weeks                6, 7-11, 15-16, 29, 30         86
long.q         36 hours          108 hours                0, 1, 4, 5, 12, 14, 27, 28     164
medium.q       4 hours           12 hours                 2, 3, 13, 25, 26               228
short.q        30 minutes        90 minutes               23, 24                         244

Every job must be submitted to one of the queues by specifying the corresponding length attribute, which has the general form -l <length>:
-l short
-l medium
-l long
-l very-long

Any job not specifying a queue will be submitted to the "very-long" queue. Any job exceeding either the CPU or wall clock time limit in the queue in which it's running will be automatically killed. Note that you can no longer use options like '-q all.q@compute-0-N' because such queues no longer exist. If for some reason you need to submit to a specific node which is included in a certain queue, say the medium queue, use '-q medium.q@compute-0-N -l medium'.
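
As a rough illustration of the rules above, the following sketch picks the length attribute from an estimated CPU time and assembles the matching qsub command line. The helper names pick_length and build_qsub, and the script name myjob.sh, are hypothetical and not part of SGE or the cluster setup. The thresholds are simply the CPU time limits from the table. The sketch only prints the command instead of running it, and it does not check the wall clock limit, so leave a margin in your estimate.

```shell
#!/bin/sh
# Hypothetical helper: map an estimated CPU time (in minutes) to the
# length attribute of the smallest queue whose CPU time limit covers it.
pick_length() {
    minutes=$1
    if [ "$minutes" -le 30 ]; then
        echo short                      # limit: 30 minutes
    elif [ "$minutes" -le 240 ]; then
        echo medium                     # limit: 4 hours
    elif [ "$minutes" -le 2160 ]; then
        echo long                       # limit: 36 hours
    else
        echo very-long                  # limit: 1 week (not checked here)
    fi
}

# Hypothetical helper: print (do not run) the qsub command line.
#   $1 = estimated CPU minutes
#   $2 = job script
#   $3 = optional node number N, to pin the job to compute-0-N
build_qsub() {
    length=$(pick_length "$1")
    node=${3:-}
    if [ -n "$node" ]; then
        # The node must actually belong to the chosen queue.
        echo "qsub -q ${length}.q@compute-0-${node} -l ${length} $2"
    else
        echo "qsub -l ${length} $2"
    fi
}

build_qsub 20 myjob.sh        # a 20-minute job
build_qsub 120 myjob.sh 6     # a 2-hour job pinned to node 6
```

Node 6 is used in the second call because it is listed under the very-long queue and is therefore, by the cascade, also in the medium queue.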




General guidelines and hints:
Examples:
For more information:
Using Theano

If you plan to use Theano on gouda, please add the following two lines to your script to avoid multithreading.

export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1

If you intend to use multithreading, you can adjust the variables accordingly. For example, if you want to use 4 threads, add the following two lines to your script,

export OMP_NUM_THREADS=4
export OPENBLAS_NUM_THREADS=4

and then submit your job with qsub -pe serial 4.
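
One way to keep the thread variables consistent with the slot request is to derive them inside the job script rather than hard-coding the number twice. The sketch below assumes that SGE sets the NSLOTS environment variable inside jobs submitted with -pe (standard SGE behavior, but not confirmed here for gouda). The function name configure_threads and the commented-out train.py command are hypothetical.

```shell
#!/bin/sh
# Sketch of a job script that takes the thread count from the slot request.
# Assumption: SGE exports NSLOTS for jobs submitted with -pe; when it is
# unset (no -pe, or run outside SGE) we fall back to a single thread.
configure_threads() {
    threads="${NSLOTS:-1}"
    export OMP_NUM_THREADS="$threads"
    export OPENBLAS_NUM_THREADS="$threads"
}

configure_threads
echo "Using $OMP_NUM_THREADS thread(s)"

# Hypothetical Theano entry point; replace with your own command.
# python train.py
```

Submitted with qsub -pe serial 4, such a script would use 4 threads; submitted without -pe, it would use 1, which matches the single-threaded setting recommended above.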