Partitions

In Slurm, partitions define job execution environments by specifying:

  • Hardware constraints (e.g., GPU nodes, memory, CPUs).
  • Job limitations (e.g., maximum runtime, priority levels).
  • Resource allocations (e.g., number of nodes, GPUs, CPUs).

Each job must be assigned to a specific partition, either explicitly or by default. For example:

  • The default partition is called short, allowing jobs to use up to 8 compute nodes (or 512 cores) for 24 hours.
  • To access GPU nodes, assign your job to the gpu partition, which allows up to 64 cores and 4 NVIDIA A100 cards for two days.
  • The testing partition provides short-term access to resources for development and testing, which is useful when the cluster is fully utilized.

If your job requirements don't match the limits set for the available partitions, contact us via our helpdesk.

To select a given partition with a Slurm command, use the -p <partition> option:

srun|salloc|sinfo|squeue... -p <partition> [...]
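
For example, a minimal batch script requesting the gpu partition could look like the sketch below. The file name, core count, and the --gres=gpu:2 request are illustrative assumptions; the exact syntax for requesting GPUs may differ on your cluster.

#!/bin/bash
#SBATCH -p gpu            # run in the gpu partition
#SBATCH -N 1              # one node
#SBATCH -n 16             # 16 cores on that node
#SBATCH --gres=gpu:2      # request 2 GPUs (assumed syntax; check the GPU documentation for your site)
#SBATCH -t 1-00:00:00     # one day of walltime

srun ./my_gpu_program     # placeholder for your executable

Submit the script with sbatch job.sh (job.sh being whatever you named the file).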

Available Partitions and Their Parameters

Partition  Nodes            Time limit  Job size limit  GPUs  Priority factor
                            (d-hh:mm)   (nodes/cores)
testing    login01,login02  0-00:30     1/16            1     0.0
gpu        n141-n148        2-00:00     1/64            4     0.0
short      n001-n140        1-00:00     8/512           0     1.0
medium     n001-n140        2-00:00     4/256           0     0.5
long       n001-n140        4-00:00     1/64            0     0.0
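
For a quick test within the testing partition limits above (1 node/16 cores, 30 minutes), an interactive session can be requested as in the following sketch; the core count and time limit are illustrative:

srun -p testing -n 4 -t 00:10:00 --pty bash    # interactive shell with 4 cores for 10 minutes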

Viewing Partitions and Their Definitions

Existing partitions, the nodes they include, and the general system state can be displayed with the sinfo command.

sinfo output

sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  ncpu         up 1-00:00:00     22 drain* n[014-021,026-031,044-051]
  ncpu         up 1-00:00:00     10    mix n[001-002,025,052,058,067,073,079,081,105]
  ncpu         up 1-00:00:00     86  alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
  ncpu         up 1-00:00:00     22   idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
  ngpu         up 2-00:00:00      4    mix n[141-143,148]
  ngpu         up 2-00:00:00      1  alloc n144
  ngpu         up 2-00:00:00      3   idle n[145-147]
  testing      up      30:00      2   idle login[01-02]
  gpu          up 2-00:00:00      4    mix n[141-143,148]
  gpu          up 2-00:00:00      1  alloc n144
  gpu          up 2-00:00:00      3   idle n[145-147]
  short*       up 1-00:00:00     22 drain* n[014-021,026-031,044-051]
  short*       up 1-00:00:00     10    mix n[001-002,025,052,058,067,073,079,081,105]
  short*       up 1-00:00:00     86  alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
  short*       up 1-00:00:00     22   idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
  medium       up 2-00:00:00     22 drain* n[014-021,026-031,044-051]
  medium       up 2-00:00:00     10    mix n[001-002,025,052,058,067,073,079,081,105]
  medium       up 2-00:00:00     86  alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
  medium       up 2-00:00:00     22   idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
  long         up 4-00:00:00     22 drain* n[014-021,026-031,044-051]
  long         up 4-00:00:00     10    mix n[001-002,025,052,058,067,073,079,081,105]
  long         up 4-00:00:00     86  alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
  long         up 4-00:00:00     22   idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
The * after a partition name (here short*) marks the default partition. The information for each partition may be split over several lines so that nodes in different states can be listed separately, e.g. idle (available), mix (partially allocated), alloc (fully allocated by users), drain or down (unavailable for scheduling). A * after an entry in the STATE column (e.g., drain*) marks nodes that are not responding.

The sinfo command has many options that let you view the information of interest in whatever format you prefer. See the sinfo documentation for more information, or type sinfo --help.
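
For instance, the following illustrative invocations restrict the output to a single partition, filter nodes by state, print a per-partition summary, or use a custom output format (the letters after -o are standard sinfo format specifiers):

sinfo -p gpu                      # show only the gpu partition
sinfo -p short -t idle            # list idle nodes in the short partition
sinfo -s                          # one summary line per partition
sinfo -o "%P %a %l %D %t %N"      # partition, availability, time limit, node count, state, nodelist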

The command scontrol show partitions can be used to view all available partitions and their definitions/limits.

Displaying Information for the long Partition
scontrol show partitions long
  PartitionName=long
    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
    AllocNodes=ALL Default=NO QoS=N/A
    DefaultTime=4-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
    MaxNodes=1 MaxTime=4-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
    Nodes=n[001-140]
    PriorityJobFactor=0 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
    OverTimeLimit=NONE PreemptMode=OFF
    State=UP TotalCPUs=8960 TotalNodes=140 SelectTypeParameters=NONE
    JobDefaults=(null)
    DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
    TRES=cpu=8960,mem=35000G,node=140,billing=8960
    TRESBillingWeights=CPU=1.0,Mem=0.256G
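
To check only a specific limit, the same output can be piped through grep, for example:

scontrol show partitions long | grep -E "MaxTime|MaxNodes"    # print the line containing the node and time limits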

Additional partitions

The sinfo and scontrol show partitions commands also list two additional partitions, ncpu and ngpu; these default back to the short and gpu partitions, respectively.

Walltime estimation and job efficiency

By default, none of the regular jobs you submit can exceed a walltime of 4 days (4-00:00:00). However, it is strongly in your interest to estimate the walltime of your jobs accurately. This is not always possible, and at the beginning of a job campaign you will probably request the maximum walltime; afterwards, look back at the efficiency and elapsed time of your previously completed jobs with the seff utility and update the time constraint (#SBATCH -t [...]) of your jobs accordingly, as shorter jobs are scheduled faster.
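
For example (the job ID 123456 below is a placeholder), you can inspect a completed job and then lower the requested walltime in your next submission accordingly:

seff 123456                                                # CPU/memory efficiency and elapsed time of job 123456
sacct -j 123456 --format=JobID,Elapsed,Timelimit,State     # compare elapsed time with the requested time limit

If a job of this kind typically finishes well within, say, 6 hours, request #SBATCH -t 0-06:00:00 instead of the 4-day maximum so the scheduler can start it sooner.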