
Accounting

The accounting functionality of the SLURM batch system keeps track of the resources spent by users while running jobs on SAS HPC clusters.

In order to submit a job on the cluster, the user must have access to a project (either testing, regular, or commercial) that has a sufficient free allocation to run the job.

You can find more information on how to get access and create a user project in the Get Project Guide.

There are two kinds of allocations available for a project:

  • CPU allocation (to be spent on the universal compute nodes)
  • GPU allocation (for jobs that require accelerated nodes)

The appropriate time limits are set automatically by selecting a partition for job execution.

You can check your remaining allocation using the sprojects command.

Note

If a project runs out of allocation, new jobs can still be submitted, but they will remain in the pending state until more allocation becomes available.

If you are out of project quota, job submission still succeeds, but the jobs remain in the pending state, as in the following example:

Out of project quota


squeue
    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    224     short    example     user PD       0:00      1 (QOSGrpBillingMinutes)

The reason QOSGrpBillingMinutes indicates that the project has run out of allocation.

Billing Units

Project allocations are awarded in core-hours. Since compute nodes contain multiple CPU cores, running a job for one hour consumes an amount of allocation proportional to the resources reserved by the job.

However, some jobs may not utilize all CPU cores while still occupying large amounts of node resources (for example memory-intensive workloads or GPU jobs). To ensure fair accounting of cluster resources, SLURM internally uses billing units (BU).

Interpretation:

  • If your job fully utilizes CPU cores, BU is determined by the core count.
  • If your job requires large memory but few CPU cores, BU is scaled according to the memory usage.
  • If your job uses GPUs, BU is adjusted based on the number of GPUs allocated to the job.

Devana compute nodes contain 64 CPU cores.

Billing units are calculated as:

BU = MAX(number_of_cores, memory_in_GB * 0.256)                # Universal nodes
BU = MAX(number_of_cores, memory_in_GB * 0.256, GPUs * 16)     # Accelerated nodes
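The formulas above can be sketched as a small shell helper. This is a hypothetical illustration, not part of SLURM; the constants are Devana's (0.256 per GB, 16 per GPU):

```shell
# Hypothetical helper implementing the Devana billing formulas above.
# Usage: bu CORES MEM_GB [GPUS]
bu() {
  awk -v c="$1" -v m="$2" -v g="${3:-0}" 'BEGIN {
    mem_bu = m * 0.256          # memory term
    gpu_bu = g * 16             # GPU term (accelerated nodes only)
    bu = c
    if (mem_bu > bu) bu = mem_bu
    if (gpu_bu > bu) bu = gpu_bu
    printf "%g\n", bu
  }'
}

bu 64 250       # full universal node: 64
bu 1 250        # one core, all memory: 64
bu 1 62.5 1     # one GPU with its memory share: 16
```

Note how the memory and GPU terms are tuned so that reserving a node's full memory (250 GB) or all 4 GPUs costs the same as reserving all 64 cores.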

Perun compute nodes contain 320 CPU cores.

Billing units are calculated as:

BU = MAX(number_of_cores, memory_in_GB * X)
BU = MAX(number_of_cores, memory_in_GB * X, GPUs * Y)

Warning

The billing system ensures that jobs reserving large amounts of memory or GPUs are charged appropriately even if they use only a small number of CPU cores.

You can check your job's billing rate with the scontrol show job JOBID command. The following examples demonstrate the billing units behaviour in more detail:
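For scripting, the billing rate can be extracted from the TRES= field of the scontrol output. A minimal sketch, using a sample TRES string taken from the examples in this section:

```shell
# Parse the billing value out of a TRES string as reported by scontrol.
tres='cpu=32,mem=250G,node=1,billing=64'
billing=$(printf '%s\n' "$tres" | tr ',' '\n' | sed -n 's/^billing=//p')
echo "$billing"   # 64
```

On a live cluster you would feed this the real output, e.g. `scontrol -o show job "$JOBID"`.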

Different job utilization scenarios

In this case we are running a job on all 64 cores within 1 node. The billing rate is therefore 64.


 srun -n 64 --pty bash
   srun: job 30925 queued and waiting for resources
   srun: job 30925 has been allocated resources
 
scontrol show job 30925 JobId=30925 JobName=bash UserId=demovic(187000051) GroupId=demovic(187000051) MCS_label=N/A Priority=18994 Nice=0 Account=p70-23-t QOS=p70-23-t JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0 RunTime=00:00:07 TimeLimit=UNLIMITED TimeMin=N/A SubmitTime=2023-08-08T15:08:13 EligibleTime=2023-08-08T15:08:13 AccrueTime=2023-08-08T15:08:13 StartTime=2023-08-08T15:08:14 EndTime=Unknown Deadline=N/A SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-08-08T15:08:14 Scheduler=Main Partition=ncpu AllocNode:Sid=login02:9298 ReqNodeList=(null) ExcNodeList=(null) NodeList=n043 BatchHost=n043 NumNodes=1 NumCPUs=64 NumTasks=64 CPUs/Task=1 ReqB:S:C:T=0:0:: TRES=cpu=64,mem=250G,node=1,billing=64 Socks/Node=* NtasksPerN:B:S:C=0:0:: CoreSpec=* MinCPUsNode=1 MinMemoryCPU=4000M MinTmpDiskNode=0 Features=(null) DelayBoot=00:00:00 OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) Command=bash WorkDir=/home/demovic Power=

Similar to the above, but utilizing 2 full nodes; the billing rate is therefore 128 (2 × 64).


 srun -N 2 --ntasks-per-node=64 --pty bash
 srun: job 30927 queued and waiting for resources
 srun: job 30927 has been allocated resources
 
scontrol show job 30927 JobId=30927 JobName=bash UserId=demovic(187000051) GroupId=demovic(187000051) MCS_label=N/A Priority=18993 Nice=0 Account=p70-23-t QOS=p70-23-t JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0 RunTime=00:00:04 TimeLimit=UNLIMITED TimeMin=N/A SubmitTime=2023-08-08T15:20:25 EligibleTime=2023-08-08T15:20:25 AccrueTime=2023-08-08T15:20:25 StartTime=2023-08-08T15:20:26 EndTime=Unknown Deadline=N/A SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-08-08T15:20:26 Scheduler=Main Partition=ncpu AllocNode:Sid=login02:9298 ReqNodeList=(null) ExcNodeList=(null) NodeList=n[037-038] BatchHost=n037 NumNodes=2 NumCPUs=128 NumTasks=128 CPUs/Task=1 ReqB:S:C:T=0:0:: TRES=cpu=128,mem=500G,node=2,billing=128 Socks/Node=* NtasksPerN:B:S:C=64:0:: CoreSpec=* MinCPUsNode=64 MinMemoryCPU=4000M MinTmpDiskNode=0 Features=(null) DelayBoot=00:00:00 OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) Command=bash WorkDir=/home/demovic Power=

When using only part of a node, the billing rate is adjusted proportionally. Here we are utilizing 32 cores, so the billing rate is halved.


 srun -n 32 --pty bash
   srun: job 30929 queued and waiting for resources
   srun: job 30929 has been allocated resources
 
scontrol show job 30929 JobId=30929 JobName=bash UserId=demovic(187000051) GroupId=demovic(187000051) MCS_label=N/A Priority=18994 Nice=0 Account=p70-23-t QOS=p70-23-t JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0 RunTime=00:00:04 TimeLimit=UNLIMITED TimeMin=N/A SubmitTime=2023-08-08T15:26:13 EligibleTime=2023-08-08T15:26:13 AccrueTime=2023-08-08T15:26:13 StartTime=2023-08-08T15:26:14 EndTime=Unknown Deadline=N/A SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-08-08T15:26:14 Scheduler=Main Partition=ncpu AllocNode:Sid=login02:9298 ReqNodeList=(null) ExcNodeList=(null) NodeList=n043 BatchHost=n043 NumNodes=1 NumCPUs=32 NumTasks=32 CPUs/Task=1 ReqB:S:C:T=0:0:: TRES=cpu=32,mem=125G,node=1,billing=32 Socks/Node=* NtasksPerN:B:S:C=32:0:: CoreSpec=* MinCPUsNode=32 MinMemoryCPU=4000M MinTmpDiskNode=0 Features=(null) DelayBoot=00:00:00 OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) Command=bash WorkDir=/home/demovic Power=

This example shows the behaviour when a user requests all available memory within a node. Although the job requests only one core, the billing rate corresponds to full node utilization (64), since no resources remain for other users' jobs.


 srun -n 1 --mem=250GB --pty bash
   srun: job 30930 queued and waiting for resources
   srun: job 30930 has been allocated resources
 
scontrol show job 30930 JobId=30930 JobName=bash UserId=demovic(187000051) GroupId=demovic(187000051) MCS_label=N/A Priority=18994 Nice=0 Account=p70-23-t QOS=p70-23-t JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0 RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A SubmitTime=2023-08-08T15:33:50 EligibleTime=2023-08-08T15:33:50 AccrueTime=2023-08-08T15:33:50 StartTime=2023-08-08T15:33:51 EndTime=Unknown Deadline=N/A SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-08-08T15:33:51 Scheduler=Main Partition=ncpu AllocNode:Sid=login02:9298 ReqNodeList=(null) ExcNodeList=(null) NodeList=n043 BatchHost=n043 NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:: TRES=cpu=1,mem=250G,node=1,billing=64 Socks/Node=* NtasksPerN:B:S:C=1:0:: CoreSpec=* MinCPUsNode=1 MinMemoryNode=250G MinTmpDiskNode=0 Features=(null) DelayBoot=00:00:00 OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) Command=bash WorkDir=/home/demovic Power=
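The billing=64 in the output above follows directly from the memory term of the universal-node formula: the 250 GB request dominates the single core.

```shell
# max(1 core, 250 GB * 0.256) = max(1, 64) = 64 BU
awk 'BEGIN { c = 1; m = 250 * 0.256; printf "%g\n", (c > m ? c : m) }'   # 64
```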

The same as above, but with 32 cores per node. This example shows that the higher utilization factor (memory, in this case) takes precedence.


 srun -n 32 --mem=250GB --pty bash
   srun: job 30931 queued and waiting for resources
   srun: job 30931 has been allocated resources
 
scontrol show job 30931 JobId=30931 JobName=bash UserId=demovic(187000051) GroupId=demovic(187000051) MCS_label=N/A Priority=18994 Nice=0 Account=p70-23-t QOS=p70-23-t JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0 RunTime=00:00:02 TimeLimit=UNLIMITED TimeMin=N/A SubmitTime=2023-08-08T15:36:12 EligibleTime=2023-08-08T15:36:12 AccrueTime=2023-08-08T15:36:12 StartTime=2023-08-08T15:36:13 EndTime=Unknown Deadline=N/A SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-08-08T15:36:13 Scheduler=Main Partition=ncpu AllocNode:Sid=login02:9298 ReqNodeList=(null) ExcNodeList=(null) NodeList=n043 BatchHost=n043 NumNodes=1 NumCPUs=32 NumTasks=32 CPUs/Task=1 ReqB:S:C:T=0:0:: TRES=cpu=32,mem=250G,node=1,billing=64 Socks/Node=* NtasksPerN:B:S:C=0:0:: CoreSpec=* MinCPUsNode=1 MinMemoryNode=250G MinTmpDiskNode=0 Features=(null) DelayBoot=00:00:00 OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) Command=bash WorkDir=/home/demovic Power=

Since the accelerated nodes have 4 GPUs, allocating just one of them sets the billing rate to ¼ of the node's total rate. Notice that the RAM allocation has been set to 62.5 GB (¼ of the total RAM) by SLURM automatically.


 srun --partition=ngpu -G 1 --pty bash
   srun: job 30934 queued and waiting for resources
   srun: job 30934 has been allocated resources
 
scontrol show job 30934 JobId=30934 JobName=bash UserId=demovic(187000051) GroupId=demovic(187000051) MCS_label=N/A Priority=18994 Nice=0 Account=p70-23-t QOS=p70-23-t_gpu JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0 RunTime=00:00:03 TimeLimit=UNLIMITED TimeMin=N/A SubmitTime=2023-08-08T15:52:39 EligibleTime=2023-08-08T15:52:39 AccrueTime=2023-08-08T15:52:39 StartTime=2023-08-08T15:52:40 EndTime=Unknown Deadline=N/A SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-08-08T15:52:40 Scheduler=Main Partition=ngpu AllocNode:Sid=login02:9298 ReqNodeList=(null) ExcNodeList=(null) NodeList=n143 BatchHost=n143 NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:: TRES=cpu=1,mem=62.50G,node=1,billing=16,gres/gpu=1 Socks/Node=* NtasksPerN:B:S:C=0:0:: CoreSpec=* MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0 Features=(null) DelayBoot=00:00:00 OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) Command=bash WorkDir=/home/demovic Power= MemPerTres=gpu:64000 TresPerJob=gres:gpu:1

When using 2 GPUs, the billing rate is 32.


 srun --partition=ngpu -G 2 --pty bash
   srun: job 30937 queued and waiting for resources
   srun: job 30937 has been allocated resources
 
scontrol show job 30937 JobId=30937 JobName=bash UserId=demovic(187000051) GroupId=demovic(187000051) MCS_label=N/A Priority=18994 Nice=0 Account=p70-23-t QOS=p70-23-t_gpu JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0 RunTime=00:00:04 TimeLimit=UNLIMITED TimeMin=N/A SubmitTime=2023-08-08T16:04:17 EligibleTime=2023-08-08T16:04:17 AccrueTime=2023-08-08T16:04:17 StartTime=2023-08-08T16:04:17 EndTime=Unknown Deadline=N/A SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-08-08T16:04:17 Scheduler=Main Partition=ngpu AllocNode:Sid=login02:9298 ReqNodeList=(null) ExcNodeList=(null) NodeList=n143 BatchHost=n143 NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:: TRES=cpu=1,mem=125G,node=1,billing=32,gres/gpu=2 Socks/Node=* NtasksPerN:B:S:C=0:0:: CoreSpec=* MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0 Features=(null) DelayBoot=00:00:00 OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) Command=bash WorkDir=/home/demovic Power= MemPerTres=gpu:64000 TresPerJob=gres:gpu:2

And finally, although the job needs just one GPU, it is billed at the rate of 64, since it requests the node's entire memory.


 srun --partition=ngpu -G 1 --mem=250GB --pty bash
   srun: job 30936 queued and waiting for resources
   srun: job 30936 has been allocated resources
 
scontrol show job 30936 JobId=30936 JobName=bash UserId=demovic(187000051) GroupId=demovic(187000051) MCS_label=N/A Priority=18994 Nice=0 Account=p70-23-t QOS=p70-23-t_gpu JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0 RunTime=00:00:06 TimeLimit=UNLIMITED TimeMin=N/A SubmitTime=2023-08-08T15:59:37 EligibleTime=2023-08-08T15:59:37 AccrueTime=2023-08-08T15:59:37 StartTime=2023-08-08T15:59:38 EndTime=Unknown Deadline=N/A SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-08-08T15:59:38 Scheduler=Main Partition=ngpu AllocNode:Sid=login02:9298 ReqNodeList=(null) ExcNodeList=(null) NodeList=n143 BatchHost=n143 NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:: TRES=cpu=1,mem=250G,node=1,billing=64,gres/gpu=1 Socks/Node=* NtasksPerN:B:S:C=0:0:: CoreSpec=* MinCPUsNode=1 MinMemoryNode=250G MinTmpDiskNode=0 Features=(null) DelayBoot=00:00:00 OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) Command=bash WorkDir=/home/demovic Power= MemPerTres=gpu:64000 TresPerJob=gres:gpu:1

sprojects - View Projects Information

This command displays information about the projects available to a user, such as available allocations, shared directories, and members of the project team.

The sprojects script shows the available Slurm accounts (projects) for the selected user ID. If no user is specified (with the -u option), the script displays the information for the current user.

Show available accounts for the current user

sprojects 
   The following slurm accounts are available for user user1:
   p70-23-t

The -a option forces the script to display only the allocations (in core-hours or GPU-hours) as spent/awarded.

Show all available allocations for the current user

sprojects -a 
   +=================+=====================+
   |     Project     |     Allocations     |
   +-----------------+---------------------+
   | p70-23-t        | CPU:      10/50000  |
   |                 | GPU:       0/12500  |
   +=================+=====================+
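Since sprojects is a site-specific script, its output format is assumed here from the example above. A sketch of computing the remaining CPU allocation from such a row:

```shell
# Remaining CPU core-hours = awarded - spent, parsed from a sprojects -a row.
row='| p70-23-t        | CPU:      10/50000  |'
printf '%s\n' "$row" | awk -F'CPU:' '{
  split($2, a, "/")                          # a[1]=spent, a[2]=awarded
  gsub(/[^0-9]/, "", a[1]); gsub(/[^0-9]/, "", a[2])
  print a[2] - a[1]                          # 49990
}'
```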

With the -f option, the script displays more details (including available allocations).

Show full info for the current user

sprojects -f 
   +=================+=========================+============================+=====================+
   |     Project     |       Allocations       |      Shared storages       |    Project users    |
   +-----------------+-------------------------+----------------------------+---------------------+
   | p371-23-1       | CPU:    182223/500000   | /home/projects/p371-23-1   | user1               |
   |                 | GPU:       542/1250     | /scratch/p371-23-1         | user2               |
   |                 |                         |                            | user3               |
   +-----------------+-------------------------+----------------------------+---------------------+
   | p81-23-t        | CPU:     50006/50000    | /home/projects/p81-23-t    | user1               |
   |                 | GPU:       766/781      | /scratch/p81-23-t          | user2               |
   +-----------------+-------------------------+----------------------------+---------------------+
   | p70-23-t        | CPU:    485576/5000000  | /home/projects/p70-23-t    | user1               |
   |                 | GPU:       544/31250    | /scratch/p70-23-t          | user2               |
   |                 |                         |                            | user4               |
   |                 |                         |                            | user5               |
   |                 |                         |                            | user6               |
   |                 |                         |                            | user7               |
   +=================+=========================+============================+=====================+
Created by: Andrej Sec