Practical Commands

Basic SLURM commands

sinfo - general system state information

Existing partitions, the nodes they include, and the general system state can be determined with the sinfo command.

sinfo output

sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  ncpu         up 1-00:00:00     22 drain* n[014-021,026-031,044-051]
  ncpu         up 1-00:00:00     10    mix n[001-002,025,052,058,067,073,079,081,105]
  ncpu         up 1-00:00:00     86  alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
  ncpu         up 1-00:00:00     22   idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
  ngpu         up 2-00:00:00      4    mix n[141-143,148]
  ngpu         up 2-00:00:00      1  alloc n144
  ngpu         up 2-00:00:00      3   idle n[145-147]
  testing      up      30:00      2   idle login[01-02]
  gpu          up 2-00:00:00      4    mix n[141-143,148]
  gpu          up 2-00:00:00      1  alloc n144
  gpu          up 2-00:00:00      3   idle n[145-147]
  short*       up 1-00:00:00     22 drain* n[014-021,026-031,044-051]
  short*       up 1-00:00:00     10    mix n[001-002,025,052,058,067,073,079,081,105]
  short*       up 1-00:00:00     86  alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
  short*       up 1-00:00:00     22   idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
  medium       up 2-00:00:00     22 drain* n[014-021,026-031,044-051]
  medium       up 2-00:00:00     10    mix n[001-002,025,052,058,067,073,079,081,105]
  medium       up 2-00:00:00     86  alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
  medium       up 2-00:00:00     22   idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
  long         up 4-00:00:00     22 drain* n[014-021,026-031,044-051]
  long         up 4-00:00:00     10    mix n[001-002,025,052,058,067,073,079,081,105]
  long         up 4-00:00:00     86  alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
  long         up 4-00:00:00     22   idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
The * after a partition name marks the default partition. The nodes are in different states: idle (free), mix (partially allocated), alloc (fully allocated), or drain (not accepting new jobs). The information about each partition may be split over more than one line so that nodes in different states can be identified. A * appended to the value in the STATE column marks nodes that are not responding.

The sinfo command has many options to easily let you view the information of interest to you in whatever format you prefer. See the sinfo documentation for more information or type sinfo --help.
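
The default output can also be post-processed with standard tools. As a minimal sketch, the helper below sums the NODES column per STATE for one partition. The name count_nodes_by_state is hypothetical (it is not part of Slurm), and the function assumes the default column layout shown above, including the header line.

```shell
# Hypothetical helper, not part of Slurm: sum the NODES column per STATE
# for a single partition. Assumes the default sinfo column layout
# (PARTITION AVAIL TIMELIMIT NODES STATE NODELIST) read from standard input.
count_nodes_by_state() {
    awk -v part="$1" '
        NR == 1 { next }                     # skip the header line
        {
            name = $1
            sub(/\*$/, "", name)             # drop the default-partition marker
            if (name == part) total[$5] += $4
        }
        END { for (s in total) print s, total[s] }
    ' | sort
}

# Sample input copied from the sinfo output above.
sample='PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
ngpu         up 2-00:00:00      4    mix n[141-143,148]
ngpu         up 2-00:00:00      1  alloc n144
ngpu         up 2-00:00:00      3   idle n[145-147]
testing      up      30:00      2   idle login[01-02]'

printf '%s\n' "$sample" | count_nodes_by_state ngpu

# On a cluster the input could come from sinfo itself, e.g.:
#   sinfo | count_nodes_by_state short
```

For the sample input this lists one line per state of the ngpu partition (alloc, idle, mix) with the corresponding node count.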


squeue - information about submitted jobs

Next we determine what jobs exist on the system using the squeue command.

Viewing Existing Jobs

squeue
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  16048     short xgboost_    user1 PD       0:00      1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
  15739     short test3232    user2 PD       0:00      2 (Priority)
  15365     short   DHAI-b    user1 PD       0:00      1 (Priority)
  15349       gpu     gpu8     test  R       0:00      1 n141

Fields Explanation

  • JOBID: The JOBID field shows the job ID assigned by SLURM. You can work with this number in your SLURM scripts via the SLURM_JOB_ID variable.
  • PARTITION: The PARTITION field shows the partition to which the job was submitted.
  • NAME: The NAME field shows the job name specified by the user.
  • USER: The USER field shows the username of the person who submitted the job.
  • ST: The ST field shows the job state, for example R (running) or PD (pending).
  • TIME: The TIME field shows how long the job has been running, using the format days-hours:minutes:seconds.
  • NODES: The NODES field shows the number of allocated nodes.
  • NODELIST(REASON): The NODELIST(REASON) field indicates where the job is running or the reason it is still pending. Typical reasons for pending jobs are Resources (waiting for resources to become available) and Priority (queued behind a higher priority job).
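
When squeue output is processed in scripts, the TIME value often has to be converted to seconds. A minimal sketch, assuming the display form [days-][hours:]minutes:seconds in which leading zero fields are omitted (for example 0:34 or 3-13:28:20). The name slurm_time_to_seconds is hypothetical and not part of Slurm.

```shell
# Hypothetical helper, not part of Slurm: convert a TIME value as displayed by
# squeue ([days-][hours:]minutes:seconds) into a number of seconds.
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk '{
        days = 0; rest = $0
        if (index(rest, "-") > 0) {          # optional "days-" prefix
            split(rest, d, "-"); days = d[1]; rest = d[2]
        }
        n = split(rest, p, ":")
        if (n == 3)      secs = p[1] * 3600 + p[2] * 60 + p[3]
        else if (n == 2) secs = p[1] * 60 + p[2]
        else             secs = p[1]
        print days * 86400 + secs
    }'
}

slurm_time_to_seconds 0:34
slurm_time_to_seconds 3-13:28:20
```

The helper only covers the elapsed-time notation printed by squeue. Values passed to --time options follow different rules (a bare number means minutes there) and would need separate handling.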

The squeue command has many options to easily let you view the information of interest to you in whatever format you prefer. The most common options include viewing jobs of a specific user (-u) and/or jobs running on a specific node (-w).

Viewing Jobs for a Given User/Node

squeue -u user_1
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  120387      long   long_job   user_1  R    1:14:18      1 n008
  120396     short  short_job   user_1  R       0:34      2 n[024-025]

squeue -w n001
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  107491         long jif3d_mt   user_2  R 3-13:28:20      1 n001
  108441_76      long long_job   user_1  R 2-06:50:21      1 n001
  108441_82      long long_job   user_1  R 2-06:50:21      1 n001
  120398         short     test   user_3  R       1:36     1 n001
  120379         short     test   user_3  R       5:23     1 n001
  120333         short     test   user_3  R      13:39     1 n001
  120318         short     test   user_3  R      18:00     1 n001
  120272         short     test   user_3  R      28:04     1 n001
  120242         short     test   user_3  R      35:13     1 n001

See the squeue page for more information or type squeue --help.


srun - run parallel jobs

It is possible to create a resource allocation and launch the tasks for a job step in a single command line using the srun command. Depending upon the MPI implementation used, MPI jobs may also be launched in this manner. For example, srun -N 4 -l /bin/hostname executes /bin/hostname on four nodes (-N 4) and includes task numbers in the output (-l). The command below starts an interactive shell instead. Requests must respect the limits of the chosen partition: if you specify --partition=short together with a --time value above that partition's limit of one day, you will get an error.

srun --pty /bin/bash

This way you can tailor your request to fit both the needs of your job and the limits of the partitions.

srun --partition=short --export=ALL --nodes=1 --ntasks=8 --cpus-per-task=4 --mem=128G --time=02:00:00 /bin/bash
srun --partition=gpu --export=ALL --nodes=1 --ntasks=16 --gres=gpu:1 --cpus-per-task=1 --mem=64G --time=02:00:00 /bin/bash
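
Whether a requested wall time fits a partition limit can be checked before submitting. A minimal sketch, assuming both values are written in the full [days-]hours:minutes:seconds form used in the TIMELIMIT column of sinfo. The name fits_time_limit is hypothetical, and shorter --time notations such as plain minutes are not handled.

```shell
# Hypothetical helper, not part of Slurm: succeed (exit status 0) if the
# requested time does not exceed the limit. Both arguments must use the
# [days-]hours:minutes:seconds notation.
fits_time_limit() {
    awk -v req="$1" -v lim="$2" '
        function to_sec(t,    d, p, days) {
            days = 0
            if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
            split(t, p, ":")
            return days * 86400 + p[1] * 3600 + p[2] * 60 + p[3]
        }
        BEGIN { if (to_sec(req) <= to_sec(lim)) exit 0; exit 1 }
    '
}

# A two-hour request against the one-day limit of the short partition.
if fits_time_limit 02:00:00 1-00:00:00; then
    echo "request fits the limit"
else
    echo "request exceeds the limit"
fi
```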

See the man page for more information or type srun --help.


sbatch - submit parallel jobs

A more common mode of operation is to submit a script for later execution with the sbatch command. In this example the script sbatch_submit.sh is submitted to nodes n066 and n067 (--nodelist "n[066-067]"; note the use of a node range expression), where the subsequent job steps will spawn four tasks per node with 4 CPUs each. The output will appear in the file stdout.<SLURM_JOBID>.out (--output stdout.%J.out). The script contains the time limit for the job embedded within itself.

cat sbatch_submit.sh
  #!/bin/bash
  #SBATCH --account=<project_name>
  #SBATCH --partition=short
  #SBATCH --time=01:00:00
  #SBATCH --nodes=2
  #SBATCH --ntasks-per-node=4
  #SBATCH --cpus-per-task=4
  #SBATCH --mem=64G
  #SBATCH --nodelist=n[066-067]
  #SBATCH --output stdout.%J.out 
  #SBATCH --error stderr.%J.out 


  ## End of sbatch section
  ## Commands to be executed during the run of the script

sbatch sbatch_submit.sh
  Submitted batch job 38793

Other options can be supplied as desired by using a prefix of “#SBATCH” followed by the option at the beginning of the script (before any commands to be executed in the script).

Alternatively, options can be provided to sbatch on the command line:

Submitting jobs with sbatch

cat sbatch_submit.sh
  #!/bin/bash
  #SBATCH --account=<project_name>
  #SBATCH --partition=short

  ## End of sbatch section
  ## Commands to be executed during the run of the script

sbatch --nodes 2 --nodelist "n[066-067]" --ntasks-per-node=4 --cpus-per-task=4 --mem=64G --output stdout.%J.out --error stderr.%J.out sbatch_submit.sh
  Submitted batch job 38794

Options supplied on the command line override any options specified within the script.
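
Scripts that chain several submissions usually need the job ID reported by sbatch. A minimal sketch, assuming the "Submitted batch job <id>" message shown above. The name extract_job_id is hypothetical; sbatch also provides the --parsable option, which prints the job ID without the surrounding text.

```shell
# Hypothetical helper, not part of Slurm: print the job ID contained in the
# "Submitted batch job <id>" line that sbatch writes to standard output.
extract_job_id() {
    awk '$1 == "Submitted" && $2 == "batch" && $3 == "job" { print $4 }'
}

echo "Submitted batch job 38793" | extract_job_id

# On a cluster this could be combined with a real submission, e.g.:
#   jobid=$(sbatch sbatch_submit.sh | extract_job_id)
#   squeue -j "$jobid"
```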

See the man page for more information or type sbatch --help.


scancel - terminate running jobs

The command scancel is used to signal or cancel jobs, job arrays or job steps. A job or job step can only be signaled by the owner of that job or root. If an attempt is made by an unauthorized user to signal a job or job step, an error message will be printed and the job will not be terminated.

scancel --user <username>

Jobs are generally cancelled by job name and/or SLURM job ID.

scancel --name "test_job"
#OR
scancel 666

scancel can also be used to cancel all of your jobs that match a specific criterion, such as state or partition.

scancel --state PENDING --user <username>

An arbitrary number of jobs or job steps may be signaled using job specification filters or a space separated list of specific job and/or job step IDs. If the job ID of a job array is specified with an array ID value then only that job array element will be cancelled. If the job ID of a job array is specified without an array ID value then all job array elements will be cancelled. While a heterogeneous job is in a PENDING state, only the entire job can be cancelled rather than its individual components.

See the man page for more information or type scancel --help.


Other SLURM commands

sacct - job accounting information

The sacct command displays status information about a user's past jobs, selected by username and/or SLURM job ID. By default, sacct only shows the user's jobs from the current day.

sacct --jobs=<jobid> [--format=metric1,...]

Showing information about completed job

 sacct --jobs=<jobid>
   JobID            JobName  Partition    Account  AllocCPUS      State ExitCode
   ------------- ---------- ---------- ---------- ---------- ---------- --------
   <jobid>             name      short  <project>        512    TIMEOUT      0:0
   <jobid>.batch      batch             <project>         64  CANCELLED     0:15
   <jobid>.exte+     extern             <project>        512  COMPLETED      0:0
   <jobid>.0     hydra_bst+             <project>        512     FAILED      5:0

Use -X to aggregate the statistics relevant to the job allocation itself, not taking job steps into consideration.

Showing aggregated information about completed job

 sacct -X --jobs=<jobid>
   JobID            JobName  Partition    Account  AllocCPUS      State ExitCode
   ------------- ---------- ---------- ---------- ---------- ---------- --------
   <jobid>             name      short  <project>        512    TIMEOUT      0:0

By using the --starttime (-S) flag the command will look further back to the given date. This can also be combined with --endtime (-E) to limit the query:

sacct [-X] -u <user>  [-S YYYY-MM-DD] [-E YYYY-MM-DD] [--format=metric1,...] # Specify user and start and end dates

The --format flag can be used to choose the command output (full list of variables can be found with the --helpformat flag):

sacct [-X] -A <account> [--format=metric1,...] # Display information about account jobs

sacct format variable names

Variable      Description
Account       The account the job ran under.
AveCPU        Average (system + user) CPU time of all tasks in job.
AveRSS        Average resident set size of all tasks in job.
AveVMSize     Average virtual memory size of all tasks in job.
CPUTime       Formatted (Elapsed time * CPU) count used by a job or step.
Elapsed       The job's elapsed time, formatted as DD-HH:MM:SS.
ExitCode      The exit code returned by the job script or salloc.
JobID         The ID of the job.
JobName       The name of the job.
MaxRSS        Maximum resident set size of all tasks in job.
MaxVMSize     Maximum virtual memory size of all tasks in job.
MaxDiskRead   Maximum number of bytes read by all tasks in the job.
MaxDiskWrite  Maximum number of bytes written by all tasks in the job.
ReqCPUS       Requested number of CPUs.
ReqMem        Requested amount of memory.
ReqNodes      Requested number of nodes.
NCPUS         The number of CPUs used in a job.
NNodes        The number of nodes used in a job.
User          The username of the person who ran the job.

A full list of variables that specify data handled by sacct can be found with the --helpformat flag or by visiting the slurm documentation on sacct.
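
For scripting, sacct can print pipe-delimited records without a header (options -P and -n). A minimal sketch that lists the records of a job whose state is not COMPLETED. The name list_unsuccessful_steps is hypothetical, and the sample records are modelled on the output shown earlier with a made-up job ID.

```shell
# Hypothetical helper, not part of Slurm: from pipe-delimited records
# "JobID|State|ExitCode", print those whose state is not COMPLETED.
list_unsuccessful_steps() {
    awk -F'|' '$2 != "COMPLETED" { print $1 ": " $2 " (exit code " $3 ")" }'
}

# Sample records modelled on the sacct output shown earlier; the job ID 1234
# is made up for illustration.
printf '%s\n' \
    '1234|TIMEOUT|0:0' \
    '1234.batch|CANCELLED|0:15' \
    '1234.extern|COMPLETED|0:0' \
    '1234.0|FAILED|5:0' | list_unsuccessful_steps

# On a cluster the input could come from sacct itself, e.g.:
#   sacct -n -P --jobs=<jobid> --format=JobID,State,ExitCode | list_unsuccessful_steps
```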


seff - job efficiency report

This command reports the efficiency of jobs that have completed and left the queue. If you run it while the job is still in the R (running) state, it may report incorrect information.

The seff utility will help you track the CPU/Memory efficiency. The command is invoked as:

seff <jobid>

Jobs with different CPU/Memory efficiency
seff <jobid>
  Job ID: <jobid>
  User/Group: user1/group1
  State: COMPLETED (exit code 0)
  Nodes: 1
  Cores per node: 32
  CPU Utilized: 41-01:38:14
  CPU Efficiency: 99.64% of 41-05:09:44 core-walltime
  Job Wall-clock time: 1-11:19:38
  Memory Utilized: 2.73 GB
  Memory Efficiency: 2.13% of 128.00 GB
seff <jobid>
  Job ID: <jobid>
  User/Group: user1/group1
  State: COMPLETED (exit code 0)
  Nodes: 1
  Cores per node: 16
  CPU Utilized: 14:24:49
  CPU Efficiency: 23.72% of 2-12:46:24 core-walltime
  Job Wall-clock time: 03:47:54
  Memory Utilized: 193.04 GB
  Memory Efficiency: 75.41% of 256.00 GB
seff <jobid>
  Job ID: <jobid>
  User/Group: user1/group1
  State: COMPLETED (exit code 0)
  Nodes: 1
  Cores per node: 64
  CPU Utilized: 87-16:58:22
  CPU Efficiency: 86.58% of 101-07:16:16 core-walltime
  Job Wall-clock time: 1-13:59:19
  Memory Utilized: 212.39 GB
  Memory Efficiency: 82.96% of 256.00 GB

The following example illustrates a very inefficient job: both CPU and memory efficiency are below 4%, which means the user wasted about 4 core-hours of computation while occupying a full node and its 64 cores.

seff <jobid>
  Job ID: <jobid>
  User/Group: user1/group1
  State: COMPLETED (exit code 0)
  Nodes: 1
  Cores per node: 64
  CPU Utilized: 00:08:33
  CPU Efficiency: 3.55% of 04:00:48 core-walltime
  Job Wall-clock time: 00:08:36
  Memory Utilized: 55.84 MB
  Memory Efficiency: 0.05% of 112.00 GB

seff <jobid>
  Job ID: <jobid>
  User/Group: user1/group1
  State: COMPLETED (exit code 0)
  Nodes: 1
  Cores per node: 64
  CPU Utilized: 34-17:07:26
  CPU Efficiency: 95.80% of 36-05:41:20 core-walltime
  Job Wall-clock time: 13:35:20
  Memory Utilized: 5.18 GB
  Memory Efficiency: 0.00% of 0.00 MB
  Nvidia SXM A100 40GB #1:
    GPU Efficiency: 36.90%
    Memory Utilized: 0.00 GB (0.00%)
  Nvidia SXM A100 40GB #2:
    GPU Efficiency: 36.82%
    Memory Utilized: 0.00 GB (0.00%)
  Nvidia SXM A100 40GB #3:
    GPU Efficiency: 36.92%
    Memory Utilized: 0.00 GB (0.00%)
  Nvidia SXM A100 40GB #4:
    GPU Efficiency: 36.74%
    Memory Utilized: 0.00 GB (0.00%)
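
The CPU Efficiency value in the reports above is the ratio of CPU Utilized to the core-walltime. A minimal sketch that recomputes it from the two printed values. The name cpu_efficiency is hypothetical and not part of seff, and both arguments are assumed to use the [days-]hours:minutes:seconds notation seen in the reports.

```shell
# Hypothetical helper, not part of Slurm or seff: CPU efficiency in percent,
# computed as "CPU Utilized" divided by the core-walltime. Both arguments use
# the [days-]hours:minutes:seconds notation printed by seff.
cpu_efficiency() {
    awk -v used="$1" -v total="$2" '
        function to_sec(t,    d, p, days) {
            days = 0
            if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
            split(t, p, ":")
            return days * 86400 + p[1] * 3600 + p[2] * 60 + p[3]
        }
        BEGIN { printf "%.2f\n", 100 * to_sec(used) / to_sec(total) }
    '
}

# Values taken from the inefficient and the first efficient report above.
cpu_efficiency 00:08:33 04:00:48
cpu_efficiency 41-01:38:14 41-05:09:44
```

The two calls reproduce the 3.55% and 99.64% figures printed by seff for those jobs.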

sprojects - view projects information

This command displays information about projects available to a user and project details, such as available allocations, shared directories and members of the project team.

The sprojects script shows the available slurm accounts (projects) for the selected user ID. If no user is specified (with -u), the script displays the information for the current user.

Show available accounts for the current user

sprojects 
   The following slurm accounts are available for user user1:
   p70-23-t

The -a option makes the script display only the allocations (in core-hours or GPU hours) as spent/awarded.

Show all available allocations for the current user

sprojects -a 
   +=================+=====================+
   |     Project     |     Allocations     |
   +-----------------+---------------------+
   | p70-23-t        | CPU:      10/50000  |
   |                 | GPU:       0/12500  |
   +=================+=====================+

With the -f option the script displays more details (including available allocations).

Show full info for the current user

sprojects -f 
   +=================+=========================+============================+=====================+
   |     Project     |       Allocations       |      Shared storages       |    Project users    |
   +-----------------+-------------------------+----------------------------+---------------------+
   | p371-23-1       | CPU:    182223/500000   | /home/projects/p371-23-1   | user1               |
   |                 | GPU:       542/1250     | /scratch/p371-23-1         | user2               |
   |                 |                         |                            | user3               |
   +-----------------+-------------------------+----------------------------+---------------------+
   | p81-23-t        | CPU:     50006/50000    | /home/projects/p81-23-t    | user1               |
   |                 | GPU:       766/781      | /scratch/p81-23-t          | user2               |
   +-----------------+-------------------------+----------------------------+---------------------+
   | p70-23-t        | CPU:    485576/5000000  | /home/projects/p70-23-t    | user1               |
   |                 | GPU:       544/31250    | /scratch/p70-23-t          | user2               |
   |                 |                         |                            | user4               |
   |                 |                         |                            | user5               |
   |                 |                         |                            | user6               |
   |                 |                         |                            | user7               |
   +=================+=========================+============================+=====================+

sprio - jobs scheduling priority information

Demand for HPC resources typically surpasses supply, so a method that establishes the order in which jobs run has to be implemented. By default, the scheduler uses a simple FIFO approach. However, rules and policies can change the priority of a job, which is expressed to the scheduler as a number. The sprio command can be used to view the priorities (and their components) of waiting jobs.

Sorting all waiting jobs by their priority


sprio -S -y
   JOBID  PARTITION    PRIORITY      SITE        AGE  FAIRSHARE  PARTITION        QoS
   674582 short        1442165          0        439      96126     345600    1000000
   674520 medium       1427035          0       1724     252511     172800    1000000
   674521 medium       1427033          0       1722     252511     172800    1000000
   674522 medium       1427031          0       1720     252511     172800    1000000
   674502 long         1026833          0       2444      24390          0    1000000
   674528 long          512442          0       1682     510760          0          0
The zero QoS value for job 674528 indicates that the job was submitted under a project that has exceeded its duration.
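
In the sample output above, the PRIORITY of each job equals the sum of its component columns (SITE, AGE, FAIRSHARE, PARTITION and QoS). A minimal sketch that checks this. The name check_priority_sum is hypothetical, and the function assumes the eight-column layout shown above; on systems where further factors are configured, those columns would have to be added to the sum.

```shell
# Hypothetical helper, not part of Slurm: for every job line, compare the
# PRIORITY column with the sum of the five component columns that follow it.
check_priority_sum() {
    awk 'NR > 1 {
        sum = $4 + $5 + $6 + $7 + $8
        print $1, (sum == $3 ? "ok" : "mismatch")
    }'
}

# Two lines copied from the sprio output above, preceded by the header.
printf '%s\n' \
    'JOBID  PARTITION    PRIORITY      SITE        AGE  FAIRSHARE  PARTITION        QoS' \
    '674582 short        1442165          0        439      96126     345600    1000000' \
    '674528 long          512442          0       1682     510760          0          0' | check_priority_sum
```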

See the slurm documentation page for more information or type sprio --help.

sshare - list shares of associations

This command displays fairshare information based on the hierarchical account structure. Here we use it to determine the fairshare factor used in the job priority calculation. Since the fairshare factor also depends on the account (i.e. the user's project), the account has to be specified as well.

In this case we know that user1 has access to the project called "p70-23-t". Therefore we can display the fairshare factor (shown here in the last column) as follows:


sshare -A p70-23-t 
  Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
  -------------------- ---------- ---------- ----------- ----------- ------------- ---------- 
  p70-23-t                                 1    0.333333   122541631      0.364839            
  p70-23-t               user1             1    0.111111     4798585      0.039159   0.263158 

You can display all project accounts available to you using the sprojects command.

See the slurm documentation for more information or type sshare --help.


salloc - allocate resources and spawn a shell

The salloc command serves to allocate resources (e.g. nodes), possibly with a set of constraints (e.g. number of processors per node), for later use. After the salloc command is submitted, the terminal is blocked until the allocation is granted. The session then still runs on the login node; only commands launched with srun are executed on the allocated compute nodes. Tasks sent with srun can run immediately, since the resources are already allocated.

hostname
  login01.devana.local

salloc --nodes=1 --ntasks-per-node=4 --mem-per-cpu=2G --time=01:00:00
  salloc: Pending job allocation 63752579
  salloc: job 63752579 queued and waiting for resources
  salloc: job 63752579 has been allocated resources
  salloc: Granted job allocation 63752579

hostname
  login01.devana.local

srun hostname
  n007

salloc starts a shell on the login node, not on the allocated node.

See the man page for more information or type salloc --help.


sattach - signal and attach to running jobs

The sattach command attaches to a running job step and connects its standard input, output, and error streams to your current terminal session.

sattach 12345.5
   [...output of your job...]
n007:~$ [Ctrl-C]
login01:~$

Press Ctrl-C to detach from the current session. Please note that you have to give the job ID as well as the step ID. In most cases, simply append .0 to your job ID.

See the man page for more information or type sattach --help.


sbcast - transfer file to local disk on the node

Sometimes it is beneficial to copy the executable to a local path on the compute nodes allocated to the job, instead of loading it onto the compute nodes from a slow file system such as the home file system.

Users can copy the executable to the compute nodes before the actual computation using the sbcast command or the srun --bcast flag. Making the executable available local to the compute node, e.g. in /tmp could speed up the job startup time compared to running executables from a network file system.

sbcast exe_on_slow_fs /tmp/${USER}_exe_filename
srun /tmp/${USER}_exe_filename

File permissions

Make sure to choose a temporary file name unique to your computation (e.g. include your username with the variable $USER), or you may receive permission denied errors if trying to overwrite someone else's files.

There is no real downside to broadcasting the executable with Slurm, so it can be used with jobs at any scale, especially if you experience timeout errors associated with MPI_Init(). Besides the executable, you can also sbcast other large files, such as input files or shared libraries. Instead of broadcasting multiple individual files, it is easier to create a tar file, sbcast it, and untar it on the compute nodes before the actual srun.

See the man page for more information or type sbcast --help.


sstat - display resources utilized by a job

The sstat command allows users to retrieve status information about currently running jobs, including details on CPU usage, task information, node information, memory usage (RSS), and virtual memory (VM).

To check job statistics, use:

sstat --jobs=<jobid>

Showing information about running job

sstat --jobs=<jobid>
  JobID         MaxVMSize  MaxVMSizeNode  MaxVMSizeTask  AveVMSize     MaxRSS MaxRSSNode MaxRSSTask     AveRSS MaxPages MaxPagesNode   MaxPagesTask   AvePages     MinCPU MinCPUNode MinCPUTask     AveCPU   NTasks AveCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov ConsumedEnergy  MaxDiskRead MaxDiskReadNode MaxDiskReadTask  AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask AveDiskWrite TRESUsageInAve TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutAve TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot

  152295.0          2884M           n143              0   2947336K    253704K       n143          0    253704K       11         n143              0         11   00:06:04       n143          0   00:06:04        1     10.35M       Unknown       Unknown       Unknown              0     29006427            n143               0     29006427     11096661             n143                0     11096661 cpu=00:06:04,+ cpu=00:06:04,+ cpu=n143,energy=n+ cpu=00:00:00,fs/d+ cpu=00:06:04,+ cpu=n143,energy=n+ cpu=00:00:00,fs/d+ cpu=00:06:04,+ energy=0,fs/di+ energy=0,fs/di+ energy=n143,fs/dis+           fs/disk=0 energy=0,fs/di+ energy=n143,fs/dis+           fs/disk=0 energy=0,fs/di+

By default, sstat provides extensive output. To customize the displayed metrics, use the --format flag. Some of the available variables are listed in the table at the end of this section.

Showing formatted information about running job

sstat --format JobID,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize 152295
  JobID          NTasks             Nodelist     MaxRSS  MaxVMSize     AveRSS  AveVMSize
  ------------ -------- -------------------- ---------- ---------- ---------- ----------
  152295.0            1                 n143 183574492K 247315988K    118664K    696216K

If you do not run any srun commands, you will not create any job steps and metrics will not be available for your job. Your batch scripts should follow this format:

#!/bin/bash
#SBATCH ...
#SBATCH ...

module load ... # Set up environment

# launch job steps
srun <command to run> # this creates job step 0
srun <command to run> # this creates job step 1

The main metrics you may want to review are listed below.

Variable      Description
avecpu        Average CPU time of all tasks in job.
averss        Average resident set size of all tasks.
avevmsize     Average virtual memory of all tasks in a job.
jobid         The ID of the job.
maxrss        Maximum resident set size of all tasks in job.
maxvmsize     Maximum virtual memory size of all tasks in job.
ntasks        Number of tasks in a job.
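
The memory values reported by sstat carry unit suffixes such as K or M, which makes them awkward to compare. A minimal sketch that normalizes a value to MiB. The name to_mib is hypothetical, and the function assumes that the K, M and G suffixes denote multiples of 1024 and that a value without a suffix is given in bytes.

```shell
# Hypothetical helper, not part of Slurm: convert a size as printed by sstat
# (for example 253704K or 2884M) to MiB. Assumes the K, M and G suffixes
# denote multiples of 1024 and that a value without suffix is in bytes.
to_mib() {
    printf '%s\n' "$1" | awk '{
        unit = substr($0, length($0), 1)
        value = $0 + 0                       # numeric prefix of the string
        if      (unit == "K") value = value / 1024
        else if (unit == "M") value = value
        else if (unit == "G") value = value * 1024
        else                  value = value / (1024 * 1024)
        printf "%.1f\n", value
    }'
}

# MaxRSS and MaxVMSize of job step 152295.0 from the unformatted output above.
to_mib 253704K
to_mib 2884M
```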

See the sstat documentation for more information or type sstat --help.


scontrol - administrative tool

The scontrol command can be used to report more detailed information about nodes, partitions, jobs, job steps, and configuration. It can also be used by system administrators to make configuration changes. A couple of examples are shown below.

Long partition information
login01:~$ scontrol show partitions long
  PartitionName=long
    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
    AllocNodes=ALL Default=NO QoS=N/A
    DefaultTime=4-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
    MaxNodes=1 MaxTime=4-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
    Nodes=n[001-140]
    PriorityJobFactor=0 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
    OverTimeLimit=NONE PreemptMode=OFF
    State=UP TotalCPUs=8960 TotalNodes=140 SelectTypeParameters=NONE
    JobDefaults=(null)
    DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
    TRES=cpu=8960,mem=35000G,node=140,billing=8960
    TRESBillingWeights=CPU=1.0,Mem=0.256G

Node information
login01:~$ scontrol show node n148
  NodeName=n148 Arch=x86_64 CoresPerSocket=32
    CPUAlloc=1 CPUEfctv=64 CPUTot=64 CPULoad=1.04
    AvailableFeatures=(null)
    ActiveFeatures=(null)
    Gres=gpu:A100-SXM4-40GB:4
    NodeAddr=n148 NodeHostName=n148 Version=22.05.7
    OS=Linux 3.10.0-1160.71.1.el7.x86_64 #1 SMP Tue Jun 28 15:37:28 UTC 2022
    RealMemory=256000 AllocMem=64000 FreeMem=67242 Sockets=2 Boards=1
    State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
    Partitions=gpu
    BootTime=2023-09-06T10:29:48 SlurmdStartTime=2023-09-18T14:25:33
    LastBusyTime=2023-09-18T14:02:52
    CfgTRES=cpu=64,mem=250G,billing=64,gres/gpu=4
    AllocTRES=cpu=1,mem=62.50G,gres/gpu=1
    CapWatts=n/a
    CurrentWatts=0 AveWatts=0
    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
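
Because scontrol prints its data as Key=Value pairs, single values are easy to extract in scripts. A minimal sketch, in which the name scontrol_field is hypothetical and the sample lines are copied from the node information above.

```shell
# Hypothetical helper, not part of Slurm: print the value of one Key=Value
# pair from scontrol output read on standard input. Values containing spaces
# (such as the OS field) are cut at the first space.
scontrol_field() {
    tr ' ' '\n' | awk -v key="$1" '
        index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }
    '
}

# Sample lines copied from the "scontrol show node n148" output above.
sample='NodeName=n148 Arch=x86_64 CoresPerSocket=32
RealMemory=256000 AllocMem=64000 FreeMem=67242 Sockets=2 Boards=1
CfgTRES=cpu=64,mem=250G,billing=64,gres/gpu=4'

printf '%s\n' "$sample" | scontrol_field AllocMem
printf '%s\n' "$sample" | scontrol_field CfgTRES

# On a cluster the input could come from scontrol itself, e.g.:
#   scontrol show node n148 | scontrol_field RealMemory
```

Only the first "=" separates key and value, so compound values such as CfgTRES are returned whole.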

See the man page for more information or type scontrol --help.