Compute Nodes

Perun is a cluster of 136 compute nodes, named cn[001-060] and gn[001-076]. The nodes are equipped with x86-64 AMD EPYC processors (universal and cloud partitions) and NVIDIA Grace Hopper Superchips (accelerated partition). The system is built on the BullSequana XH3000 architecture and contains three types of compute nodes.

| Nodes       | Mod (a) / Cnt | Cores | Memory                       | Disk (b) | GPUs     | Network   |
|-------------|---------------|-------|------------------------------|----------|----------|-----------|
| cn[001-045] | U / 45        | 320   | 1152 GB                      | 3.8 TB   | none     | NDR200    |
| cn[046-060] | C / 15        | 320   | 2304 GB                      | 7.6 TB   | none     | NDR200    |
| gn[001-076] | A / 76        | 288   | 480 GB LPDDR5X + 384 GB HBM3 | 3.8 TB   | 4× GH200 | 4× NDR400 |

(a) U - universal module, C - cloud module, A - accelerated module.

(b) The value represents raw local NVMe capacity.

Universal module nodes

Fig. 1: Top and bottom back view of the BullSequana XH3000 blade.

The universal partition consists of 45 compute nodes implemented as BullSequana XH3420 blades. Each blade is a 1U hot-plug module hosting three independent dual-socket nodes. The nodes are based on AMD EPYC Turin 9845 processors (Zen 5c microarchitecture) on the SP5 platform with 12 DDR5 memory channels per socket. Cooling is provided by direct liquid cooling (DLC) using cold-plate technology.

| Feature       | cn[001-045]                                |
|---------------|--------------------------------------------|
| Processor     | 2× AMD EPYC Turin 9845 (160 cores per CPU) |
| Architecture  | Zen 5c, SP5                                |
| Memory        | 1152 GB DDR5 @ 5600 MT/s                   |
| Local Storage | 3.8 TB NVMe                                |
| Interconnect  | 200 Gb/s InfiniBand NDR                    |
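
If needed, these figures can be checked directly on an allocated node. The following is a minimal C sketch, assuming a Linux environment; the expected values in the comments follow the table above, and nothing here is Perun-specific tooling:

```c
/* Minimal sketch: read core and memory counts from a running node. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long cpus  = sysconf(_SC_NPROCESSORS_ONLN);  /* logical CPUs online */
    long pages = sysconf(_SC_PHYS_PAGES);        /* physical memory pages */
    long psize = sysconf(_SC_PAGESIZE);          /* page size in bytes */

    /* On cn[001-045]: 320 physical cores across two sockets, i.e. 640
     * logical CPUs with SMT2 enabled, and roughly 1152 GB of memory. */
    printf("logical CPUs: %ld\n", cpus);
    printf("memory      : %.0f GB\n", (double)pages * psize / 1e9);
    return 0;
}
```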

Cloud module nodes

The cloud partition consists of 15 nodes based on the same BullSequana XH3420 hardware platform and AMD EPYC Turin 9845 processors as the universal nodes. The primary difference is the doubled memory capacity and local storage, which makes these nodes well suited to hosting cloud infrastructure.

| Feature       | cn[046-060]                                |
|---------------|--------------------------------------------|
| Processor     | 2× AMD EPYC Turin 9845 (160 cores per CPU) |
| Architecture  | Zen 5c                                     |
| Memory        | 2304 GB DDR5 @ 5600 MT/s                   |
| Local Storage | 7.6 TB NVMe                                |
| Interconnect  | 200 Gb/s InfiniBand NDR                    |

Accelerated module nodes

Fig. 2: NVIDIA Grace architecture.

The accelerated partition consists of 76 nodes implemented as BullSequana XH3515-H GPU blades. Each node integrates four NVIDIA Grace Hopper GH200 Superchips, combining Arm Neoverse V2 CPUs with NVIDIA Hopper GPUs and HBM3 memory. CPU–GPU communication is provided via NVLink-C2C, while GPU-to-GPU communication uses NVLink.

| Feature       | gn[001-076]                              |
|---------------|------------------------------------------|
| Superchips    | 4× NVIDIA Grace Hopper GH200             |
| CPU           | 4× 72-core Arm Neoverse V2 (Grace)       |
| GPU           | 4× Hopper GPU, 96 GB HBM3 each           |
| Memory        | 480 GB LPDDR5X (CPU) + 384 GB HBM3 (GPU) |
| Local Storage | 3.8 TB NVMe                              |
| Interconnect  | 4× 400 Gb/s InfiniBand NDR               |
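
The GPU layout of a gn node can be inspected through the CUDA runtime. The sketch below is a hedged example using only the CUDA runtime's C API (compilation with nvcc is assumed; this is not Perun-specific tooling): it enumerates the GPUs and checks NVLink peer-to-peer visibility, which on a 4-way NVLink node is expected between all pairs.

```c
/* Sketch: enumerate the GPUs of a gn node and test peer-to-peer access. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int n = 0;
    cudaGetDeviceCount(&n);  /* expected: 4 on gn[001-076] */
    printf("GPUs found: %d\n", n);
    for (int i = 0; i < n; i++) {
        struct cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("GPU %d: %s, %.0f GB HBM\n", i, p.name, p.totalGlobalMem / 1e9);
        for (int j = 0; j < n; j++) {
            int peer = 0;
            if (i != j && cudaDeviceCanAccessPeer(&peer, i, j) == cudaSuccess)
                printf("  P2P %d -> %d: %s\n", i, j, peer ? "yes" : "no");
        }
    }
    return 0;
}
```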

Processors

AMD Turin 9845

AMD EPYC Turin 9845 is a 64-bit x86 server microprocessor from the 5th generation EPYC family (codename Turin), optimized for highly parallel HPC workloads. Each EPYC 9845 processor provides 160 CPU cores and supports 12 channels of DDR5 memory at speeds up to 6000 MT/s. The platform supports PCIe 5.0 connectivity and CXL 2.0 for memory and accelerator expansion. The processor implements full-width 512-bit AVX-512 vector execution units and includes a large shared L3 cache per CCX.
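
As a concrete illustration of the 512-bit data path, the following hedged C sketch performs a fused multiply-add across eight doubles in a single instruction (GCC or Clang with the -mavx512f flag is assumed; the data is illustrative):

```c
/* Sketch: one AVX-512 FMA processes eight doubles at a time.
 * Compile with e.g.: gcc -O2 -mavx512f avx512.c */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    double b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    double c[8];

    __m512d va = _mm512_loadu_pd(a);
    __m512d vb = _mm512_loadu_pd(b);
    __m512d vc = _mm512_fmadd_pd(va, vb, vb);  /* c[i] = a[i]*b[i] + b[i] */
    _mm512_storeu_pd(c, vc);

    for (int i = 0; i < 8; i++)
        printf("%.1f ", c[i]);
    printf("\n");
    return 0;
}
```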

Fig. 3: AMD Turin architecture.

| AMD EPYC Turin 9845 |                      |                     |                |
|---------------------|----------------------|---------------------|----------------|
| Architecture        | x86-64 (Zen 5c)      | Process             | 4 nm (TSMC)    |
| Cores               | 160                  | Threads             | 320 (SMT2)     |
| Base Frequency      | 2.10 GHz             | Max Boost Frequency | up to 3.70 GHz |
| Max. Memory         | 9 TB @ DDR5-6000 (a) | TDP                 | 400 W          |

(a) Maximum capacity depends on DIMM type (e.g., 3DS DIMMs).

Arm Neoverse V2

NVIDIA Grace Hopper Superchip

Fig. 4: NVIDIA Grace Hopper Superchip architecture.

NVIDIA Grace is a 64-bit Arm-based server processor designed for tightly coupled HPC and AI systems. It is built around the Arm Neoverse V2 microarchitecture implementing the Armv9.0-A instruction set and is integrated within the Grace Hopper GH200 Superchip.

Each Grace CPU contains 72 high-performance Arm cores and is paired with LPDDR5X memory delivering up to 546 GB/s memory bandwidth per processor. The architecture supports Scalable Vector Extension v2 (SVE2) and Advanced SIMD (NEON), enabling efficient vectorized scientific computing.
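
SVE2 code is vector-length-agnostic: loops stride by the hardware vector length rather than a fixed width. Below is a minimal C sketch using the Arm C Language Extensions, assuming GCC or Clang with SVE2 enabled (-march=armv9-a+sve2; the array size is illustrative):

```c
/* Sketch: vector-length-agnostic addition of two arrays on Grace. */
#include <arm_sve.h>
#include <stdio.h>

int main(void) {
    double a[16], b[16], c[16];
    for (int i = 0; i < 16; i++) { a[i] = i; b[i] = 16 - i; }

    /* Stride by the hardware count of 64-bit lanes, not a fixed width. */
    for (int i = 0; i < 16; i += svcntd()) {
        svbool_t pg = svwhilelt_b64(i, 16);        /* predicate masks the tail */
        svfloat64_t va = svld1_f64(pg, &a[i]);
        svfloat64_t vb = svld1_f64(pg, &b[i]);
        svst1_f64(pg, &c[i], svadd_f64_m(pg, va, vb));
    }

    for (int i = 0; i < 16; i++)
        printf("%.0f ", c[i]);                     /* prints sixteen 16s */
    printf("\n");
    return 0;
}
```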

Grace implements Large System Extensions (LSE) for efficient atomic operations and supports hardware virtualization, memory encryption, and advanced reliability features. Within the GH200 Superchip, the Grace CPU is coherently connected to the Hopper GPU via NVLink-C2C with up to 900 GB/s bidirectional bandwidth, enabling unified high-bandwidth CPU–GPU memory access.

The processor is optimized for hybrid CPU–GPU workloads, AI training and inference, and large-scale HPC applications requiring high memory throughput and strong energy efficiency.

| Arm Neoverse V2 |                    |                     |             |
|-----------------|--------------------|---------------------|-------------|
| Architecture    | Armv9.0-A          | Process             | 4 nm (TSMC) |
| Cores           | 72                 | Threads             | 72 (no SMT) |
| Base Frequency  | 2.60 GHz           | Max Boost Frequency | 3.50 GHz    |
| Max. Memory     | 456 GB LPDDR5X (a) | TDP                 | 500 W       |

(a) Maximum capacity depends on the LPDDR5X memory configuration.

Accelerators

NVIDIA Hopper GPU (GH200)

The NVIDIA Hopper GPU used in the accelerated partition is part of the Grace Hopper GH200 Superchip. Hopper is the ninth generation of NVIDIA's data-center GPU architectures and is designed for large-scale HPC and AI workloads requiring extreme floating-point throughput and memory bandwidth.

The GPU integrates high-performance Streaming Multiprocessors (SMs) with fourth-generation Tensor Cores supporting a broad range of precisions, including FP64, FP32, TF32, FP16, BF16, FP8, and INT8. This allows efficient execution of both traditional double-precision scientific simulations and modern mixed-precision AI workloads.

Each Hopper GPU in the GH200 configuration is equipped with 96 GB of HBM3 memory providing up to approximately 4 TB/s of memory bandwidth. The memory subsystem supports SECDED ECC protection and includes row remapping mechanisms for improved reliability.
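
A rough feel for this bandwidth can be obtained with a device-to-device copy, which both reads and writes HBM. The sketch below uses only the CUDA runtime's C API (the buffer size is illustrative; a rigorous measurement would use a STREAM-like kernel rather than cudaMemcpy):

```c
/* Sketch: crude HBM bandwidth probe via a device-to-device copy. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    const size_t bytes = 1UL << 30;  /* 1 GiB per buffer */
    void *src, *dst;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);  /* warm-up */
    cudaEventRecord(t0, 0);
    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(t1, 0);
    cudaEventSynchronize(t1);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    /* A D2D copy moves 2x the buffer size through HBM (read + write). */
    printf("approx. HBM bandwidth: %.0f GB/s\n",
           2.0 * bytes / (ms * 1e-3) / 1e9);

    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```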

Hopper introduces several architectural enhancements compared to the previous Ampere generation:

  • Increased FP64 and FP32 throughput
  • Fourth-generation Tensor Cores
  • DPX instructions accelerating dynamic programming algorithms
  • 50 MB L2 cache for improved data locality
  • Asynchronous execution engines and Tensor Memory Accelerator (TMA)
  • Second-generation Multi-Instance GPU (MIG)

Fig. 5: Grace Hopper interconnection.

Within a 4-way GH200 node, Hopper GPUs are interconnected using high-bandwidth NVLink peer-to-peer links. Each GPU can communicate directly with its peers, enabling high aggregate bandwidth for multi-GPU parallel workloads.

The Hopper GPU is optimized for:

  • Large-scale AI model training and inference
  • Hybrid CPU–GPU scientific workflows
  • Data-intensive workloads requiring extreme memory bandwidth

In the Grace Hopper configuration, the GPU is coherently connected to the Grace CPU via NVLink-C2C, enabling unified memory semantics and high-bandwidth CPU–GPU data exchange far beyond traditional PCIe-based accelerator systems.
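
One way to see these unified memory semantics from the host side is CUDA managed memory: a single allocation is valid on both Grace and Hopper, with pages migrating across NVLink-C2C on demand. A minimal sketch using the CUDA runtime's C API (the array size and device index are illustrative; compilation with nvcc is assumed):

```c
/* Sketch: one allocation shared by CPU and GPU; prefetching migrates
 * the pages across NVLink-C2C. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    const size_t n = 1 << 20;
    double *x = NULL;
    cudaMallocManaged((void **)&x, n * sizeof(double));

    for (size_t i = 0; i < n; i++)  /* CPU writes through the shared pointer */
        x[i] = (double)i;

    /* Migrate the pages to GPU 0, then back to the CPU side. */
    cudaMemPrefetchAsync(x, n * sizeof(double), 0, NULL);
    cudaMemPrefetchAsync(x, n * sizeof(double), cudaCpuDeviceId, NULL);
    cudaDeviceSynchronize();

    printf("x[42] = %.1f\n", x[42]);  /* same pointer, still valid on the CPU */
    cudaFree(x);
    return 0;
}
```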

| NVIDIA Hopper GPU (GH200) |                          |                  |                       |
|---------------------------|--------------------------|------------------|-----------------------|
| Architecture              | NVIDIA Hopper            | GPU Generation   | 9th (Data Center)     |
| CUDA Cores                | 16,896 (a)               | Tensor Cores     | 4th Generation        |
| FP64 Performance          | up to ~34 TFLOP/s (a)    | FP32 Performance | up to ~67 TFLOP/s (a) |
| GPU Memory                | 96 GB HBM3               | Memory Bandwidth | up to ~3.6–4.0 TB/s   |
| NVLink (GPU–GPU)          | up to 900 GB/s aggregate | TDP              | 1000 W                |

(a) Exact values depend on the specific Hopper SKU configuration within GH200.

Created by: Andrej Sec