Supercomputer architecture
Configuration of Irene
The compute nodes are gathered in partitions according to their hardware characteristics (CPU architecture, amount of RAM, presence of GPUs, etc.). A partition is a set of identical nodes that can be targeted to host one or several jobs. Choosing the right partition for a job depends on the code's prerequisites in terms of hardware resources; for example, running a GPU-accelerated code requires a partition with GPU nodes. A submission sketch targeting a partition is given after the partition list below.
The Irene supercomputer offers three different kinds of nodes: regular compute nodes, large memory nodes, and GPU nodes.
- Skylake nodes for regular computation
Partition name: skylake
CPUs: 2x24-core Intel Skylake@2.7GHz (AVX512)
Cores/Node: 48
Nodes: 1 653
Total cores: 79 344
RAM/Node: 180GB
RAM/Core: 3.75GB
- AMD Rome nodes for regular computation
Partition name: rome
CPUs: 2x64-core AMD Rome@2.6GHz (AVX2)
Cores/Node: 128
Nodes: 2 286
Total cores: 292 608
RAM/Node: 228GB
RAM/Core: 1.8GB
- Hybrid nodes for GPU computing and graphical usage
Partition name: hybrid
CPUs: 2x24-core Intel Skylake@2.7GHz (AVX512)
GPUs: 1x Nvidia Pascal P100
Cores/Node: 48
Nodes: 20
Total cores: 960
RAM/Node: 180GB
RAM/Core: 3.75GB
I/O: 1x 250 GB HDD + 1x 800 GB NVMe SSD
- Fat nodes with a large amount of shared memory, for computations of reasonable duration that use no more than one node
Partition name: xlarge
CPUs: 4x28-cores Intel Skylake@2.1GHz
GPUs: 1x Nvidia Pascal P100
Cores/Node: 112
Nodes: 5
Total cores: 560
RAM/Node: 3TB
RAM/Core: 27GB
I/O: 2x 1 TB HDD + 1x 1600 GB NVMe SSD
- V100 nodes for GPU computing and AI
Partition name: v100
CPUs: 2x20-core Intel Cascadelake@2.1GHz (AVX512)
GPUs: 4x Nvidia Tesla V100
Cores/Node: 40
Nodes: 32
Total cores: 1 280 (+ 128 GPUs)
RAM/Node: 175 GB
RAM/Core: 4.4 GB
- V100l nodes for GPU computing and AI
Partition name: v100l
CPUs: 2x18-core Intel Cascadelake@2.6GHz (AVX512)
GPUs: 1x Nvidia Tesla V100
Cores/Node: 36
Nodes: 30
Total cores: 1 080 (+ 30 GPUs)
RAM/Node: 355 GB
RAM/Core: 9.9 GB
- V100xl nodes for GPU computing and AI
Partition name: v100xl
CPUs: 4x18-core Intel Cascadelake@2.6GHz (AVX512)
GPUs: 1x Nvidia Tesla V100
Cores/Node: 72
Nodes: 2
Total cores: 144 (+ 2 GPUs)
RAM/Node: 2.9 TB
RAM/Core: 40 GB
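The target partition is selected at job submission time. The lines below are a minimal sketch assuming the ccc_msub/ccc_mprun wrappers and their #MSUB directives provided on the machine; the job name, project name, executable and resource values are placeholders to adapt to your own code and allocation.
$ cat job_skylake.sh
#!/bin/bash
# Job name (placeholder)
#MSUB -r my_job
# Target partition, chosen from the list above
#MSUB -q skylake
# Number of MPI tasks (one full skylake node in this example)
#MSUB -n 48
# Walltime limit, in seconds
#MSUB -T 1800
# Project to charge for the hours (placeholder)
#MSUB -A myproject
# Launch the parallel executable on the allocated resources
ccc_mprun ./my_code
$ ccc_msub job_skylake.sh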
Note that depending on the computing share owned by the partner you are attached to, you may not have access to all the partitions. You can check on which partition(s) your project has allocated hours with the ccc_myproject command.
The ccc_mpinfo command displays the available partitions/queues that can be used for a job.
$ ccc_mpinfo
--------------CPUS------------ -------------NODES------------
PARTITION STATUS TOTAL DOWN USED FREE TOTAL DOWN USED FREE MpC CpN SpN CpS TpC
--------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ----- --- --- --- ---
skylake up 9960 0 9773 187 249 0 248 1 4500 40 2 20 1
xlarge up 192 0 192 0 3 0 3 0 48000 64 4 16 1
hybrid up 140 0 56 84 5 0 2 3 8892 28 2 14 1
v100 up 120 0 0 120 3 0 0 3 9100 40 2 20 1
MpC: amount of memory per core, in MB
CpN: number of cores per node
SpN: number of sockets per node
CpS: number of cores per socket
TpC: number of threads per core; a value greater than 1 means SMT (Simultaneous Multithreading, known as Hyper-Threading on Intel architectures) is available
For example, in the skylake row above, 4500 MB per core multiplied by 40 cores per node corresponds to about 180 GB of memory per node.
Interconnect
The compute nodes are connected through an EDR InfiniBand network in a pruned fat-tree topology. This high-throughput, low-latency network is used for I/O and for communications among the nodes of the supercomputer.
Lustre
Lustre is a parallel distributed file system commonly used for large-scale cluster computing. It relies on a set of I/O servers that the Lustre software presents as a single unified file system.
The major Lustre components are the Metadata Server (MDS) and the Object Storage Servers (OSSs). The MDS stores metadata such as file names, directories, access permissions, and file layout; it is not involved in the actual I/O operations. The data itself is stored on the OSSs. Note that a single file can be striped across several OSSs, which is one of the main benefits of Lustre when working with large files.
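As a brief illustration of striping, the standard lfs utility shipped with Lustre clients can show or set how files are distributed over the object storage targets; the file name, directory and stripe count below are arbitrary examples.
$ lfs getstripe my_results.dat          # show the current striping layout (stripe count, size, OSTs) of a file
$ lfs setstripe -c 8 large_output_dir/  # stripe new files created in this directory over 8 OSTs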

More information on how Lustre works, as well as best practices, can be found in Lustre best practice.