RSC Group, the Russian HPC vendor, chose the SC16 international supercomputing conference to demonstrate a new generation of its high-performance, scalable and energy-efficient RSC Tornado solution with direct liquid cooling, based on the 72-core Intel Xeon Phi 7290. The system set a new world record for computing density on x86 architecture of 1.41 Pflops per rack, exceeding the previous record (also held by an RSC solution) by 17%.
The new generation of RSC Tornado improves on footprint, computing density, energy efficiency and manageability while maintaining stable operation of the computing nodes.
The system demonstrated at the RSC booth is based on the Intel Xeon Phi 7290 multi-core processor on an Intel S7200AP server board, with two Intel SSD DC S3500 Series M.2 340 GB drives and one Intel SSD DC P3100 (M.2 NVMe, PCIe interface) solid-state drive.
RSC also demonstrated a storage solution using the latest features of the NVMe-over-Fabric protocol, an extension to the original NVMe specification that allows remote access to NVMe SSDs over an RDMA-enabled fabric. At its booth, the Russian company showed a basic infrastructure consisting of an NVMe-over-Fabric target system with multiple Intel SSD DC P3700 drives, connected to NVMe-over-Fabric hosts (RSC Tornado nodes based on Intel Xeon Phi 7290 processors) through an Intel Omni-Path fabric switch. This approach makes it possible to address block devices remotely with “close to local latency” and can be used in I/O node designs for HPC. Note that this is not a replacement for traditional parallel storage; rather, I/O nodes are the most effective way to boost random I/O performance. For example, an SSD can be split into multiple partitions that are shared with compute nodes when a workload needs them. This provides a “scratch on demand” option, available on request without rebooting or reconfiguring the compute nodes.
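The “scratch on demand” idea can be illustrated with a small bookkeeping sketch. This is not RSC's implementation and it does not configure an NVMe-over-Fabric target. It only models, under assumed names, how partitions of an I/O node's SSD might be lent to compute nodes and returned to the pool. The class `ScratchPool`, the partition names and the node names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScratchPool:
    """Tracks which SSD partitions on an I/O node are lent to which compute node.

    Hypothetical model: exporting a partition over the fabric is out of scope.
    """
    partitions: List[str]                                   # assumed partition names
    assigned: Dict[str, str] = field(default_factory=dict)  # partition -> node

    def allocate(self, node: str) -> str:
        """Lend the first free partition to `node`; raise if none is free."""
        for part in self.partitions:
            if part not in self.assigned:
                self.assigned[part] = node
                return part
        raise RuntimeError("no free scratch partition")

    def release(self, node: str) -> List[str]:
        """Return every partition held by `node` to the pool."""
        freed = [p for p, n in self.assigned.items() if n == node]
        for part in freed:
            del self.assigned[part]
        return freed


# One SSD split into three partitions (names are illustrative only).
pool = ScratchPool(partitions=["nvme0n1p1", "nvme0n1p2", "nvme0n1p3"])
first = pool.allocate("node-001")    # a job on node-001 asks for scratch space
second = pool.allocate("node-002")
freed = pool.release("node-001")     # the job ends, its partition is returned
reused = pool.allocate("node-003")   # the freed partition is handed out again
print(first, second, freed, reused)
```

In a real deployment the allocate and release steps would be paired with exporting and withdrawing the partition on the target side, which is what lets the compute node pick up the device without a reboot.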
RSC Tornado solution features
The new generation of the RSC Tornado cluster solution has the following improved characteristics:
- High-end models of the multi-core Intel Xeon Phi 7200 processor family, including the Intel Xeon Phi 7290 (72 cores), and support for the upcoming Intel Xeon Phi 7250F and Intel Xeon Phi 7290F processors (with integrated high-speed Intel Omni-Path interconnect)
- Intel S7200AP server boards
- Highest physical density with up to 408 computing nodes in a dual-side 42U cabinet (120×120×200 cm)
- Record computing density – 1.41 Pflops (528 Teraflops in the previous generation) in a dual-side 42U cabinet, or over 490 Teraflops/m³
- Better energy efficiency – at a power density of 200 kW per rack, reduced system power consumption increases energy efficiency threefold
- RAM capacity per rack increased about fivefold, from 16.5 TB to 76.5 TB (up to 192 GB of DDR4-2400 RAM and 16 GB of MCDRAM per node)
- Simultaneous use of up to two SATA SSDs and one PCIe SSD in M.2 form factor, such as the Intel SSD DC S3500 series and Intel SSD DC P3100 (M.2 NVMe)
- Improved energy efficiency – stable operation of computing nodes in “hot water” mode, with coolant at +63 °C at the node inlet, which enables free cooling of the system in 24x7x365 mode with an outstanding PUE of 1.05 or even lower
- New power supply module in the computing node form factor, converting 220V AC to 400V DC with 96% efficiency and supporting parallel operation on a common bus with redundancy schemes from N+1 to N+N
- Updated computing cabinet design with support for new high-speed inter-node communication technologies, including Intel Omni-Path and Mellanox EDR InfiniBand
- Support for flexible cooling system configurations, with redundancy of both single hydraulic regulation nodes and the entire system
- RSC Tornado nodes can be serviced individually without stopping any other node. All node components (memory, disks, high-speed interconnect adapters, power and management subsystems) are easily accessible for replacement or reconfiguration directly at the customer’s site
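The per-rack figures in the list above can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the text, plus three labeled assumptions that are not in the article: a 1.5 GHz base clock and 32 double-precision flops per cycle per core for the Xeon Phi 7290, binary units (1 TB = 1024 GB) for memory, and the 200 kW rack figure treated as the IT load when applying the PUE.

```python
# Figures quoted in the text
NODES_PER_RACK = 408
RACK_DIMS_M = (1.2, 1.2, 2.0)   # 120 x 120 x 200 cm cabinet
RAM_PER_NODE_GB = 192
RACK_POWER_KW = 200
PUE = 1.05

# Assumptions (not from the article): Xeon Phi 7290 base clock and
# double-precision flops per cycle per core (two AVX-512 FMA units).
CORES = 72
CLOCK_GHZ = 1.5
DP_FLOPS_PER_CYCLE = 32

# Peak per node and per rack
node_peak_tflops = CORES * CLOCK_GHZ * DP_FLOPS_PER_CYCLE / 1000
rack_peak_tflops = NODES_PER_RACK * node_peak_tflops

# Computing density per cubic metre of cabinet volume
width, depth, height = RACK_DIMS_M
rack_volume_m3 = width * depth * height
density_tflops_per_m3 = rack_peak_tflops / rack_volume_m3

# Memory per rack, assuming binary units (1 TB = 1024 GB)
rack_ram_tb = NODES_PER_RACK * RAM_PER_NODE_GB / 1024

# PUE = total facility power / IT power; assumes the 200 kW is the IT load
facility_power_kw = RACK_POWER_KW * PUE
overhead_kw = facility_power_kw - RACK_POWER_KW

print(f"peak per node:   {node_peak_tflops:.3f} Tflops")
print(f"peak per rack:   {rack_peak_tflops / 1000:.2f} Pflops")
print(f"density:         {density_tflops_per_m3:.1f} Tflops/m3")
print(f"RAM per rack:    {rack_ram_tb:.1f} TB")
print(f"cooling/overhead at PUE {PUE}: {overhead_kw:.1f} kW")
```

Under these assumptions the rack peak and memory totals reproduce the quoted 1.41 Pflops and 76.5 TB, and the density lands close to the quoted 490 Teraflops/m³. A PUE of 1.05 corresponds to roughly 10 kW of non-IT overhead for a 200 kW rack.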
The innovative management and monitoring system of RSC solutions for high performance computing also provides high availability, fault tolerance and ease of use. It can be used to manage single nodes and the entire solution, including infrastructure components. All elements of the system (computing nodes, power supplies, hydraulic regulation modules, etc.) have an integrated management module, providing broad capabilities for detailed telemetry and flexible management. The cabinet design supports hot-swap replacement of computing nodes, power supplies and hydraulic regulation modules (with redundancy) without interrupting system operation. Most components of the system (such as computing nodes, power supplies, network and infrastructure components, etc.) are software-defined, which significantly simplifies and speeds up initial deployment, maintenance and future upgrades of the system. Liquid cooling of all components ensures their longevity.
The innovative approaches in the latest generation of the RSC Tornado cluster solution reduce infrastructure costs when building a computing system and allow more flexible upgrades of single nodes and entire systems.
© HPC Today 2023 - All rights reserved.