This article is part of our feature story: How CERN manages its data
Within the past three years, the volume of data collected from the LHC alone reached 75 PB, bringing the overall volume of data generated by CERN to more than 100 PB. At this scale, storage is an experiment in itself. The strategy of CERN’s IT directors has been to classify data according to how likely it is to be accessed. Thus, 88 PB have been archived on tape via the CERN Advanced Storage system (CASTOR), while about 13 PB are stored on a disk system (EOS) optimized for fast, simultaneous analysis by multiple users.
It is in the logistics that these orders of magnitude take on their full meaning. CERN has deployed eight robotic tape libraries, spread over two buildings. Each of them contains approximately 14,000 cartridges with individual capacities of 1 to 5.5 TB. EOS itself consists of 17,000 disks attached to 800 servers. A unified namespace supports concurrent access to its millions of files.
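To put these figures in perspective, here is a rough back-of-the-envelope sketch of what they imply. It uses only the numbers quoted above; the decimal conversion (1 PB = 1,000 TB) and the function names are assumptions made for illustration, and the result is a bound on raw cartridge capacity, not a statement about how CERN actually fills its libraries.

```python
# Figures quoted in the article (all approximate).
LIBRARIES = 8
CARTRIDGES_PER_LIBRARY = 14_000
CARTRIDGE_TB_MIN = 1.0
CARTRIDGE_TB_MAX = 5.5
EOS_DISKS = 17_000
EOS_SERVERS = 800

# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1_000


def tape_capacity_bounds_pb():
    """Lower and upper bound on raw tape capacity, in PB.

    The lower bound assumes every cartridge holds 1 TB,
    the upper bound assumes every cartridge holds 5.5 TB.
    """
    cartridges = LIBRARIES * CARTRIDGES_PER_LIBRARY
    low = cartridges * CARTRIDGE_TB_MIN / TB_PER_PB
    high = cartridges * CARTRIDGE_TB_MAX / TB_PER_PB
    return low, high


def disks_per_server():
    """Average number of EOS disks attached to each server."""
    return EOS_DISKS / EOS_SERVERS


if __name__ == "__main__":
    low, high = tape_capacity_bounds_pb()
    print(f"Raw tape capacity: between {low:.0f} and {high:.0f} PB")
    print(f"EOS disks per server: {disks_per_server():.2f}")
```

Under these assumptions, the eight libraries hold about 112,000 cartridges in total, and each EOS server fronts a little over 21 disks on average.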
The IT team is taking advantage of the first LHC long shutdown (aka “LS1”) to assess the health of its data holdings and carry out a number of consolidation and maintenance operations. For example, the tapes will be copied onto cartridges offering larger individual capacity. At the same time, engineers must prepare for the new data flows that will come from the upgraded accelerators. For this, a new remote datacenter is being created in Budapest (Hungary), with a cluster featuring no fewer than 20,000 cores and a storage capacity of 5.5 PB. It is currently being interconnected with Geneva via a dedicated 200-Gbps network.
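As a rough illustration of what a 200-Gbps link means at this scale, the sketch below estimates how long it would take to move a volume equal to the Budapest site's 5.5 PB over it. This is an idealized calculation, not a description of CERN's actual transfers: the function name and the `efficiency` parameter are hypothetical, units are assumed to be decimal (1 PB = 10^15 bytes, 1 Gbps = 10^9 bits per second), and protocol overhead is ignored unless `efficiency` is lowered.

```python
SECONDS_PER_DAY = 86_400


def transfer_time_days(volume_pb, link_gbps, efficiency=1.0):
    """Idealized time, in days, to move `volume_pb` petabytes over a link.

    `efficiency` is a hypothetical fraction (0 < efficiency <= 1) of the
    nominal bandwidth that is actually usable for payload data.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = volume_pb * 1e15 * 8                  # petabytes -> bits
    usable_bps = link_gbps * 1e9 * efficiency    # Gbps -> usable bits/s
    return bits / usable_bps / SECONDS_PER_DAY


if __name__ == "__main__":
    days = transfer_time_days(5.5, 200)
    print(f"5.5 PB over 200 Gbps at full rate: about {days:.1f} days")
```

Even at full line rate, filling the remote storage once would take on the order of days, which gives a sense of why a dedicated link is used.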
© HPC Today 2024 - All rights reserved.