Controlling the Tide of Weather and Climate Data with Tiered Storage
By   |  May 17, 2016

About the author
Jeffrey Katcher is the Business Development Manager in charge of Solutions and Strategies at Cray, Inc.

There’s an old cliché that everyone talks about the weather, but no one does anything about it. While Cray can’t (yet) prevent droughts or cool off hot spells, we can help make the lives of weather professionals easier…

An abundance of data, but where to store it?
Weather and climate modeling centers strive to improve the accuracy of their models by gathering and assimilating more diverse input data and by increasing model resolution and complexity. As a result, these models are ingesting and producing ever-increasing volumes of data.

Weather and climate organizations are often challenged by the sheer volume of this data. They must manage the many ways it may be used and find the resources, financial and otherwise, to store it and keep it accessible over the long term. Many numerical weather prediction centers hold tens to hundreds of petabytes of data and face annual growth rates above 40%. Growth at that rate makes an archive solution a critical component for keeping data in order and for producing the reliable weather and climate forecasts the world needs.
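To make the 40% figure concrete, here is a minimal compound-growth sketch. Only the growth rate comes from the text. The 100 PB starting size is a hypothetical example, and a constant annual rate is a simplifying assumption.

```python
import math

# Growth rate taken from the article; everything else here is illustrative.
ANNUAL_GROWTH = 0.40


def projected_size_pb(start_pb: float, years: int, growth: float = ANNUAL_GROWTH) -> float:
    """Archive size after `years` of compound growth at rate `growth`."""
    return start_pb * (1.0 + growth) ** years


def doubling_time_years(growth: float = ANNUAL_GROWTH) -> float:
    """Years for the archive to double at a constant compound growth rate."""
    return math.log(2.0) / math.log(1.0 + growth)


if __name__ == "__main__":
    start = 100.0  # hypothetical: a center holding 100 PB today
    for year in range(6):
        print(f"year {year}: {projected_size_pb(start, year):8.1f} PB")
    print(f"doubling time: {doubling_time_years():.2f} years")
```

Under these assumptions the archive roughly doubles about every two years (ln 2 / ln 1.4 ≈ 2.06), so a hypothetical 100 PB archive would pass 500 PB within five years.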

Faced with such massive amounts of data, researchers need to archive it easily and then retrieve it just as easily for faster model refinement and analysis. That calls for a cost-effective, easily managed archive that keeps data readily accessible. Typical archiving solutions often fall short: proprietary hardware hampers data movement, scalability cannot keep pace with growing data volumes, and integration into existing workflows requires expensive support services. Any one of these shortcomings can obstruct a numerical weather prediction or climate research workflow.

Transparently blending fast disk and economical tape
Historically, weather centers have used a variety of storage technologies for online and archival storage. Moving data between them often meant manually building elaborate processes to copy it from one to the other and back, locking organizations into barely functional processes and products for lack of better alternatives.

Treating online and archival storage as a unified whole makes it possible to build in intelligence that keeps the complexity below the surface and out of users’ way. This intelligence is provided by a hierarchical storage management (HSM) system, which blends fast disk and economical tape into a single, transparent file system. Users can see and access every file in their huge libraries, but only the files in use are kept on fast, expensive disk. When a dataset is no longer being used, the system detects this and automatically migrates it to a tape library for long-term safekeeping until it is needed again.
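The migration step described above can be sketched as an age-based policy. This is a toy in-memory model, not how TAS or any particular HSM is implemented: `FileEntry`, `migrate_cold_files`, the 90-day threshold and the file paths are all hypothetical. The key point it shows is that migration changes where a file’s data lives, not whether the file appears in the namespace.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical model of an HSM namespace: every file stays listed, and only
# the `tier` field records where its data currently lives.
@dataclass
class FileEntry:
    path: str
    size_bytes: int
    last_access: datetime
    tier: str = "disk"  # "disk" (fast, expensive) or "tape" (economical)


def migrate_cold_files(catalog, now, idle_threshold):
    """Move data for files idle longer than `idle_threshold` from disk to tape.

    Returns the migrated paths. Entries are never removed from the catalog,
    so users still see every file.
    """
    migrated = []
    for entry in catalog:
        if entry.tier == "disk" and now - entry.last_access > idle_threshold:
            # An HSM would typically copy the data to tape and then release
            # the disk blocks; this sketch only records the new location.
            entry.tier = "tape"
            migrated.append(entry.path)
    return migrated


if __name__ == "__main__":
    now = datetime(2016, 5, 17)
    catalog = [
        FileEntry("/archive/run_2009.nc", 40 * 2**30, now - timedelta(days=400)),
        FileEntry("/archive/run_2016.nc", 55 * 2**30, now - timedelta(days=2)),
    ]
    print(migrate_cold_files(catalog, now, idle_threshold=timedelta(days=90)))
    print([(e.path, e.tier) for e in catalog])
```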

Existing, often proprietary archive systems tend to lock data into internal formats, making it very difficult to retrieve once the system reaches the end of its life. By comparison, open archival systems such as Cray Tiered Adaptive Storage (TAS) store their archives in well-documented formats that should remain easy to retrieve far into the future.
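The article does not say which on-tape format TAS uses. Purely to illustrate why a documented format matters, assume an archive written as a POSIX ustar tar file: any tool that implements the published format (here Python’s standard `tarfile` module) can read it back without the system that wrote it. The helper names and file names below are hypothetical.

```python
import io
import tarfile


def write_archive(files: dict[str, bytes]) -> bytes:
    """Pack name -> contents into an in-memory POSIX (ustar) tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_archive(blob: bytes) -> dict[str, bytes]:
    """Recover every regular file using nothing but the published tar format."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                out[member.name] = tar.extractfile(member).read()
    return out


if __name__ == "__main__":
    blob = write_archive({"obs/2016-05-17.grib": b"observations"})
    print(sorted(read_archive(blob)))
```

The reader shares no code with the writer beyond the format definition, which is the property that keeps archived data retrievable after the original system is gone.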

Despite all the current hype about tape being dead, tape libraries still offer the lowest storage cost per terabyte and consume about one tenth the power of disk-based solutions. As data volumes and the need to process them grow, the need for larger facilities with more power and cooling capacity is often overlooked. Keeping archives in compact, green tape libraries can relieve at least some of this pressure.
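A back-of-the-envelope model shows how the power advantage plays out in a tiered system. It assumes, following the text, that tape draws roughly one tenth the power of disk for the same capacity, and ignores servers, networking and cooling overheads. The share of capacity kept on disk is a hypothetical input.

```python
# Relative power of a disk+tape store versus keeping everything on disk.
# Assumption from the text: tape uses about one tenth the power of disk for
# the same capacity. All other figures are hypothetical.
TAPE_TO_DISK_POWER_RATIO = 0.1


def relative_power(active_fraction: float, ratio: float = TAPE_TO_DISK_POWER_RATIO) -> float:
    """Power of a tiered store as a fraction of an all-disk store.

    `active_fraction` is the share of capacity kept on disk; the rest is on tape.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction + (1.0 - active_fraction) * ratio


if __name__ == "__main__":
    for active in (1.0, 0.5, 0.2, 0.1):
        print(f"{active:4.0%} on disk -> {relative_power(active):.2f} x all-disk power")
```

In this simple model, keeping 10% of capacity on disk and the rest on tape needs about 0.19 of the power of an all-disk store.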

Data residing on tape isn’t a solution by itself. To be used effectively, it must be accessible to the compute infrastructure. Scripting is often used to orchestrate large-scale copying from archive to online storage, but at today’s data volumes this approach is no longer sufficient. With a modern open archiving solution like TAS, all files, archived and active, are presented to users as a single file system, leaving the magic behind the scenes. TAS also integrates with high-performance Lustre storage, allowing data to move transparently from the fastest to the slowest storage without manual intervention by a system administrator.
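The difference between scripted copying and a single file system can be sketched as a read path that recalls data on demand. This is a hypothetical toy model (`TieredStore` and its fields are invented for illustration, not a TAS or Lustre API): a file whose data sits only on tape is still listed, and the first read stages it back to disk automatically.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Hypothetical single namespace over a disk tier and a tape tier."""
    disk: dict = field(default_factory=dict)     # path -> bytes on fast storage
    tape: dict = field(default_factory=dict)     # path -> bytes in the tape library
    recalls: list = field(default_factory=list)  # log of automatic stage-ins

    def listdir(self):
        # Users see every file, whichever tier currently holds its data.
        return sorted(set(self.disk) | set(self.tape))

    def read(self, path: str) -> bytes:
        if path not in self.disk:
            if path not in self.tape:
                raise FileNotFoundError(path)
            # Transparent recall: stage the data back to disk on first access,
            # with no user script or administrator involved.
            self.disk[path] = self.tape[path]
            self.recalls.append(path)
        return self.disk[path]


if __name__ == "__main__":
    store = TieredStore(tape={"/archive/reanalysis_1998.grib": b"archived fields"})
    print(store.listdir())                       # tape-resident file is visible
    store.read("/archive/reanalysis_1998.grib")  # triggers one recall
    print(store.recalls)
```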

By bringing all weather and climate data, including vast archives, under the management of an open archival system like Cray TAS, weather and climate centers can do more research, refine more models, and simplify the operation of their facilities.

© HPC Today 2019 - All rights reserved.
