Non-Volatile Memory (NVM) technology for Exascale computing – A position paper
June 18, 2014

This paper examines the race to Exascale under the second hypothesis, that is, assuming an Exascale machine node with a large NVM (fast, large and close to the CPU). Since application software evolves much more slowly than hardware technology [8], we believe that software readiness is even more crucial. The remainder of this document lists potential impacts on the software stack and on application development.

Compute vs storage – Compute power is expected to grow faster than storage capability in HPC machines. The availability of NVM may change this expectation. However, as more memory becomes available inside the node, getting the data out may overload the I/O system.

I/O stack – NVM is anticipated to be so fast that the storage software stack must be revisited [9]: the latency of current storage software, and the corresponding power consumption, will become the main cost factors. In the end, NVM may be used either as RAM or as conventional storage. New application development will define what is expected of this new hardware capability.

Resilience – The constraints on how resilience is implemented can be fully reinvestigated. Memristors may be fast enough to allow saving the processor state so frequently that transient errors are handled entirely at the system’s lowest level, transparently to users [6]. This would mean redirecting the research effort toward higher-level functionalities, closer to the application.
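As a rough illustration of why checkpoint frequency matters, the following Python sketch simulates a computation that rolls back to its last saved state when a transient error is detected. It is only a toy model under stated assumptions: the function `run_with_checkpoints`, the fault-injection mechanism and the in-memory "checkpoint" are all hypothetical stand-ins, not a description of how memristor-based hardware would actually save processor state.

```python
import copy


def run_with_checkpoints(step, state, n_steps, faulty_steps, interval=1):
    """Run `n_steps` applications of `step`, checkpointing every `interval` steps.

    `faulty_steps` is a set of step indices at which a transient error is
    injected once (assumption: the error is detected before the step runs).
    Returns (final_state, number_of_steps_that_had_to_be_re_executed).
    """
    checkpoint = (0, copy.deepcopy(state))   # stand-in for state saved in NVM
    pending = set(faulty_steps)
    redone = 0
    i = 0
    while i < n_steps:
        if i in pending:
            # Transient error: discard current state, restart from checkpoint.
            pending.discard(i)
            saved_i, saved_state = checkpoint
            redone += i - saved_i
            i, state = saved_i, copy.deepcopy(saved_state)
            continue
        state = step(state)
        i += 1
        if i % interval == 0:
            checkpoint = (i, copy.deepcopy(state))
    return state, redone


if __name__ == "__main__":
    increment = lambda s: s + 1
    print(run_with_checkpoints(increment, 0, 10, {7}, interval=1))
    print(run_with_checkpoints(increment, 0, 10, {7}, interval=5))
```

In this model, checkpointing after every step means a fault costs no re-execution at all, whereas a coarser interval loses the work done since the last checkpoint. That is the sense in which very fast state saving could hide transient errors from the layers above.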

Programming API – If NVM is to be used for application development, it must be available to programmers through an efficient API. Designing this API raises numerous challenges, because it must address data persistence (e.g. insensitivity to pointer addresses), resource management (e.g. NVM allocation and I/O organization), energy management (e.g. since the data is persistent, certain parts of the system could be shut down temporarily), data sharing between nodes (e.g. PGAS, peer-to-peer exchanges) and performance.
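To make the pointer-address issue concrete, here is a minimal Python sketch that uses an ordinary memory-mapped file as a stand-in for an NVM region. Links between records are stored as offsets relative to the start of the region rather than as absolute addresses, so the structure stays valid wherever the region happens to be mapped. The region layout and the names `create_region`, `open_region`, `push` and `walk` are assumptions made for illustration and do not correspond to any existing NVM API.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<QQ")   # (offset of list head, next free offset)
NODE = struct.Struct("<qQ")     # (payload value, offset of next node; 0 = end)
REGION_SIZE = 4096


def create_region(path):
    """Create a zero-filled file standing in for an NVM region and map it."""
    with open(path, "wb") as f:
        f.write(b"\0" * REGION_SIZE)
    region = open_region(path)
    # Empty list (head = 0); allocation starts right after the header.
    HEADER.pack_into(region, 0, 0, HEADER.size)
    return region


def open_region(path):
    """Map an existing region; the mapping address may differ on every call."""
    with open(path, "r+b") as f:
        return mmap.mmap(f.fileno(), REGION_SIZE)


def push(region, value):
    """Prepend a node; the link is a region-relative offset, not an address."""
    head, free = HEADER.unpack_from(region, 0)
    if free + NODE.size > len(region):
        raise MemoryError("region full")
    NODE.pack_into(region, free, value, head)
    HEADER.pack_into(region, 0, free, free + NODE.size)


def walk(region):
    """Follow the offset links from the head and collect the payloads."""
    head, _ = HEADER.unpack_from(region, 0)
    values = []
    while head != 0:
        value, head = NODE.unpack_from(region, head)
        values.append(value)
    return values


def demo():
    path = os.path.join(tempfile.mkdtemp(), "nvm_region.bin")
    region = create_region(path)
    for v in (1, 2, 3):
        push(region, v)
    region.flush()
    region.close()
    reopened = open_region(path)   # simulates a later run of the application
    result = walk(reopened)
    reopened.close()
    return result
```

The sketch deliberately ignores allocation policy and crash consistency (the node and the header are written in two separate steps), which are among the resource-management and persistence issues a real API would have to handle.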

In-situ analysis, pre/post processing – NVM technology has the potential to turn compute-centric machines into data-centric systems [4]. As a consequence, data analysis and processing tasks that were envisioned for a different kind of system (e.g. a big data machine) can be implemented efficiently close to the CPU. This opens up many opportunities for developing new kinds of scientific applications that are more data-oriented. It should be noted that, with a photonic-based network, NVM nodes could be aggregated to build a very large memory space dedicated to data mining and other analysis tasks. How data should be visualized in this context still needs clarification.

Code coupling – Code coupling may be made more efficient as well as simpler to implement. Current libraries for code coupling will need to be revised to take this new storage into account. Furthermore, the coupling “frequency” might also have to be reviewed at the numerical scheme level.

Compiler and runtime technology – Driven by energy management, heterogeneous hardware and varying system configurations, compiler research has been studying auto-tuning and runtime libraries capable of adapting automatically to changing runtime conditions (e.g. some cores being inactive, differing data sizes, etc.). These techniques normally rely on code versioning, specialization and run-time code generation. All of these require run-time performance analysis and code tuning during a discovery phase. Keeping local data on a node during application execution (thanks to persistence) would help to reduce the cost of this phase and, in many cases, significantly improve performance.
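The interplay between code versioning, the discovery phase and persistence can be sketched as follows. This Python example keeps two versions of a trivial kernel, times them once, and records the winner in a node-local file that stands in for NVM, so that a later call with the same key skips discovery. Everything here is a hypothetical illustration: the kernel variants, the function `select_variant`, the JSON cache format and the key naming are assumptions, and reusing results across runs is one possible reading of how persistence would help.

```python
import json
import os
import tempfile
import time


def sum_loop(data):
    total = 0
    for x in data:
        total += x
    return total


def sum_builtin(data):
    return sum(data)


# Code versioning: several implementations of the same kernel.
VARIANTS = {"loop": sum_loop, "builtin": sum_builtin}


def select_variant(cache_path, key, sample):
    """Return (variant_name, discovery_ran) for the given tuning key."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if cache.get(key) in VARIANTS:
        return cache[key], False          # tuning result already persisted

    # Discovery phase: time every variant on a sample input.
    timings = {}
    for name, fn in VARIANTS.items():
        start = time.perf_counter()
        fn(sample)
        timings[name] = time.perf_counter() - start
    best = min(timings, key=timings.get)

    cache[key] = best
    with open(cache_path, "w") as f:      # stand-in for a write to local NVM
        json.dump(cache, f)
    return best, True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "tuning_cache.json")
    data = list(range(10000))
    name, ran = select_variant(path, "sum:n=10000", data)
    print(name, ran)
    print(select_variant(path, "sum:n=10000", data))
    print(VARIANTS[name](data))
```

Keying the cache on the problem size reflects the "differing data sizes" condition mentioned above. A real runtime would likely also key on the hardware configuration, for example the number of active cores.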

Debugging, performance tools – Performance tools such as TAU, Vampir, Paraver and Scalasca are based on tracing events on each node. As parallel activity increases, storing the huge number of events generated at Exascale becomes a challenge. The ability to store and process events locally, and to keep a performance history on each node, is likely to help in redesigning these tools to handle massive parallelism (e.g. post-mortem analysis). Similar studies are also needed for designing next-generation debuggers.
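One way to picture local processing of trace data is a per-node reduction: each node condenses its raw events into a small summary, and only the summaries are combined globally. The Python sketch below shows this idea under simplifying assumptions. The event format (a name and a duration), the functions `summarize` and `merge`, and the sample events are hypothetical and do not reflect the trace formats of the tools named above.

```python
def summarize(events):
    """Reduce one node's raw events to {name: (count, total_time, max_time)}."""
    summary = {}
    for name, duration in events:
        count, total, longest = summary.get(name, (0, 0.0, 0.0))
        summary[name] = (count + 1, total + duration, max(longest, duration))
    return summary


def merge(summaries):
    """Combine per-node summaries without touching the raw events again."""
    merged = {}
    for node_summary in summaries:
        for name, (count, total, longest) in node_summary.items():
            c, t, m = merged.get(name, (0, 0.0, 0.0))
            merged[name] = (c + count, t + total, max(m, longest))
    return merged


if __name__ == "__main__":
    # Hypothetical events recorded on two nodes: (event name, duration in s).
    node0 = [("MPI_Send", 1.0), ("compute", 2.0), ("MPI_Send", 0.5)]
    node1 = [("compute", 4.0)]
    print(merge([summarize(node0), summarize(node1)]))
```

The raw events would stay in node-local storage for later post-mortem inspection, while the volume of data leaving each node grows with the number of distinct event types rather than with the number of events.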

Conclusion

For systems with fast, large, cheap and close-to-CPU NVM, the question does not seem to be “if” but rather “when”. Such systems will not only have a huge impact on the HPC landscape, but will also unify the compute-centric and data-centric tasks of modern HPC applications, which currently face a deluge of data.

François Bodin is co-leader of the Operational software maturity level methodology Working Group at the European Exascale Software Initiative.

[References]

[8] François Bodin, Henri Calandra, Alain Refloc’h, Processor evolution: what to prepare application codes for? – HPC Magazine, Apr. 28, 2014.

[9] Steven Swanson and Adrian M. Caulfield, Refactor, Reduce, Recycle: Restructuring the I/O Stack for the Future of Storage – IEEE Computer, 2013.

[Additional references]

Adrian M. Caulfield, Joel Coburn, Todor Mollov, Arup De, Ameen Akel, Jiahua He, Arun Jagatheesan, Rajesh K. Gupta, Allan Snavely, and Steven Swanson, Understanding the Impact of Emerging Non-Volatile Memories on High-Performance, IO-Intensive Computing – in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’10). IEEE Computer Society, Washington, DC, USA, 1-11. DOI=10.1109/SC.2010.56

Joel Coburn, Adrian M. Caulfield, Ameen Akel, Laura M. Grupp, Rajesh K. Gupta, Ranjit Jhala, and Steven Swanson, NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories – in Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS XVI). ACM, New York, NY, USA, 105-118. DOI=10.1145/1950365.1950380

Blackcomb: Hardware-Software Co-design for Non-Volatile Memory in Exascale Systems (ORNL, 2014).

The Non-Volatile System Laboratory at UCSD.


© HPC Today 2024 - All rights reserved.
