SSDs for Big Data: Fast Processing Requires High-Performance Storage
By Micron Technology  |  October 14, 2015

Choosing the Right SSD in Big Data Deployments
SSDs in general are rated for one to two million device hours of mean time to failure (MTTF), which translates to a century or two of continuous operation; NAND Flash cells wear out only when they are written. Enterprise-class SSDs are designed for high reliability, maximum durability, and fast, consistent performance. Under write workloads they last 10 to 1,000 times longer than personal storage SSDs, and while Flash memory performance tends to degrade with use, enterprise SSDs maintain consistent performance over time. Their write performance is 2 to 12 times better than that of personal storage SSDs, and their read performance is comparable or better. The trade-off is price: 2 to 30 times more per gigabyte.
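As a quick check on that arithmetic, a rated MTTF in device hours converts to calendar years of continuous operation as follows (a minimal illustrative sketch, not vendor-supplied code):

    # Convert a rated MTTF in device hours to calendar years of 24/7 operation.
    HOURS_PER_YEAR = 24 * 365  # 8,760

    for mttf_hours in (1_000_000, 2_000_000):
        years = mttf_hours / HOURS_PER_YEAR
        print(f"MTTF {mttf_hours:,} h -> ~{years:.0f} years of continuous operation")
    # 1,000,000 h -> ~114 years; 2,000,000 h -> ~228 years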

Big data applications in large corporate data centers, such as scientific computing and business analytics, are often characterized by mixed read/write workloads that require very low latency and massive IOPS, a good match for durable, robust enterprise-class SSDs.

Personal storage SSDs are designed for good read performance, with reliability and durability tailored to lighter, read-dominant workloads. They are optimized for workloads where reads are more frequent than writes, and they offer high capacity at a lower price per gigabyte than enterprise-class SSDs.

Web 2.0 public cloud applications like social networking sites are characterized by users uploading images, video, and audio files, which are subsequently downloaded or streamed by other users. This type of write-once, read-many-times workload is a good candidate for personal storage SSDs.

Application Considerations for Enterprise vs. Personal Storage SSDs
Not all big data deployments are the same, and not all SSDs are the same. The question is how to match the right SSD to the right big data deployment. The choice of an SSD solution is driven primarily by the performance and availability requirements of the application. The decision tree in Figure 1 and the following Q&A will help you choose the optimal SSD solution for your application.

Question #1: What is the IOPS performance requirement for your application, including the read/write mix?
The first step is to quantify the workload that the SSDs will support. An application workload can be measured with a variety of performance monitoring tools. Beyond the workload itself, also consider the system configuration and how the drives will affect the overall platform.
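As one illustration of such measurement, the sketch below samples the per-device I/O completion counters that Linux exposes in /proc/diskstats to derive IOPS and the read/write mix. It assumes a Linux host and uses a placeholder device name (sda); it is not a substitute for a full monitoring tool:

    import time

    DEVICE = "sda"      # hypothetical device name; substitute your SSD
    INTERVAL_S = 10     # sampling window in seconds

    def read_io_counters(device):
        """Return (reads_completed, writes_completed) for a block device,
        taken from fields 4 and 8 of the matching /proc/diskstats line."""
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                if fields[2] == device:
                    return int(fields[3]), int(fields[7])
        raise ValueError(f"device {device!r} not found in /proc/diskstats")

    r0, w0 = read_io_counters(DEVICE)
    time.sleep(INTERVAL_S)
    r1, w1 = read_io_counters(DEVICE)

    read_iops = (r1 - r0) / INTERVAL_S
    write_iops = (w1 - w0) / INTERVAL_S
    total = read_iops + write_iops
    read_pct = (read_iops / total * 100) if total else 0.0
    print(f"{read_iops:.0f} read IOPS, {write_iops:.0f} write IOPS "
          f"({read_pct:.0f}% reads)")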

Question #2: What are the endurance requirements for your application?
For mixed read/write workloads, it is important to look closely at SSD endurance ratings, usually expressed as total bytes written (TBW) or as full drive writes per day (DWPD) over a 5-year period. By comparing an application’s daily write total with the endurance rating of an SSD, you can estimate the drive’s lifetime in your environment (assuming a constant workload; it is wise to also allow for future growth). If the write workload is small enough that the estimated lifetime of a personal storage SSD will at least equal the 3 to 5 years typically expected of an IT system, and performance is sufficient, then personal storage SSDs can be a good choice. However, if personal storage SSDs are likely to wear out and need replacement during the IT system’s lifetime, then replacement costs should be considered.
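A back-of-the-envelope version of that lifetime estimate, assuming a constant daily write volume (all figures below are hypothetical placeholders, not ratings for any particular drive):

    # Estimate SSD lifetime from its endurance rating and the measured workload.
    # All inputs are hypothetical; substitute your drive's datasheet values.
    drive_capacity_tb = 1.0   # usable capacity in TB
    rated_tbw = 600.0         # endurance rating: total terabytes written
    daily_writes_tb = 0.4     # measured application writes per day, in TB

    lifetime_years = rated_tbw / (daily_writes_tb * 365)

    # Equivalent drive-writes-per-day over the usual 5-year rating window
    dwpd = rated_tbw / (drive_capacity_tb * 365 * 5)

    print(f"Estimated lifetime: {lifetime_years:.1f} years "
          f"(rating equivalent to {dwpd:.2f} DWPD over 5 years)")
    # 600 TBW / (0.4 TB/day * 365) = ~4.1 years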

Question #3: What is the SSD total cost of ownership (TCO) over 3 to 5 years?
The SSD TCO over the system lifetime includes the following components (a simple cost model combining them is sketched after the list):

  • Cost of Drives – Determine how many drives will need to be purchased during a 3- to 5-year period, including replacements due to wear-out, and multiply this figure by the acquisition cost per drive.
  • Cost of Application Downtime – If the application needs to be taken offline to replace an SSD, what is the cost for that lost productivity? Multiply this figure by the number of replacements.
  • Cost of Slower Application Performance – If the application does not have to go offline for drive replacements, but system performance will slow during the replacement and subsequent data replication or RAID rebuild, how will this affect user productivity? Multiply this cost by the number of replacements.
  • Cost of Labor for Drive Replacement – Drive monitoring and replacement will be an additional management task for the IT staff, so the cost of labor should be included.
  • Risk of Data Loss – For unprotected drives, there is a significant risk of data loss, and even for RAID-protected drives, there is a small risk during the RAID rebuild window. Though difficult to quantify, these risks should be factored into the cost.
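
Putting those components together, a minimal additive cost model might look like the sketch below; every input is a hypothetical placeholder to be replaced with your own figures:

    # Simple additive SSD TCO model over a 3- to 5-year system lifetime.
    # Every input below is a hypothetical placeholder, not real pricing data.
    system_lifetime_years = 5
    drives_purchased = 24             # initial drives plus replacements due to wear-out
    cost_per_drive = 400.0            # acquisition cost per drive (USD)
    replacements = 4                  # number of drive swaps during the lifetime
    app_goes_offline = False          # True if the application must go offline per swap
    downtime_cost_per_swap = 2_000.0  # lost productivity per swap if the app goes offline
    degraded_cost_per_swap = 500.0    # cost of slower performance during rebuild otherwise
    labor_cost_per_swap = 150.0       # IT staff time per replacement
    data_loss_risk_cost = 1_000.0     # expected cost of residual data-loss risk

    per_swap_impact = downtime_cost_per_swap if app_goes_offline else degraded_cost_per_swap
    tco = (drives_purchased * cost_per_drive
           + replacements * (per_swap_impact + labor_cost_per_swap)
           + data_loss_risk_cost)
    print(f"Estimated {system_lifetime_years}-year TCO: ${tco:,.0f}")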

Conclusion
SSDs are a popular solution for big data applications. Deciding between personal storage and enterprise-class SSDs will depend on performance and endurance requirements and TCO.

About the author
Micron Technology, Inc., is a global leader in advanced semiconductor systems. Micron’s broad portfolio of high-performance memory technologies—including DRAM, NAND and NOR Flash—is the basis for solid state drives, modules, multichip packages and other system solutions.
