Concurrent-AMAT: a mathematical model for Big Data access
May 12, 2014

We have been interested in memory performance since 1990 and in memory data access patterns since 2004. As the saying goes, it takes ten years to sharpen a sword: we have finally achieved our goal of producing the APC (Access Per Cycle) [6] performance metric and the C-AMAT (Concurrent-AMAT) [1] performance model. APC is a metric for modern memory systems that extends memory measurement to memory concurrency. C-AMAT is a performance analysis tool for modern memory systems, with an elegant mathematical expression derived via rigorous mathematical proof. C-AMAT is a superset of AMAT, in the sense that it extends AMAT to take both memory locality and memory concurrency into consideration.
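To make the APC idea concrete, here is a minimal sketch in Python. It assumes (an illustrative reading, not a definition stated in this article) that APC is the number of memory accesses divided by the number of cycles in which at least one access is in flight, with overlapping cycles counted only once. The trace format and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical trace entry: (start_cycle, end_cycle), end exclusive.
Access = Tuple[int, int]


def memory_active_cycles(accesses: List[Access]) -> int:
    """Count cycles in which at least one access is in flight.

    Overlapping accesses share cycles, so each cycle is counted once.
    """
    active = set()
    for start, end in accesses:
        active.update(range(start, end))
    return len(active)


def apc(accesses: List[Access]) -> float:
    """Accesses per memory-active cycle (assumed definition of APC)."""
    cycles = memory_active_cycles(accesses)
    return len(accesses) / cycles if cycles else 0.0


# Four 3-cycle accesses, issued back to back vs. fully overlapped.
serial = [(0, 3), (3, 6), (6, 9), (9, 12)]
overlapped = [(0, 3), (0, 3), (0, 3), (0, 3)]
print(apc(serial))      # 4 accesses / 12 active cycles
print(apc(overlapped))  # 4 accesses / 3 active cycles
```

With the same four accesses, overlapping them raises APC from 1/3 to 4/3, a concurrency effect that an average of single-access latencies cannot show.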

C-AMAT consists of five parameters, describing the hit time, hit concurrency, pure miss concurrency, pure miss rate, and average pure miss penalty. With a minimum number of parameters, C-AMAT captures memory access locality, memory access concurrency, and their interaction. It characterizes the performance information that AMAT and MLP (memory-level parallelism) aim to express, and unifies all of it under one mathematical formulation. Our results show that both memory stall time and CPU performance can be expressed explicitly in terms of C-AMAT [7]. Therefore, improving the performance of a computing system depends on reducing C-AMAT; this is especially true for data-intensive applications. Four of C-AMAT's five parameters are new, and through them C-AMAT provides four new directions for improving memory performance. Because all five parameters are unified under one formula, C-AMAT also provides a tool for finding the combination of parameter values that gives the best performance. Our recent results show that C-AMAT can be improved by more than six times with optimized concurrency parameters; with an optimized combination of all five parameters, memory performance can be improved by more than 200 times under current computing environments, compared with the AMAT measurement. If system hardware and software can be adjusted and enhanced accordingly, the room for improvement could be even larger.
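The directions for improvement can be explored numerically. The sketch below uses the C-AMAT formula given as Eq. 2 below, \(H/C_H + pMR \times pAMP/C_M\), with hypothetical baseline values chosen only for illustration (they are not measurements from this work). It improves one parameter at a time by a factor of two, then all five together.

```python
def c_amat(h: float, c_h: float, pmr: float, pamp: float, c_m: float) -> float:
    """C-AMAT (Eq. 2): H / C_H + pMR * pAMP / C_M."""
    return h / c_h + pmr * pamp / c_m


# Hypothetical baseline: no concurrency, 5% pure miss rate, 100-cycle pAMP.
baseline = {"h": 2.0, "c_h": 1.0, "pmr": 0.05, "pamp": 100.0, "c_m": 1.0}

# A 2x "improvement" halves H, pMR and pAMP and doubles C_H and C_M.
improve = {"h": 0.5, "c_h": 2.0, "pmr": 0.5, "pamp": 0.5, "c_m": 2.0}

print(f"baseline: {c_amat(**baseline):.3f}")
for name, factor in improve.items():
    # Change a single parameter, keep the other four at baseline.
    params = dict(baseline, **{name: baseline[name] * factor})
    print(f"{name:>8}: {c_amat(**params):.3f}")

# Apply all five improvements at once.
combined = {k: v * improve[k] for k, v in baseline.items()}
print(f"     all: {c_amat(**combined):.3f}")
```

In this toy setting the miss term dominates, so each miss-side parameter helps more than either hit-side one, and improving all five together lowers C-AMAT from 7.0 to 1.125 cycles. A different baseline would favor a different combination, which is why having all five parameters in one formula is useful.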

Eq. 1 is the conventional AMAT formula, where H is the hit time of memory accesses, MR is the miss rate of cache accesses, and AMP is the average miss penalty. AMP is calculated as the sum of the latencies of all individual miss accesses divided by the total number of miss accesses.

Eq. 1 – \(AMAT = H + MR \times AMP\)
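Eq. 1 and the definition of AMP can be computed directly from a list of miss latencies. This is a minimal sketch; the function name and the sample latencies are hypothetical.

```python
from typing import Sequence


def amat(hit_time: float, total_accesses: int, miss_latencies: Sequence[float]) -> float:
    """Eq. 1: AMAT = H + MR * AMP."""
    misses = len(miss_latencies)
    mr = misses / total_accesses  # miss rate
    # AMP: sum of the individual miss latencies over the number of misses.
    amp = sum(miss_latencies) / misses if misses else 0.0
    return hit_time + mr * amp


# Hypothetical trace: 100 accesses, 4 misses; MR = 0.04, AMP = 100 cycles.
print(amat(hit_time=2, total_accesses=100, miss_latencies=[80, 100, 120, 100]))
```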

Eq. 2 – \(C{-}AMAT = \frac{H}{C_{H}} + pMR \times \frac{pAMP}{C_{M}}\)
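Eq. 2 can be evaluated directly once its five parameters, defined in the next paragraph, are known. The sketch below also checks that C-AMAT reduces to AMAT when there is no concurrency: with \(C_H = C_M = 1\) and accesses served one at a time, no miss cycle overlaps a hit, so every miss is a pure miss, pMR = MR, and pAMP = AMP. The parameter values are hypothetical.

```python
import math


def amat(h: float, mr: float, amp: float) -> float:
    """Eq. 1: conventional AMAT = H + MR * AMP."""
    return h + mr * amp


def c_amat(h: float, c_h: float, pmr: float, pamp: float, c_m: float) -> float:
    """Eq. 2: C-AMAT = H / C_H + pMR * pAMP / C_M."""
    if c_h <= 0 or c_m <= 0:
        raise ValueError("concurrency parameters must be positive")
    return h / c_h + pmr * pamp / c_m


h, mr, amp = 2.0, 0.04, 100.0

# No concurrency: C_H = C_M = 1 and every miss is pure (pMR = MR, pAMP = AMP).
assert math.isclose(c_amat(h, 1.0, mr, amp, 1.0), amat(h, mr, amp))

# Hypothetical concurrent case: 2-way hit concurrency, half the misses pure,
# 60 pure miss cycles per pure miss, pure misses overlapped 1.5-way on average.
print(c_amat(h, c_h=2.0, pmr=0.02, pamp=60.0, c_m=1.5))
```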

Eq. 2 is the C-AMAT formula. The first parameter, H, has the same meaning as in AMAT. The second parameter, \(C_H\), represents the hit concurrency, and \(C_M\) represents the pure miss concurrency. \(C_H\) can be contributed by multi-port, multi-banked, or pipelined cache structures; \(C_M\) can be contributed by non-blocking cache structures or prefetching logic. In addition, processor ILP design techniques, such as out-of-order execution, multiple-issue pipelines, SMT, and CMP, can increase both hit concurrency and miss concurrency. The pure miss rate (pMR) is the number of pure misses over the total number of accesses. Here, a “pure miss” is a miss that contains at least one miss cycle without any hit access activity.

When measuring the private caches of CMP processors, e.g., the L1 data cache, pure misses are measured in “per-core” mode: every core has its own detection logic, and that logic only monitors the connected core's private cache accesses. When a miss is outstanding in a cycle with no hit access in the private cache, that cycle is counted as a “pure miss cycle” for that core. For shared last-level caches, pure miss cycles are measured in “all-core” mode: a miss cycle is counted as a “pure miss cycle” only when there is no cache hit access from any of the cores. The pure average miss penalty (pAMP) is the average number of pure miss cycles per pure miss access, and \(C_M\) is the average pure miss concurrency.
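The pure-miss definitions above can be sketched as a trace analysis. The sketch assumes a hypothetical trace in which each access records its core, its start and end cycles, and whether it hit, and it treats each whole access interval as either hit activity or miss activity (real detection logic would track the hit phase of a miss separately). It also assumes \(C_M\) is the average number of pure misses outstanding per pure miss cycle; under these assumptions pMR × pAMP / \(C_M\) equals the number of distinct pure miss cycles divided by the total number of accesses. The two modes differ only in which hits may mask a miss cycle; for brevity the same toy trace is used for both, although per-core mode applies to private caches and all-core mode to shared last-level caches.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Access:
    core: int
    start: int  # first cycle of the access
    end: int    # one past the last cycle
    hit: bool


def hit_cycles(accesses: Iterable[Access]) -> Set[int]:
    """Cycles in which at least one of the given accesses is a hit in progress."""
    return {c for a in accesses if a.hit for c in range(a.start, a.end)}


def pure_miss_params(accesses: List[Access], busy: Set[int]) -> Dict[str, float]:
    """pMR, pAMP and C_M, where `busy` holds the cycles with hit activity."""
    misses = [a for a in accesses if not a.hit]
    # Pure miss cycles of each miss: its cycles not masked by any hit.
    per_miss = [[c for c in range(a.start, a.end) if c not in busy] for a in misses]
    pure = [cycles for cycles in per_miss if cycles]  # misses with >= 1 pure cycle
    if not pure:
        return {"pMR": 0.0, "pAMP": 0.0, "C_M": 0.0}
    total = sum(len(cycles) for cycles in pure)              # counted per miss
    distinct = len({c for cycles in pure for c in cycles})   # counted once
    return {
        "pMR": len(pure) / len(accesses),
        "pAMP": total / len(pure),
        "C_M": total / distinct,
    }


trace = [
    Access(core=0, start=0, end=10, hit=False),  # core 0: 10-cycle miss
    Access(core=0, start=0, end=4, hit=True),    # core 0: hit overlapping it
    Access(core=1, start=4, end=8, hit=True),    # core 1: hit in cycles 4-7
]

core0 = [a for a in trace if a.core == 0]
per_core = pure_miss_params(core0, hit_cycles(core0))  # only core 0's hits mask
all_core = pure_miss_params(trace, hit_cycles(trace))  # any core's hit masks
print(per_core)  # pure miss cycles 4-9
print(all_core)  # pure miss cycles 8-9
```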

Pure miss is a very important concept introduced by C-AMAT. It is based on the fact that not all cache misses cause processor stalls; only pure misses do. Pure misses arise from the interaction of concurrency and locality. The concept of pure miss challenges the conventional hardware and software design principle that “locality is always good.” Pure miss and C-AMAT offer a different perspective on designing computer architectures and algorithms, and they present a new paradigm for the design and development of the next generation of computers.
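A small hypothetical example shows why the miss rate alone cannot tell which misses stall the processor. Both schedules below contain the same accesses, one 8-cycle miss and four 2-cycle hits, so they have the same miss rate; only the serialized schedule exposes pure miss cycles. The trace format is an illustrative assumption, and pure miss cycles are counted with the single-core definition given above.

```python
from typing import List, Tuple

# Hypothetical single-core trace entry: (start_cycle, end_cycle, is_hit), end exclusive.
Trace = List[Tuple[int, int, bool]]


def pure_miss_cycles(trace: Trace) -> int:
    """Cycles with at least one outstanding miss and no hit activity."""
    hit = {c for s, e, h in trace if h for c in range(s, e)}
    miss = {c for s, e, h in trace if not h for c in range(s, e)}
    return len(miss - hit)


def miss_rate(trace: Trace) -> float:
    return sum(1 for _, _, h in trace if not h) / len(trace)


# Same accesses (one 8-cycle miss, four 2-cycle hits), scheduled two ways.
serialized = [(0, 8, False), (8, 10, True), (10, 12, True), (12, 14, True), (14, 16, True)]
overlapped = [(0, 8, False), (0, 2, True), (2, 4, True), (4, 6, True), (6, 8, True)]

for name, t in [("serialized", serialized), ("overlapped", overlapped)]:
    print(name, miss_rate(t), pure_miss_cycles(t))
```

In the overlapped schedule the miss is hidden behind hit activity and is therefore not a pure miss at all, while in the serialized schedule all 8 of its cycles are pure miss cycles.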

[References]

[6] D. Wang and X. H. Sun, “APC: A Novel Memory Metric and Measurement Methodology for Modern Memory Systems,” IEEE Transactions on Computers, in press.

[7] Y. H. Liu and X. H. Sun, “Reevaluating Memory Stall Time via Concurrent AMAT,” Illinois Institute of Technology Technical Report IIT/CS-SCS-2013-12, 2013.

© HPC Today 2024 - All rights reserved.