Now on DEIXIS online: Multitalented Metric

Wednesday, May 18, 2016

Winning a marathon requires athletic ability. The same applies to a triathlon, but that event also rewards versatility.

Since the early 1990s, the supercomputing industry’s standard performance metric has been akin to tracking a marathoner’s performance. Used in the TOP500 computing systems rankings, that standard – the High Performance Linpack (HPL) – tallies floating-point operations per second (flops), or how fast a computer can use a particular method to solve millions of equations and report the results.

In recent years, many high-performance computing (HPC) researchers have become concerned that HPL’s design overwhelmingly favors computers that can do lots of floating-point operations. In fact, some computer builders may overprovision their machines with processors or create an architecture to get a good HPL performance rating without necessarily having the memory system performance to support other kinds of computation.

A different, broad set of science and engineering applications, including modeling and simulating such things as automobile crashes, aerodynamics and oil recovery, depends on how well a computer system orchestrates data movement from memory to the processor and back. These efforts are all based on solving what are often massive numbers of differential equations.

In less than three years, a new computing metric has emerged to broaden the performance rubric and capture these nuances.

The HPL metric doesn’t heavily stress data movement and memory system performance, says Michael Heroux of Sandia National Laboratories’ Center for Computing Research. Heroux, working with Jack Dongarra (an original Linpack author) and Piotr Luszczek from the University of Tennessee, has developed a supercomputing benchmark based on a teaching code he developed about 10 years ago for his students at St. John’s University in Minnesota. The High Performance Conjugate Gradients benchmark (HPCG) executes an algorithm distinct from the one HPL uses. Heroux wrote the code to show students how parallel programming divides problems and distributes the parts to individual processors, reducing the time to solution. It evolved to become a proxy test code for much larger programs and eventually became a novel benchmark that has since run on 63 HPC systems worldwide.
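For readers unfamiliar with the method behind the benchmark's name, here is a minimal sketch of the conjugate gradient iteration in plain Python. It is not the HPCG code: it is unpreconditioned, runs serially and solves a toy one-dimensional problem, and the function names (`matvec`, `dot`, `conjugate_gradient`) are chosen here for illustration only. What it does show is that each iteration is dominated by a sparse matrix-vector product and a few vector updates, operations that do little arithmetic per value fetched from memory.

```python
def matvec(n, x):
    """Apply a 1-D Laplacian stencil (2 on the diagonal, -1 beside it) to x."""
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(n, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for the stencil matrix above; return (x, iterations)."""
    x = [0.0] * n
    r = list(b)          # residual b - A x, with x starting at zero
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for iteration in range(max_iter):
        if rs_old ** 0.5 < tol:
            return x, iteration
        ap = matvec(n, p)                      # sparse matrix-vector product
        alpha = rs_old / dot(p, ap)            # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]  # next search direction
        rs_old = rs_new
    return x, max_iter


if __name__ == "__main__":
    n = 50
    b = matvec(n, [1.0] * n)   # right-hand side whose exact solution is all ones
    solution, iterations = conjugate_gradient(n, b)
    print(iterations, max(abs(v - 1.0) for v in solution))
```

In a parallel version, each processor would own a slice of the vectors, exchange boundary values with its neighbors for the matrix-vector product and combine partial sums for the dot products.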

“If you have two computers that both can compute at a 10 petaflops per second rate,” or 10 quadrillion floating-point operations per second, Heroux says, “and one had a memory system that was twice as fast as the other one, HPL wouldn’t necessarily show that better performance because it doesn’t really concern itself with memory system performance.

“But HPCG would show that the machine with twice the memory system would also be twice as fast. HPCG is also sensitive to emerging types of concurrency on modern parallel processors. We’re running different kinds of computations, ones that are more sensitive to how the memory system performs. HPCG gives a nod to the triathlete over the marathon runner.”
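Heroux's comparison can be sketched with a simple roofline-style estimate, a common back-of-the-envelope model that is not part of either benchmark. The model caps a kernel's rate at the lower of the processor's peak and the memory bandwidth multiplied by the kernel's arithmetic intensity (operations performed per byte moved). Every number below is hypothetical and chosen only to make the arithmetic easy to follow; none is a measurement of a real machine or of HPL or HPCG.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_per_s, flops_per_byte):
    """Roofline estimate: the lower of the compute limit and the memory limit."""
    return min(peak_gflops, bandwidth_gb_per_s * flops_per_byte)


# Two hypothetical machines with the same peak but different memory systems.
PEAK_GFLOPS = 1000.0
SLOW_BANDWIDTH = 100.0   # GB/s, assumed
FAST_BANDWIDTH = 200.0   # GB/s, assumed (twice as fast)

# Assumed arithmetic intensities: a dense kernel reuses each value many
# times, while a sparse kernel does little work per byte fetched.
DENSE_INTENSITY = 50.0   # flops per byte, assumed
SPARSE_INTENSITY = 0.25  # flops per byte, assumed

dense_slow = attainable_gflops(PEAK_GFLOPS, SLOW_BANDWIDTH, DENSE_INTENSITY)
dense_fast = attainable_gflops(PEAK_GFLOPS, FAST_BANDWIDTH, DENSE_INTENSITY)
sparse_slow = attainable_gflops(PEAK_GFLOPS, SLOW_BANDWIDTH, SPARSE_INTENSITY)
sparse_fast = attainable_gflops(PEAK_GFLOPS, FAST_BANDWIDTH, SPARSE_INTENSITY)

if __name__ == "__main__":
    print("dense kernel: ", dense_slow, dense_fast)
    print("sparse kernel:", sparse_slow, sparse_fast)
```

Under these assumptions the dense kernel hits the compute limit on both machines, so the faster memory makes no difference, while the sparse kernel is limited by memory on both and runs twice as fast on the machine with twice the bandwidth.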

Read more at DEIXIS: Computational Science at the National Laboratories, the online companion to the DOE CSGF’s annual print journal.

Image caption: Sandia National Laboratories’ (SNL) Sky Bridge, a Cray CS300-LC supercomputer that was ranked No. 161 on the latest TOP500 list. (Photo: SNL.)