Massachusetts Institute of Technology
Growing up with computer-scientist parents near Los Alamos National Laboratory (LANL), it might have seemed like Peter Ahrens was destined to enter a computing-related field.
But family wasn’t necessarily fate, the Department of Energy Computational Science Graduate Fellowship (DOE CSGF) recipient says. He considered mechanical engineering but made his own decision to pursue computing after participating in the LANL-sponsored Supercomputing Challenge for middle- and high-school students. His team simulated flocking birds and foraging ants to solve scheduling problems. “That set me off down the computer science road,” says Ahrens, now a Massachusetts Institute of Technology doctoral candidate. The complexity of the field’s problems “really just grabbed me. I wanted answers.”
With advisor Saman Amarasinghe, Ahrens seeks ways to efficiently map scientific problems to today’s high-performance computers, which merge multiple processor types across thousands of nodes connected by intricate networks. He planned to build tools that let scientists incorporate domain knowledge into compilers – programs that convert code into machine-readable instructions – tailored to their fields. A process called autotuning would test approaches to find the best ones.
At MIT, however, Ahrens decided he first needed to resolve some underlying questions. To write a compiler, “you need to know what you want the compiler to output.” He learned that computer scientists often didn’t understand the best output very well. “I’m not sure we know exactly what should be on the other side of this automated system.”
Ahrens focuses on matrices – data captured in two-dimensional grids. They’re common in computing and much time is spent performing math on the information they contain. To quickly solve problems, computers use parallel processing, parceling out operations to multiple processors for simultaneous solution.
“Scientists and mathematicians often think of a matrix as a set of relationships between inputs and outputs,” such as those generated in simulations. In sparse matrices, most grid entries are zeroes – meaning an input and a corresponding output are unrelated. That’s common because simulation elements usually only interact with their neighbors.
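The idea can be made concrete with a small sketch (a toy example, not Ahrens’s code): a simulation in which each element interacts only with its neighbors yields a matrix whose entries are mostly zero, so storing just the nonzero (row, column, value) triples skips the zeros entirely.

```python
# Toy "interaction" matrix from a hypothetical 1-D simulation: each element
# relates only to itself and its immediate neighbors, so most entries are zero.
dense = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
]

# Sparse (coordinate) storage: keep only the nonzero triples, so no time or
# space is spent on the unrelated (zero) input-output pairs.
coo = [(i, j, v)
       for i, row in enumerate(dense)
       for j, v in enumerate(row)
       if v != 0.0]

print(len(coo))  # 10 nonzeros out of 16 entries
```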
Computer scientists want to avoid spending processor time and communication on the zero entries. “The challenge becomes mapping the elements we do need to process to computational resources,” Ahrens adds. It’s particularly important to group related nonzero components on the same processor to limit time-consuming inter-processor communication.
Ahrens describes one approach in a recent paper published online. To balance computing workload among processors and reduce communication, his algorithm splits matrices into contiguous partitions without reordering rows and columns within them, as some techniques do. “If we aren’t allowed to reorder the rows before we split them up, this more constrained problem becomes solvable.”
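A minimal sketch of the contiguous-partitioning idea (a simplified greedy illustration, not the algorithm from the paper): split consecutive rows into parts holding roughly equal numbers of nonzeros, without ever reordering the rows.

```python
# Simplified sketch: split rows into `parts` contiguous groups so each group
# carries roughly an equal share of the matrix's nonzeros. Rows keep their
# original order -- the constraint that makes the problem tractable.
def contiguous_split(nnz_per_row, parts):
    total = sum(nnz_per_row)
    target = total / parts
    splits = [0]          # boundaries: part p covers rows splits[p]..splits[p+1]-1
    running = 0
    for i, nnz in enumerate(nnz_per_row):
        running += nnz
        # Close a part once the running total reaches its cumulative share.
        if running >= target * len(splits) and len(splits) < parts:
            splits.append(i + 1)
    splits.append(len(nnz_per_row))
    return splits

# Four rows with uneven nonzero counts, split into two contiguous parts:
print(contiguous_split([5, 1, 1, 5], 2))  # [0, 2, 4] -- each part holds 6 nonzeros
```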
Ahrens has run his algorithms on the Cori supercomputer at the National Energy Research Scientific Computing Center. Using the same system scientists use lets him solve problems that are applicable to what they research, he says.
During his 2019 practicum at Sandia National Laboratories in New Mexico, Ahrens approached a related problem: partitioning sparse matrices for instruction-level parallel computing – reorganizing them into small data blocks that special vectorized processors can handle efficiently. If a vector processor calculates four numbers at once, for example, “you get four times the speed. The disadvantage is if your structures are irregular it’s difficult to cram all those disparate entries into the neatly packed blocks.”
With mathematician and computer scientist Erik Boman, Ahrens tested block partitioning approaches. They found that simultaneously optimizing contiguous partitions for rows and columns was too demanding, but fixing either the rows or the columns let them produce the best partition for the unfixed dimension. “So if you fix the rows, you can find the best columns” partition and vice versa, “but you can’t find the best of both at once.” The code alternates between the two, fixing one dimension and optimizing the other. Ahrens and Boman outlined the method in a preprint paper this May.
The practicum introduced Ahrens to a new method for sparse matrix operations, but it affected his goals more deeply. “The topics I was introduced to are more academic. I realized, working on them, that that’s what I want in my career.” He’ll pursue a university post after graduation, probably in 2023.
Meanwhile, Ahrens will continue pursuing a major hobby: glassblowing. It’s a popular activity at MIT, where demand outstrips slots in beginner classes. Ahrens has become so accomplished that he sometimes teaches new gaffers. “It provides a really nice contrast to the more ‘in the clouds’ kind of work I do all day,” he says. “It’s nice to do something creative with my hands in the real world.”
Image caption: At left: the nonzero pattern of the TKK/smt sparse matrix, arising from a thermal stress analysis of a surface-mounted transistor. At right, the pattern of the same matrix after spectral reordering. The pattern resolution is 256 by 256. Credit: Peter Ahrens.