Minimizing Communication in Numerical Linear Algebra
University of California, Berkeley
The parallelization of numerical algorithms is constrained by overheads of communication and load imbalance, which are manifestations of the computation’s dependency structure and the parallel schedule itself. Therefore, accurate design of parallel algorithms requires modeling the communication and synchronization costs along any execution path in the schedule rather than solely considering the computational complexity. I will give a high-level introduction to a few parallel algorithmic techniques for avoiding communication and synchronization in dense numerical linear algebra and sparse iterative methods. Additionally, I will present lower bounds on communication costs, which demonstrate the optimality of the aforementioned parallelization strategies for computations with certain regular dependency structures. The theoretical analysis of these algorithms will be justified with performance results for dense matrix computations on massively parallel architectures. Lastly, I will overview Cyclops Tensor Framework, a library abstraction capable of mapping, decomposition, and redistribution of multidimensional data sets that facilitates communication-efficient algorithms in the context of scientific applications, in particular, high-accuracy electronic structure methods.