Performance Comparison of HPX vs. MPI+OpenMP for the Discontinuous Galerkin Finite Element Method on Knights Landing Chips

Maximilian Bremer, University of Texas

Increasing vector widths and many-core architectures make it significantly harder to use compute resources efficiently on next-generation supercomputers. High Performance ParalleX (HPX) is an asynchronous runtime system designed specifically to address the bottlenecks associated with the massive concurrency of these upcoming systems. We compare a traditional MPI+OpenMP implementation with an HPX implementation of a discontinuous Galerkin kernel that solves the acoustic wave equation. To achieve good vectorization of the discontinuous Galerkin kernels, we use Vc, a portable library of SIMD vector classes for C++. We will present scaling results on the Intel Knights Landing chips on Stampede2, along with performance results that highlight the benefits of asynchronous task execution over a static execution model.

Abstract Author(s): Max Bremer, Craig Michoski, Zach Byerly, Hartmut Kaiser, Clint Dawson