Contiguous Partitioning: Registers, Caches, and Distributed Memories

Peter Ahrens, Massachusetts Institute of Technology

Photo of Peter Ahrens

Consider the contiguous graph partitioning problem, in which one is asked to split the rows of a sparse matrix into contiguous parts without reordering. Contiguous partitioning is beneficial when adjacent rows have correlated sparse patterns. Such patterns may arise naturally or be induced by embedding techniques such as spectral reordering. Existing approaches use heuristics to determine which rows and columns should be grouped or require quadratic time. We propose efficient, optimal algorithms under a wide variety of cost functions. These cost functions may attempt to minimize the number of register blocks or cache misses in shared-memory settings or the amount of communication in distributed-memory settings. We provide a survey of how different algorithms apply to different classes of cost functions and show that most of our algorithms run in linear or near-linear time. While simultaneously partitioning both the rows and columns is often NP-Hard, our algorithms focus on partitioning just the rows, given a fixed column partition. In the register-blocking setting, we give a linear-time algorithm to optimize cost functions like the number of blocks, the memory usage, or an empirical performance model. In the cache-blocking setting, we propose a near-linear time algorithm. In the distributed memory setting, our algorithm runs in linear time when minimizing the maximum part cost and in near-linear time when minimizing the total part cost.

Abstract Author(s): Peter Ahrens