Algorithmic modifications of locally preconditioned parallel iterative solvers to balance CPU and memory load

Richard Mills, College of William & Mary


Clusters of workstations (COWs) are an increasingly popular and cost-effective means of solving scientific problems. For a variety of reasons, these systems are often heterogeneous in terms of processor and memory resources. For example, upgrades to a cluster may occur in stages. Heterogeneity may also arise dynamically: if a cluster is time-shared, nodes that are more heavily loaded than others will have less CPU time and memory available for an application. To use heterogeneous COWs effectively, dynamic load balancing is crucial.

We investigate novel approaches for dynamically balancing the resource utilization of parallel implementations of iterative methods for solving eigenproblems and systems of equations. The methods employ a local preconditioning operation on each compute node, and are crucial to many scientific and engineering applications. Because all nodes need not perform their preconditioning work to the same accuracy, the preconditioning phase can be perfectly load balanced by having all nodes exit the phase after a fixed amount of time. If care is taken, this can be done without compromising the overall convergence behavior. Because the majority of time is often spent in the preconditioning phase, this approach can yield excellent overall CPU load balance. Additionally, we balance memory load by reducing the preconditioning work on a node that is thrashing due to contention for memory with other processes: the idea is to speed the completion of the competing jobs and thus their relinquishment of resources. We present experimental results from several application codes that confirm the effectiveness of our CPU and memory balancing approaches.
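To make the fixed-time idea concrete, the following is a minimal sketch (not the author's actual code) of a time-bounded local preconditioning phase in MPI. The function names, the damped-Jacobi-style sweep, and the 10 ms budget are illustrative assumptions: each rank repeats inner sweeps until a wall-clock budget expires, so fast and slow nodes leave the phase together, with faster nodes simply completing more sweeps.

/* Minimal sketch of fixed-time local preconditioning (illustrative only).
 * Compile with:  mpicc -O2 time_balanced_precond.c -o demo  */
#include <mpi.h>
#include <stdio.h>

#define N 1000  /* size of this rank's local subdomain (illustrative) */

/* Hypothetical local preconditioning sweep: a trivial damped update
 * standing in for whatever local solve each node actually performs. */
static void local_precond_sweep(const double *r, double *z, int n)
{
    for (int i = 0; i < n; i++)
        z[i] += 0.5 * (r[i] - z[i]);
}

/* Apply inner sweeps until a fixed wall-clock budget expires, so every
 * rank exits the preconditioning phase at roughly the same time. The
 * number of sweeps achieved reflects the CPU actually available; a
 * memory-balancing variant could shrink the budget on a rank that
 * detects paging/thrashing, ceding resources to competing jobs. */
static int apply_time_balanced_precond(const double *r, double *z, int n,
                                       double budget_seconds)
{
    double t0 = MPI_Wtime();
    int sweeps = 0;
    while (MPI_Wtime() - t0 < budget_seconds) {
        local_precond_sweep(r, z, n);
        sweeps++;
    }
    return sweeps;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double r[N], z[N];
    for (int i = 0; i < N; i++) { r[i] = 1.0; z[i] = 0.0; }

    /* Every rank spends the same 10 ms in the preconditioner. */
    int sweeps = apply_time_balanced_precond(r, z, N, 0.01);
    printf("rank %d completed %d sweeps\n", rank, sweeps);

    MPI_Finalize();
    return 0;
}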

Abstract Author(s): Richard Tran Mills