Power Prediction in Supercomputing

Hilary Egan, University of Colorado


As supercomputers reach the exascale, power consumption is quickly becoming a limiting factor; requiring leadership-class HPC systems to have dedicated power plants is clearly not a sustainable path. Smaller supercomputing facilities and data centers are also affected by power constraints, namely through surcharges for exceedingly high peak power draws. To better understand power use in HPC workloads, the National Renewable Energy Laboratory (NREL) supercomputing facility has tracked the power used by each compute node at 10-second intervals over the past year. The facility also recorded all jobs submitted, with their metadata, over the same period. Using these data, we performed cluster analysis and dimension reduction with the goal of understanding typical power characteristics and patterns across a wide variety of applications. This analysis informs our efforts to predict, a priori, the power a given job will use. These predictions are integral to designing new power-aware schedulers and workload managers. Both a random forest and a robust regression method were evaluated, and both showed average RMSE values of roughly 40 watts (about 10-15 percent of typical job power). To evaluate the efficacy of these predictions in practice, we used simulations of the supercomputing job schedule to determine whether peak power use could be reduced. We found that, with a minimal delay in mean job wait times, the overall variability in total system power use can be mitigated.
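
To make the error metric concrete, the following is a minimal sketch of how a root-mean-square error (RMSE) and its size relative to typical job power could be computed. The function name `rmse` and the wattage values are hypothetical and chosen only for illustration; they are not NREL measurements and do not reproduce the models evaluated in this work.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences (watts)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical per-job mean power in watts (illustrative, not measured data).
actual_watts = [300.0, 350.0, 280.0, 400.0]
predicted_watts = [340.0, 310.0, 320.0, 360.0]

error = rmse(predicted_watts, actual_watts)

# Express the error as a fraction of the mean observed job power.
mean_actual = sum(actual_watts) / len(actual_watts)
relative = error / mean_actual

print(f"RMSE: {error:.1f} W ({relative:.1%} of mean job power)")
```

In this made-up example every prediction is off by 40 W, so the RMSE is 40 W, which is about 12 percent of the mean job power of 332.5 W.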

Abstract Author(s): H. Egan, C. Phillips