Dynamic Resource Scheduling of Jupyter Notebooks at Cell-Granularity

Louis Jenkins, University of Rochester

Jupyter Notebooks are widely used in data and computational science. A notebook consists of code blocks called cells that drive computation, and the amount of resources it needs is determined by these cells, ranging from multiple GPUs and CPU cores to nothing when idle. Unlike a typical HPC job, a notebook's runtime depends on the resource requirements of its cells. There can also be arbitrary periods of time between cells where no computation is required, known as "think time," which the user spends on tasks such as writing new code or analyzing prior results. Since certain cells require fewer resources than others, and think time requires none, coarse-grained allocation, such as batch processing, is not feasible. We therefore propose a scheduler that operates at the cell level rather than the notebook level. The scheduler can identify periods of think time and redistribute resources to other waiting notebooks. This allows multiple users to operate on an overlapping set of resources, maintaining interactivity while increasing system throughput. An early prototype of this cell-level scheduler, dubbed the 'dynamic' scheduler, used offline traces to anticipate resource needs and manage resource allocation. We compared its performance against two baselines: a scheme that relies on the OS scheduler, and a scheme that partitions resources across notebooks. The evaluation metrics were Average Normalized Turnaround Time (ANTT) and System Throughput (STP). When tested on various machine learning notebooks executed without any think time between cells, the dynamic scheduler demonstrated notable benefits. To evaluate the dynamic scheduler with think times, we acquired over a year's worth of logs from an interactive application called 'Arkouda' to analyze and simulate think times in applications, paving the way for more comprehensive testing.
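The abstract names ANTT and STP without defining them. The sketch below uses the definitions commonly applied to multiprogram workloads, which this work is assumed (not confirmed) to follow: ANTT is the mean per-notebook slowdown relative to running alone (lower is better), and STP is the sum of per-notebook normalized progress (higher is better). The function names and the runtimes are hypothetical and chosen only for illustration; they are not results from the evaluation.

```python
from typing import Sequence


def _check(shared: Sequence[float], isolated: Sequence[float]) -> None:
    # Both lists must describe the same, non-empty set of notebooks.
    if len(shared) != len(isolated) or len(shared) == 0:
        raise ValueError("need one shared and one isolated runtime per notebook")
    if any(t <= 0 for t in list(shared) + list(isolated)):
        raise ValueError("runtimes must be positive")


def antt(shared: Sequence[float], isolated: Sequence[float]) -> float:
    """Average normalized turnaround time: mean of shared/isolated runtime."""
    _check(shared, isolated)
    return sum(s / i for s, i in zip(shared, isolated)) / len(shared)


def stp(shared: Sequence[float], isolated: Sequence[float]) -> float:
    """System throughput: sum of isolated/shared runtime over all notebooks."""
    _check(shared, isolated)
    return sum(i / s for s, i in zip(shared, isolated))


# Hypothetical runtimes in seconds for three notebooks (illustration only).
isolated_runtimes = [100.0, 200.0, 50.0]   # each notebook running alone
shared_runtimes = [150.0, 250.0, 100.0]    # same notebooks sharing resources

if __name__ == "__main__":
    print(f"ANTT = {antt(shared_runtimes, isolated_runtimes):.3f}")
    print(f"STP  = {stp(shared_runtimes, isolated_runtimes):.3f}")
```

Under these definitions, a scheduler that causes no slowdown at all yields an ANTT of 1 and an STP equal to the number of notebooks, so the two metrics capture the per-user and the system-wide view of the same runs.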

Abstract Author(s): Louis Jenkins