David Ozog

University of Oregon

As a chemistry graduate student, David Ozog grasped what he didn’t know. Then he took steps to learn.

Ozog, double-majoring in applied math and physics at Whitman College in southeastern Washington, first dipped into chemistry during an undergraduate research program at Marina Guenza’s University of Oregon laboratory. She introduced Ozog to computer simulations of proteins. The experience led Ozog, now a Department of Energy Computational Science Graduate Fellowship (DOE CSGF) recipient, to pursue a chemistry doctorate at Oregon.

“There’s great power and opportunity in being able to use computer models to simulate chemical systems,” he says. But there was a problem: “These simulations were incredibly computationally expensive,” sometimes taking months.

He knew of ways to accelerate such computations, but “I decided there was too much computer science I didn’t know” to address the issue, Ozog says. After earning a chemistry master’s degree, he worked as a programmer, mastering the Python language. A year later, Ozog returned to Oregon’s graduate school, this time in computer science. His goal: make computational chemistry codes run more efficiently, enabling bigger and more precise simulations.

One of his Oregon mentors, Sameer Shende, recognized Ozog’s unusual skill combination and suggested an internship at Argonne National Laboratory, where former DOE CSGF recipient Jeff Hammond was working with NWChem, DOE’s leading computational chemistry code.

NWChem runs on massively parallel computers and calculates the effects of quantum physics, which governs particles at the tiniest scales, when simulating atomic and molecular interactions. But parts of the code can overwork some processors while leaving others idle.

In his 2012 Argonne internship, Ozog focused on NWChem’s tensor contraction engine (TCE). Tensors characterize relationships between data in multiple dimensions. (A matrix is a two-dimensional tensor.) Tensor contraction sums the products of tensor components over one or more indices to reduce the answer’s dimensions.
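The operation is easier to see in miniature. The snippet below is a toy NumPy illustration, not one of NWChem’s distributed kernels: it contracts a four-index tensor with a two-index tensor over two shared indices, leaving a two-dimensional result.

```python
import numpy as np

# Toy tensor contraction: C[i,j] = sum over k and l of A[i,k,j,l] * B[k,l].
# A (four indices) and B (two indices) contract over k and l, so the
# result C has only two indices left.
A = np.random.rand(4, 5, 4, 5)   # indices (i, k, j, l)
B = np.random.rand(5, 5)         # indices (k, l)

C = np.einsum('ikjl,kl->ij', A, B)

# The same contraction written as explicit loops, for comparison:
C_loops = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        for k in range(5):
            for l in range(5):
                C_loops[i, j] += A[i, k, j, l] * B[k, l]

assert np.allclose(C, C_loops)
```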

The TCE distributes calculations across thousands or even millions of processors, Ozog says. But “if one processor wants to do a particular task, it might have to get data from two different processors. It might even have to get data from four, eight or more,” slowing calculations.

NWChem also used a centralized approach: Every processor ready to do work interacted with a single counter in memory to determine which tasks needed completion.
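
A minimal sketch of such a centralized scheme, written here in plain Python rather than anything resembling NWChem’s implementation, shows why the shared counter becomes a bottleneck: every worker must serialize on the same lock to claim its next task.

```python
import multiprocessing as mp

def worker(counter, lock, num_tasks, results):
    # Each worker repeatedly claims the next task by bumping one shared
    # counter: the lone coordination point all workers contend for.
    while True:
        with lock:
            task_id = counter.value
            counter.value += 1
        if task_id >= num_tasks:
            break
        results.put((task_id, task_id ** 2))  # stand-in for real work

if __name__ == '__main__':
    num_tasks, num_workers = 20, 4
    counter, lock, results = mp.Value('i', 0), mp.Lock(), mp.Queue()
    procs = [mp.Process(target=worker,
                        args=(counter, lock, num_tasks, results))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(sorted(results.get() for _ in range(num_tasks)))
```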

After studying how data moved and how processors contended for access to this counter, Ozog and Hammond developed an “inspector and executor” approach. The inspector algorithm estimates how long tensor contractions take for particular data. The executor assigns tasks to processors accordingly. “It improves the situation of just relying on one processor to assign tasks and to keep (processors) busy,” Ozog adds. The new approach often made the code run significantly faster.
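
The division of labor can be sketched in a few lines. The cost model below is invented (real estimates depend on tensor shapes and data placement), but the structure matches the idea: the inspector prices each task, and the executor hands out the priciest tasks first so estimated work stays balanced.

```python
def inspect(tasks, estimate_cost):
    # Inspector: predict how long each contraction will take.
    return [(task, estimate_cost(task)) for task in tasks]

def execute(costed_tasks, num_procs):
    # Executor: assign the most expensive tasks first, always to the
    # least-loaded processor, so estimated work stays balanced.
    loads = [0.0] * num_procs
    plan = [[] for _ in range(num_procs)]
    for task, cost in sorted(costed_tasks, key=lambda tc: -tc[1]):
        p = loads.index(min(loads))
        plan[p].append(task)
        loads[p] += cost
    return plan

# Invented tasks: the integer stands in for a contraction's size, and
# the cubic cost model is a placeholder, not a real performance model.
tasks = [('contract', n) for n in (8, 3, 9, 2, 7, 5, 4, 6)]
for p, work in enumerate(execute(inspect(tasks, lambda t: t[1] ** 3), 3)):
    print(f'processor {p}: {work}')
```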

In more recent research with his doctoral advisor, Computer and Information Science Professor Allen Malony, Ozog automated operations in a coarse-graining code, which cuts the cost of atomic-detail simulations by treating groups of atoms as single units rather than tracking each atom individually. The model omits some details but remains valid for calculating quantities like a material’s thermodynamic properties. A coarse-grained representation can capture those exactly, with greater computational efficiency than a detailed model, Ozog says.
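
A bare-bones version of one common mapping, which replaces each group of atoms with a single bead at the group’s center of mass, looks like this (the analytically derived coarse-grained models Ozog worked with are far more sophisticated):

```python
import numpy as np

def coarse_grain(positions, masses, groups):
    # Map each group of atoms onto one bead at the group's center of mass.
    # positions: (N, 3) coordinates; masses: (N,); groups: index arrays.
    beads = []
    for idx in groups:
        m = masses[idx]
        beads.append(m @ positions[idx] / m.sum())
    return np.array(beads)

# Hypothetical example: six atoms collapsed into two three-atom beads.
pos = np.random.rand(6, 3)
mass = np.ones(6)
print(coarse_grain(pos, mass, [np.arange(3), np.arange(3, 6)]))
```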

Suppose, however, that someone wanted to return to a higher-resolution model. “Can you reconstruct that original representation without messing things up?” Ozog asks. In an April 2015 paper published in the Journal of Computational Science, he and his collaborators presented a technique to manage this arduous transition.

Say a researcher is modeling a polymer using a standard molecular dynamics code like LAMMPS from Sandia National Laboratories. Such models omit quantum mechanical detail yet still demand substantial computational power. “We have codes that efficiently coarse-grain” the system being modeled, but “these codes don’t exist in LAMMPS,” Ozog says. A researcher must manually implement the transitions from atomistic to coarse-grained and back again.

With Ozog’s workflow system, programmers can specify how they want the transition to happen. “Instead of waiting for the LAMMPS simulation to finish and then running a separate code, it’ll just happen automatically,” Ozog says.
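
Conceptually, and with the caveat that every name below is invented for illustration rather than taken from Ozog’s system, such a driver might alternate representations automatically:

```python
def run_workflow(state, steps, switch_every, run_md, coarse_grain, backmap):
    # Alternate between atomistic and coarse-grained phases without
    # manual intervention. Every callable here is a hypothetical
    # stand-in for a LAMMPS run or a mapping code, not a real interface.
    coarse = False
    for _ in range(0, steps, switch_every):
        state = run_md(state, switch_every)  # advance the current model
        state = backmap(state) if coarse else coarse_grain(state)
        coarse = not coarse
    return state

# Toy usage with trivial stand-ins:
final = run_workflow(
    state={'rep': 'atomistic', 'step': 0},
    steps=100, switch_every=25,
    run_md=lambda s, n: {**s, 'step': s['step'] + n},
    coarse_grain=lambda s: {**s, 'rep': 'coarse'},
    backmap=lambda s: {**s, 'rep': 'atomistic'},
)
print(final)  # {'rep': 'atomistic', 'step': 100}
```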

Ozog’s dissertation focuses on these scientific workflow problems, such as coupling quantum codes like NWChem with molecular dynamics codes so they can share data, or managing interactions between codes written in different languages.

After finishing his research in December 2016, Ozog moved to Massachusetts to start work with Intel Federal LLC, an Intel Corp. subsidiary that supports high-performance computing at DOE and other government agencies.

Read the entire article in DEIXIS, the DOE CSGF annual. [PDF, pages 19-21]

Image caption: A 1,000-molecule water system simulation with a quantum mechanics/molecular mechanics (QM/MM) approach. The enlarged water molecule in the middle is modeled with QM; all other molecules are modeled with MM. Credit: David Ozog.