Growing up, Kelly Moran spent so much time at hospitals and clinics with her radiologist father that a career as a doctor seemed natural.
But she changed her mind after visiting a medical school as a Clemson University undergraduate. “I enjoyed the disease and epidemiology and population side of medicine, but I didn’t enjoy the needles and labs,” Moran says. She was unsure what to do, but a paper from a Los Alamos National Laboratory (LANL) epidemiology research group got her attention. She emailed the authors and got a job.
The team included statistician David Osthus. “He had the most useful and intuitive and practical insights” into the data and the team’s project, Moran says. That piqued an interest in statistics, and she headed to graduate school at Duke University, where she studies with advisor Amy Herring under a Department of Energy Computational Science Graduate Fellowship (DOE CSGF).
“There are all these questions you don’t really know the answer to intuitively,” Moran says, “like how do you take this data stream and turn it into useful information. That’s where statistics comes in.” Instead of becoming a niche expert, she prefers “learning a logic to how you can understand and process these masses of data that are being created today.”
The LANL project sought methods to use Internet data, such as searches and hits on Wikipedia articles, to forecast global disease or local medical trends. She helped assess whether such an approach also could be used to predict plant and animal disease – a more difficult task because “a cow can’t Google its symptoms.”
That led her into her current research, which also seeks useful insights in data troves. Moran applies methods for dimension reduction, which finds simpler underlying structures in complex data sets, making it easier to identify key elements.
For instance, Moran says, statistics on how often people have a cough or a fever or search the Internet for “flu” are underlying measurements of people suffering from a respiratory illness. “You don’t necessarily need to know all of those exact details if there’s this underlying concept of sickness you’re trying to get at.”
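As a rough illustration of that idea (not Moran's actual method or data), the sketch below simulates three noisy measurements that all depend on one hidden "sickness" level, then uses principal component analysis, a common dimension-reduction technique, to recover a single underlying score. The variable names, weights and noise levels are assumptions made up for the example.

```python
import numpy as np

def first_component(data):
    """Return first principal component scores and its share of total variance."""
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = centered @ vt[0]
    return scores, explained[0]

rng = np.random.default_rng(0)

# Hypothetical hidden illness level for 200 weeks (an assumption, not real data).
sickness = rng.normal(size=200)

# Three observed streams (cough reports, fever reports, "flu" searches),
# each a scaled copy of the hidden level plus independent noise.
noise = rng.normal(scale=0.3, size=(200, 3))
observed = np.outer(sickness, [1.0, 0.8, 1.2]) + noise

scores, share = first_component(observed)
corr = abs(np.corrcoef(scores, sickness)[0, 1])
print(f"variance captured by one component: {share:.2f}")
print(f"correlation with hidden sickness level: {corr:.2f}")
```

In this toy setting, one component stands in for the three measurements because they were generated from a single hidden quantity; real surveillance data would be messier.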
One project took Moran to the African country of Tanzania to study verbal autopsy data – structured interviews with next of kin about symptoms a person exhibited before dying. Dimension reduction could help researchers understand those data and predict new deaths from the same condition.
In another study, Moran applied joint dimension reduction to toxicology, analyzing data on a chemical’s molecular structure and harmfulness at varying doses. The method could produce an underlying data structure that can help predict the dose response for an untested chemical.
Moran returned to LANL for two DOE CSGF summer practicums, working each time with Earl Lawrence in the Statistical Sciences Group. In 2017, she focused on statistical emulators, programs that predict, given a set of input conditions, what a large, time-consuming computer simulation would produce, along with uncertainty in that outcome.
“The big idea is that if you have some computer simulation that takes a super-long time to run,” such as a day or more, Moran says, researchers could use an emulator to quickly test input settings without running the full model.
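A minimal sketch of the emulator concept follows, assuming a Gaussian process with a squared-exponential kernel, one common choice for this kind of surrogate. It is not the LANL group's code: `slow_simulation` is a hypothetical stand-in for an expensive model, and the kernel length scale and training points are arbitrary.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def fit_emulator(x_train, y_train, jitter=1e-8):
    """Precompute the factorization needed for Gaussian process prediction."""
    k = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return x_train, chol, alpha

def predict(emulator, x_new):
    """Return the predicted output and its standard deviation at new inputs."""
    x_train, chol, alpha = emulator
    k_star = rbf_kernel(x_new, x_train)
    mean = k_star @ alpha
    v = np.linalg.solve(chol, k_star.T)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance is 1 for this kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))

def slow_simulation(x):
    """Hypothetical placeholder for a simulation that takes a day to run."""
    return np.sin(2 * np.pi * x)

# Run the "expensive" model at only eight input settings, then train on them.
x_train = np.linspace(0.0, 1.0, 8)
emulator = fit_emulator(x_train, slow_simulation(x_train))

# Query one setting inside the training range and one far outside it.
x_new = np.array([0.33, 1.5])
mean, std = predict(emulator, x_new)
print("predictions:", mean)
print("uncertainties:", std)
```

The reported uncertainty is small near settings the simulation has already covered and grows for settings far from them, which is the kind of signal that could steer a researcher toward the next run worth making.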
The project focused on dynamic compression experiments at DOE’s SLAC National Accelerator Laboratory, in which laser blasts squeeze materials to quantify their responses. Researchers must perform as many tests as possible in a limited time. A statistical emulator can help them quickly identify promising parameters for their next experiment. The LANL group’s code targeted velocity data.
During her practicum, Moran improved algorithms that automatically produce emulators and the techniques to calibrate them. She incorporated uncertainty into prediction outputs, directing experimental researchers toward promising parameters.
Moran’s 2018 practicum aimed to accelerate the emulators, making them more practical for experimentalists. “We were trying to get this system to be as automated as possible,” with emulators already trained on data from the target model. Unfortunately, the software Moran chose for the task wasn’t particularly fast in handling data. “It ended up being a dead end, but the goal remains the same.”
For summer 2019, Moran is back at Los Alamos for a third practicum, working again with Osthus. They’re studying influenza forecasts from Internet-based disease surveillance models to better understand when and why they do well or fall short.
With all the time she’s spent at Los Alamos, it seems logical that Moran would end up there after graduation, sometime in 2020, but she also plans to apply elsewhere.
Image caption: A summary of select associations between signs or symptoms in a verbal autopsy data set. For example, across all decedents (bottom row), the association between these particular symptoms weakens with age (lighter colors). This could be because older people tend to have more ailments in general while younger people likely experience clusters of symptoms relating to a specific illness. Generally, these associations differ across both age and cause of death (COD), highlighting the importance of modeling these data in a way that allows covariance between symptoms that diverges by covariate and cause. Each subplot corresponds to the association between select signs or symptoms in the data as calculated by Yule's Q (a measure of association with a possible range -1 to 1). The subplot rows correspond to CODs: from top to bottom AIDS/TB (757 deaths), cardiovascular (793 deaths), injuries (839 deaths), and all causes (7,365 deaths). The subplot columns correspond to age groups: from left to right 12-24, 25-44, 45-64, 65+ and all adults. Sample sizes for each combination of COD/age group are shown in the top left of the subplots. Credit: Kelly Moran.
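For readers unfamiliar with Yule's Q, the sketch below computes it from the four cells of a two-by-two table of symptom presence, using the standard formula Q = (ad - bc) / (ad + bc). The counts in the example are invented for illustration and do not come from the verbal autopsy data.

```python
def yules_q(both, first_only, second_only, neither):
    """Yule's Q for two yes/no symptoms, from the four cell counts of a 2x2 table."""
    concordant = both * neither            # a * d
    discordant = first_only * second_only  # b * c
    total = concordant + discordant
    if total == 0:
        raise ValueError("Yule's Q is undefined when both products are zero")
    return (concordant - discordant) / total

# Hypothetical counts: 40 decedents had both symptoms, 40 had neither,
# and 10 had each symptom alone.
q_example = yules_q(both=40, first_only=10, second_only=10, neither=40)
print(round(q_example, 3))
```

Values near 1 mean the two symptoms tend to appear together, values near -1 mean one tends to appear without the other, and values near 0 indicate little association.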