By the time Alnur Ali began his Department of Energy Computational Science Graduate Fellowship (DOE CSGF) in 2014, he’d already charted an odd path to Ph.D. studies at Carnegie Mellon University.
Inspired early on by a book by world-famous physicist Richard Feynman, Ali focused on computer science and mathematics at the University of Southern California and studied physics at the University of Cambridge in England.
But he graduated with “no real way to combine those three areas,” Ali now recalls. Feeling divided, he spent 2004 to 2013 as a Microsoft software engineer. The experience sowed seeds for a future in machine learning, a growing field in which algorithms enable computers to learn patterns and relationships hidden in data.
Microsoft encouraged Ali to work with the Seattle research community, including the statistics faculty at the University of Washington. Though he calls those interactions “a very interesting, cool challenge,” he quickly notes that “I always knew I wanted to go back to graduate school.”
Ali enrolled at Carnegie Mellon as a doctoral candidate in machine learning in 2013, and his research took him to Stanford University during 2014.
“Machine learning is everywhere these days,” he says. “So it’s important to theoretically understand the pros and cons of different methods.” Those include “their predictive accuracy, how long they take to run and how easy they are to use.”
“In part, that boils down to working a bunch of math to explain these tradeoffs. The other part is applying machine learning in new ways that can help people, which sometimes requires developing faster algorithms. As I’ve gone through my Ph.D., I’ve wanted to do more good for society at large.”
Two machine-learning methods Ali studied for his 2015 DOE CSGF practicum at DOE’s Lawrence Berkeley National Laboratory have influenced his approach.
The first centers on the “inverse covariance matrix,” which is “basically related to how correlated two objects are,” he says. The second, “pseudo-likelihood,” is “better at estimating correlations when you have a lot of measurements, like in big data.”
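The correlation structure Ali describes can be sketched in a few lines: for jointly Gaussian data, a near-zero entry in the inverse covariance (precision) matrix indicates that two variables are conditionally independent given the rest. A minimal numpy illustration on synthetic chain-structured data (not data from the papers, purely for intuition):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a chain of dependence: x2 depends on x1, x3 depends on x2.
# x1 and x3 are correlated only *through* x2.
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Precision matrix = inverse of the sample covariance matrix.
precision = np.linalg.inv(np.cov(X, rowvar=False))

# The (x1, x3) entry is close to zero: conditionally on x2,
# x1 and x3 carry no extra information about each other.
# The (x1, x2) entry is large in magnitude: a direct dependence.
print(np.round(precision, 2))
```

In the high-dimensional regimes the 2017 paper targets, directly inverting the sample covariance breaks down (it is singular when variables outnumber samples), which is where sparse, pseudo-likelihood-based estimators come in.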
He employed both methods for a 2017 paper he co-wrote – with practicum advisor Sang-Yun Oh and two other collaborators – for the 20th International Conference on Artificial Intelligence and Statistics in Fort Lauderdale, Florida. That study focused on so-called high-dimensional, or big data, situations “where the number of variables is possibly much larger than the number of data samples,” they wrote. They demonstrated how those methods apply to real-world finance and wind-power data.
With doctoral advisors Zico Kolter and Ryan Tibshirani, Ali also tested those techniques in a 2016 paper written for the 30th Conference on Neural Information Processing Systems in Barcelona. The authors used them to study multiyear spreads of influenza around the United States.
Ali is now working with Oh on a third paper that uses those methods to characterize functional relationships between brain regions, with a single pixel location within a functional magnetic resonance imaging brain scan as the smallest unit, says Oh, now an assistant professor of statistics and applied probability at the University of California, Santa Barbara.
For the ongoing brain study and the 2017 paper, Oh and Ali have relied on Edison, Berkeley Lab’s 2.57-petaflops supercomputer.
“Data-driven scientific discovery is an important trend in many research fields,” Oh says. “Alnur’s research interest and skill set is in the sweet spot of this trend.”
Ali says he’s grateful for the fellowship’s financial support and for the knowledge and experiences it has provided. It has made him “more interested in doing work on the theoretical side [without] losing sight of the real-world applications. Also, I am more willing to take risks with my research.”
Ali, a Toronto native whose parents are from India, went to high school in Singapore when his father was assigned there for business. Besides taking inspiration from Feynman’s semi-autobiographical Surely You’re Joking, Mr. Feynman!, Ali also took interest in the computer his father brought home. “At first I just played games on it. Then I started wondering how I could make my own game. That led to trying to figure out how to program code.”
So, many years later, what will he do when he gets his Ph.D.?
“I’m pretty much open to any place where I can work on interesting problems with interesting people and make a difference in the world.”
Read the entire article in DEIXIS, the DOE CSGF annual. [PDF, pages 16-18]
Image caption: Each row presents a segmentation of left and right hemispheres of the human cerebral cortex computed by different methods. Row one: the results of HP-Concord (with persistent homology fine-tuning), a technique Alnur Ali and colleagues have developed. The data-driven method’s segmentations resemble those derived from neuroscience methods. In contrast, segmentations in the second and third rows, from other data-driven methods (middle, HP-Concord plus Louvain; bottom, thresholded sample covariance matrix plus Louvain), appear to lack fine detail. Credit: After a figure in “Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation.” Penporn Koanantakool, Alnur Ali, Ariful Azad, Aydın Buluç, Dmitriy Morozov, Sang-Yun Oh, Leonid Oliker, and Katherine Yelick. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.