Now on ASCR Discovery – CSI: Brookhaven

Wednesday, April 27, 2016

Investigators at Brookhaven National Laboratory’s Computational Science Initiative (CSI) haven’t yet been called on to solve crimes, like their CSI television counterparts, but they’re cracking even more substantive data-driven puzzles.

Already an established group, CSI recently announced a significant expansion that will greatly increase its research and development capabilities. Its ambitious vision is to move the handling of data collected on large scientific instruments from labor-intensive retrospective analysis to real-time, on-the-fly interpretation. The idea is to allow nimble fine-tuning of the information gathered while experiments are still running. But there are several major problems to solve first, and Brookhaven is amassing recruits and partners to take on the challenges.

CSI Director Kerstin Kleese van Dam, a computing industry leader in data infrastructure and management, is assembling teams to tackle three key areas: novel data structures and scalable algorithms designed for computer architectures such as those based on graphics processing units (GPUs); seamless data movement between instruments and computers; and computing models that bring scientists and engineers into the design and analysis process.

The CSI is focusing first on large multi-user science facilities, such as Brookhaven’s Relativistic Heavy Ion Collider, National Synchrotron Light Source II (NSLS-II) and Center for Functional Nanomaterials (CFN); DOE’s Atmospheric Radiation Measurement Program; and the ATLAS experiment at Europe’s Large Hadron Collider. CSI aims to help scientists identify critical information in a data stream and enable them to steer their experiments to new discoveries.

Although coordinated through Brookhaven, the CSI group also seeks input from user facilities in DOE and beyond. To that end, Brookhaven began a hackathon series in late 2015. The first weeklong event gathered data scientists from the five DOE X-ray light and neutron-scattering sources. Such events can help foster collaborations that will be critical to creating open data structures that work across DOE’s approximately 240 shared scientific instruments, Kleese van Dam says.

Over the course of a week, scientists addressed crosscutting data challenges at their respective facilities and worked toward real-time streaming data analysis.

One project used machine-learning methods to cluster and categorize data generated at NSLS-II and combined them with a streaming visualization tool that highlighted decision-critical insights for the scientists, Kleese van Dam says. “That’s important because the volume of information from large scientific instruments can quickly become unwieldy.” For instance, she notes, there’s an instrument at CFN that produces 400 images per second. At that rate, data analysis and extraction are critical, as is working with users to identify the most scientifically interesting information.
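To make the idea of clustering a data stream concrete, here is a minimal sketch of sequential (online) k-means in Python. It is an illustration only, not the method used in the hackathon project, which the article does not describe. The class `OnlineKMeans`, the generator `simulated_stream` and the two-dimensional "feature vectors" are hypothetical stand-ins for features that would be extracted from detector images.

```python
import random


class OnlineKMeans:
    """Sequential k-means: each incoming sample updates one centroid.

    Illustrative only; names and parameters are assumptions, not taken
    from any Brookhaven or NSLS-II software.
    """

    def __init__(self, k):
        self.k = k
        self.centroids = []  # one feature vector per cluster
        self.counts = []     # number of samples assigned to each cluster

    def update(self, x):
        """Assign sample x to a cluster, update that centroid, return its index."""
        x = list(x)
        # Seed the centroids with the first k samples seen on the stream.
        if len(self.centroids) < self.k:
            self.centroids.append(x)
            self.counts.append(1)
            return len(self.centroids) - 1
        # Find the nearest centroid by squared Euclidean distance.
        j = min(
            range(self.k),
            key=lambda i: sum((a - c) ** 2 for a, c in zip(x, self.centroids[i])),
        )
        # Running-mean update: the centroid moves toward x by 1/count.
        self.counts[j] += 1
        rate = 1.0 / self.counts[j]
        self.centroids[j] = [c + rate * (a - c) for a, c in zip(x, self.centroids[j])]
        return j


def simulated_stream(n, seed=0):
    """Yield n synthetic 2-D feature vectors, alternating between two groups.

    Hypothetical stand-in for per-image features arriving from an instrument.
    """
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 10.0)]
    for i in range(n):
        cx, cy = centers[i % 2]
        yield (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))


if __name__ == "__main__":
    model = OnlineKMeans(k=2)
    for features in simulated_stream(400):
        label = model.update(features)  # label is available as each sample arrives
    print(model.centroids, model.counts)
```

Because each sample is processed once and then discarded, memory use stays constant regardless of how long the stream runs, which is the property that matters when an instrument produces hundreds of images per second. Seeding the centroids with the first samples is a simplification; a production system would need a more robust initialization.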

“It’s easy to find features in the data, but what’s of interest to the scientist may be a completely different matter,” Kleese van Dam says. Advanced manufacturing and materials-by-design, for instance, require engineers “to influence the process to control the outcome.”

Such time-sensitive process control will require devising experiment-steering algorithms for vast amounts of data. “Most of the algorithms that we have today just don’t scale to large enough data volumes,” Kleese van Dam says. “I get terabytes of data per minute. There is nothing out there at this time that can deal with this data flow.”

Read more at ASCR Discovery, a website highlighting research supported by the Department of Energy’s Advanced Scientific Computing Research program.

Image caption: Target interface with pressure data (bar) during compression by a plasma liner formed when argon plasma jets merge, a simulation that addresses a fusion problem. This work was performed with the FronTier code using interface tracking and is an example of a Big Data challenge like those Brookhaven National Laboratory’s Computational Science Initiative is tackling. Visualization courtesy of Brookhaven National Laboratory.