The Big Data Revolution in Astrophysics – A Case Study of the Palomar Transient Factory
University of California, Berkeley; Lawrence Berkeley National Laboratory
Astrophysics is transforming from a data-starved to a data-swamped discipline, fundamentally changing the nature of scientific inquiry and discovery. New technologies are enabling the detection, transmission, and storage of data of hitherto unimaginable quantity and quality across the electromagnetic, gravitational, and particle spectra. The observational data obtained in the next decade alone will exceed everything accumulated over the preceding four thousand years of astronomy. Within the next year, no fewer than four large-scale photometric and spectroscopic surveys will be underway, each generating or using tens of terabytes of data per year. Some will focus on the static universe, while others will greatly expand our knowledge of transient phenomena. Maximizing the science from these programs requires integrating the processing pipeline with high-performance computing resources coupled to large astrophysics databases, with near-real-time turnaround. Here I will present an overview of the first of these programs, DeepSky and the Palomar Transient Factory (PTF); the processing and discovery pipeline we have developed for them at LBNL and NERSC; and several of the great discoveries made during the first three years of observations with PTF.