Discovering Knowledge from Massive Networks and Science Data – Next Frontier for HPC

Alok Choudhary, Northwestern University

Knowledge discovery in science and engineering has been driven by theory, experiments, and, more recently, large-scale simulations using high-performance computers. Modern experiments and simulations involving satellites, telescopes, high-throughput instruments, imaging devices, sensor networks, accelerators, and supercomputers yield massive amounts of data. At the same time, the world, including social communities, is creating massive amounts of data at an astonishing pace. Consider Facebook, Google, articles, papers, images, videos, and more. Even more complex is the network that connects the creators of this data. There is knowledge to be discovered in both the data and the network. This represents a significant and interesting challenge for HPC and opens opportunities for accelerating knowledge discovery. In this talk, following an introduction to high-end data mining and the basic knowledge discovery paradigm, we present the process, challenges, and potential of this approach. We will present many case examples, results, and future directions, including (1) discovering knowledge from massive datasets from science applications, including climate and medicine; (2) real-time stream mining of text from millions of tweets to identify influencers and people's sentiments; (3) discovering knowledge from massive social networks containing millions of nodes and hundreds of billions of edges, drawn from real-world Facebook, Twitter, and other social network data; and (4) predicting structures from simulation data. The talk will be illustrative and example-driven and may include one or two live demonstrations.

Abstract Author(s): Alok Choudhary