Adam Oliner

School: Stanford University

Year in Fellowship: 3

Practicum: Sandia National Laboratories (2006)

Degree(s): B.S. Computer Science and Engineering, MIT, 2004
M.Eng. Electrical Engineering and Computer Science, MIT, 2005

Field of Study: Computer Science and Engineering

Contact: oliner at gmail dot com

Personal web page: http://adam.oliner.net/


Summary of Research

To understand complex systems, we must discern the dependencies among their components. My research works toward this goal by applying two important insights: (1) anomalies that are correlated in time across components almost certainly indicate a shared influence, and (2) the timing of events in a system can reveal the semantics of their behavior.

My previous work has addressed challenges in high-performance computing by making system reliability and robustness a first-class research focus. I have designed more robust job-scheduling algorithms, developed techniques for identifying and predicting faults and methods for leveraging those predictions to significantly improve checkpointing and Quality of Service, and conducted the most extensive study of supercomputer system logs to date.

Publications

RA: ResearchAssistant for the Computational Sciences. D. Ramage and A. J. Oliner. In Workshop on Experimental Computer Science (ExpCS), San Diego, CA, 2007.

What Supercomputers Say: A Study of Five System Logs. A. J. Oliner and J. Stearley. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), Edinburgh, UK, 2007.

Cooperative Checkpointing: A Robust Approach to Large-scale Systems Reliability. A. J. Oliner, L. Rudolph, R. K. Sahoo. In Proceedings of the 20th Annual International Conference on Supercomputing (ICS), Cairns, Australia, June 2006.

Evaluating Cooperative Checkpointing for Supercomputing Systems. A. J. Oliner, R. K. Sahoo. In Proceedings of IPDPS, Workshop on System Management Tools for Large-Scale Parallel Systems, Rhodes Island, Greece, April 2006.

Cooperative Checkpointing Theory. A. J. Oliner, L. Rudolph, R. K. Sahoo. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), Rhodes Island, Greece, April 2006.

Cooperative Checkpointing for Supercomputing Systems. A. J. Oliner. Master of Engineering thesis at MIT, 2005. Advised by L. Rudolph.

Probabilistic QoS Guarantees for Supercomputing Systems. A. J. Oliner, L. Rudolph, R. K. Sahoo, J. E. Moreira, M. Gupta. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), Yokohama, Japan, 2005.

Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems. A. J. Oliner, R. K. Sahoo, J. E. Moreira, M. Gupta. In Proceedings of the First Workshop on System Management Tools for Large-Scale Parallel Systems at the International Parallel and Distributed Processing Symposium (IPDPS), Denver, CO, 2005.

Fault-aware Job Scheduling for BlueGene/L Systems. A. J. Oliner, R. K. Sahoo, J. E. Moreira, M. Gupta, A. Sivasubramaniam. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), Santa Fe, NM, April 2004.

Critical Event Prediction for Proactive Management in Large-scale Computer Clusters. R. K. Sahoo, A. J. Oliner, I. Rish, M. Gupta, J. E. Moreira, S. Ma, R. Vilalta, A. Sivasubramaniam. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, August 2003.

Autonomic Computing Features for Large-scale Server Management and Control. R. K. Sahoo, I. Rish, A. J. Oliner, M. Gupta, J. E. Moreira, S. Ma, R. Vilalta, A. Sivasubramaniam. In the IJCAI-03 Workshop on AI and Autonomic Computing, Acapulco, Mexico, August 2003.

An Overview of The BlueGene/L Supercomputer. The BlueGene/L Team. In Proceedings of Supercomputing and IBM Research Report, 2002.


