Pacific Northwest National Laboratory
Harvesting the Fruits of the Genome Revolution
If someone described a structure consisting of anti-parallel strands of 2-deoxyribose joined by phosphate bonds and annealed by hydrogen bonds between complementary nucleotide bases, you might not know what they were talking about. Even a person untutored in biology, however, probably will recognize a picture of a DNA strand as the iconic symbol of life. Add color to highlight certain parts of the strand, and it becomes easy to see how DNA’s structure suggests its function.
That kind of biological visualization has accelerated advances that might otherwise not have been made at all in areas such as protein structure and function determination and drug design.
Today, a similar revolution is underway in the growing field of “-omics” biology: genomics, proteomics, metabolomics, transcriptomics, and a host of subspecialties that deal with enormous amounts of data generated by high-throughput devices. At Pacific Northwest National Laboratory (PNNL) in Richland, Washington, large research teams are engaged in several proteomic projects ranging from spotting breast cancer biomarkers to identifying organisms that degrade hazardous waste or create biofuels. The sheer quantity of data generated by these new biology fields requires a computational solution. But even with computers helping to reduce data to manageable chunks, biologists must make judgments about the biological relevance of their results.
In proteomics, a field in which a single experiment can generate more than 100,000 data points on a spreadsheet, the burden of analyzing experimental results can quickly become overwhelming. At PNNL alone, the high-throughput proteomics facility produces hundreds of gigabytes of experimental data per day. Analytical techniques and data filters can help reduce that burden, but what has been lacking is a way to visually scan the results so meaningful patterns are easy to identify.
“I realized very early on that we have a data mining problem in biology,” says Josh Adkins, a PNNL biochemist who is working in proteomics. “That’s our biggest issue right now, even more so than experimental design. Unfortunately, there aren’t a lot of tools available to help solve the problem.”
In response, a team of bio-mathematicians and computer scientists at PNNL set about creating a software program that would help them quickly scan their results and zero in on potentially meaningful ones.
A screenshot showing different views provided by PQuad.
The result is a visualization tool called PQuad, for Peptide Permutation and Protein Prediction. It can take the raw data representing thousands of protein fragments, reassemble them, and create a visual overlay that shows where they are encoded on a chromosome. It also assigns each peptide a color that is linked to its presence or absence in the experimental sample. Finally, and perhaps most importantly for the biologists, PQuad also can compare the results of experiments run under two different conditions, such as with and without a specific nutrient, highlighting the proteins that change abundance under the two conditions. In addition, the software allows the user to “zoom in” to three levels of detail: global (whole chromosome), intermediate (partial chromosome), and detailed (short stretches of DNA sequence).
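The idea of placing peptides on a chromosome and coloring them by condition can be illustrated with a small Python sketch. This is not PQuad's code or its actual algorithm; it is a toy example under simplifying assumptions: genes lie on the forward strand only, each gene's protein sequence is already known, and color reflects simple presence or absence rather than measured abundance. The names `Gene`, `locate_peptide`, and `color_for`, the sequences, and the color scheme are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int    # 0-based chromosome coordinate of the first codon (assumed forward strand)
    protein: str  # amino acid sequence encoded by the gene


def locate_peptide(peptide, genes):
    """Return (gene name, chromosome start, chromosome end) for every match.

    Each amino acid corresponds to three DNA bases, so a peptide found at
    protein offset k begins 3 * k bases after the start of the gene.
    """
    hits = []
    for gene in genes:
        offset = gene.protein.find(peptide)
        while offset != -1:
            dna_start = gene.start + 3 * offset
            dna_end = dna_start + 3 * len(peptide)  # end is exclusive
            hits.append((gene.name, dna_start, dna_end))
            offset = gene.protein.find(peptide, offset + 1)
    return hits


def color_for(peptide, condition_a, condition_b):
    """Pick a display color from presence/absence in two experiments (made-up scheme)."""
    in_a = peptide in condition_a
    in_b = peptide in condition_b
    if in_a and in_b:
        return "gray"   # observed under both conditions
    if in_a:
        return "red"    # observed only under condition A
    if in_b:
        return "blue"   # observed only under condition B
    return "white"      # not observed


# Hypothetical data: two genes and the peptides seen in two experiments.
genes = [Gene("geneA", 100, "MKTAYIAKQR"), Gene("geneB", 400, "MSLLTEVETY")]
condition_a = {"TAYIAK", "LLTEV"}  # e.g. with a nutrient
condition_b = {"TAYIAK"}           # e.g. without it

for peptide in sorted(condition_a | condition_b):
    print(peptide, locate_peptide(peptide, genes), color_for(peptide, condition_a, condition_b))
```

In this toy data set, the peptide seen only under the first condition is colored differently from the one seen under both, which is the kind of contrast a genome-level overlay makes visible at a glance.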
“We understood that being able to see the peptides in their genomic framework is really the only way that you can get to the information about how a protein is being expressed in the context of its neighbors,” says Bobbie-Jo Webb-Robertson, a senior research scientist specializing in statistical inference models for bioinformatics, and leader of the team that developed PQuad.