This type of comparison is possible because genome projects have already determined the composition of DNA. Because DNA is a simple molecule — four types of paired nucleotides (called bases) forming a well-defined double helix — it is relatively easy to line up and compare two strands.
Unfortunately, this approach does not reveal which proteins dock at the binding sites, though it can help to guide experiments. Mintseris thought that if he applied the protein-protein docking algorithms developed in Weng’s lab to the problem, he could match specific proteins with specific binding sites.
The sheer magnitude of the problem is daunting. A stretch of DNA just 10 bases long can take any of about 1 million possible sequences, each a potential binding site. A given protein docks at only a handful of these. Past attempts to use protein-DNA docking algorithms have winnowed this down to about 1,000 possibilities, but it would still take years of testing to identify the actual docking sites.
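The figure of about 1 million comes from simple counting: each of the 10 positions can hold one of four bases, so there are 4 to the 10th power distinct sequences. A minimal Python sketch of that arithmetic follows; the function name is ours, chosen only for illustration.

```python
def count_sequences(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences of a given length over an alphabet.

    For DNA the alphabet is the four bases A, C, G and T.
    """
    return alphabet_size ** length


total = count_sequences(10)
print(total)  # 1048576, i.e. roughly 1 million
```

Each extra base multiplies the count by four, which is why the search space outgrows experimental testing so quickly.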
Working with Eisen’s team and its dedicated 40-cluster supercomputer, Mintseris began applying protein docking algorithms to DNA. He showed some success working with known proteins, but it takes months or even years to isolate a protein structure, and few are known.
Instead, Mintseris developed a way to guess the structure of unknown proteins. His algorithm compares the composition of a target protein with that of proteins whose composition and structure are already known, then makes educated guesses about the target protein’s structure.
How well does this work? Several years ago, a research group developed a powerful algorithm that looked at experimental data and predicted the likelihood of a well-known protein family binding with hundreds of DNA sequences.
“We tested the same proteins using our algorithm and our predictions were almost as good. But we didn’t use any experimental data, which took who knows how many hours to collect. We were able to predict four of the top 10 sequences known to bind that protein in real life.”
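The idea of guessing an unknown structure from proteins of known structure can be illustrated with a toy example. The sketch below is not Mintseris’s algorithm, whose details the article does not give. It assumes, purely for illustration, that "composition" means the amino acid sequence, that all sequences have the same length, and that similarity is the fraction of matching positions. The sequences and template names are made up.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("this toy example only compares equal-length sequences")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def best_template(target: str, known: dict[str, str]) -> tuple[str, float]:
    """Return (name, identity) of the known protein most similar to target."""
    scored = ((name, percent_identity(target, seq)) for name, seq in known.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical sequences, invented for this illustration only.
known_proteins = {
    "template_A": "MKTAYIAKQR",
    "template_B": "MKTWYLSKQR",
    "template_C": "GSSHHHHHHS",
}
target = "MKTAYLAKQR"

name, identity = best_template(target, known_proteins)
print(name, identity)  # template_A 0.9
```

In this toy version the structure of the closest match would serve as the starting guess for the target. Real methods align sequences of different lengths and weigh substitutions differently, which this sketch deliberately leaves out.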
The ability to deduce binding affinities automatically could help guide research for years to come. It may help scientists learn to produce proteins that control debilitating genetic diseases, or to turn off proteins that mediate unregulated growth, as in cancer.
Mintseris is cautious. He notes that he has worked with only one protein, and one whose behavior is relatively easy to predict. Eisen agrees, noting that the predictions have not yet been verified experimentally, but adds that the work was “surprisingly effective” for such a short development time.
Back at Boston University, Mintseris continues to sort through pieces of nature’s most complex puzzle.
