As an undergraduate student, Adam Riesselman worked in wet labs at Drake University and at seed companies in his native Iowa. But he doesn't miss that bench work now that he uses computers to pursue similar genetics research at Harvard University.
Riesselman, a Department of Energy Computational Science Graduate Fellowship recipient, says biological experiments are necessary for discovery in the life sciences but "many of these advances can be really slow." Using computers to decipher genetic relationships is faster and lets Riesselman target possible solutions so experimentalists can focus on the most promising possibilities, saving time, money and frustration.
Riesselman grew up on a farm, where he learned how enhanced plant genetics can improve lives and raise incomes. Now he develops computational tools to understand how chunks of DNA translate into physical characteristics in organisms.
Today, biology and genetics researchers can read and write DNA but struggle to connect an organism's genotype, or its unique genetic makeup, with the resulting phenotype, or its physical appearance and function. They attempt to connect the two by creating and analyzing every possible variant in the lab, a process called deep mutational scanning, then sifting the mutations for ones that produce relevant phenotypes.
"This can take a lot of money and time," Riesselman says. To accelerate the process, "we use evolution as our experiment. We can boil down millions to billions of years of evolution and do a pretty good job of explaining variation of a phenotype."
In one paper, Riesselman, his advisor, Debora Marks, and Harvard graduate student John Ingraham describe a generative model of genetic variation. They fitted the model, which they call DeepSequence, to gene sequences that evolved naturally, then generated sequence examples similar to those lab researchers would develop. Through a learning process similar to evolution, the model identified relations between sequences. DeepSequence can predict the effects of mutations across a variety of deep mutational scanning experiments better than other techniques, the researchers write.
"We're simply asking the question, does a mutated sequence look like something that has existed in nature?" Riesselman says, because existing sequences are likely to be useful. Such models are "never going to replace what we see in the lab, but they can help augment" experiments by targeting the best candidates.
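To make that question concrete, here is a minimal sketch in Python. It is not DeepSequence: it swaps in a far simpler site-independent frequency model, and the toy sequence family, the function names and the pseudocount are all assumptions made for illustration. The idea it shares with the approach Riesselman describes is only the scoring step: fit a probability model to naturally evolved sequences, then compare how likely a mutant is relative to the original.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def fit_site_frequencies(alignment, pseudocount=1.0):
    """Estimate amino acid frequencies at each aligned position.

    `alignment` is a list of equal-length sequences from one protein family.
    The pseudocount (an illustrative choice) keeps unseen residues from
    getting zero probability.
    """
    length = len(alignment[0])
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        total = sum(counts[aa] for aa in ALPHABET) + pseudocount * len(ALPHABET)
        freqs.append(
            {aa: (counts[aa] + pseudocount) / total for aa in ALPHABET}
        )
    return freqs


def log_likelihood(seq, freqs):
    """Log-probability of a sequence under the site-independent model."""
    return sum(math.log(freqs[i][aa]) for i, aa in enumerate(seq))


def mutation_score(wild_type, mutant, freqs):
    """Log-likelihood ratio: higher means the mutant looks more 'natural'."""
    return log_likelihood(mutant, freqs) - log_likelihood(wild_type, freqs)


# Hypothetical toy family of aligned sequences, invented for this example.
family = ["MKVL", "MKIL", "MRVL", "MKVL", "MKVM"]
freqs = fit_site_frequencies(family)

# K->R at position 2 appears in the family; K->W never does.
seen = mutation_score("MKVL", "MRVL", freqs)
unseen = mutation_score("MKVL", "MWVL", freqs)
print(f"seen in family: {seen:.3f}, never seen: {unseen:.3f}")
```

In this toy setup the substitution that already occurs in the family scores higher than the one that never appears. A model like the one described in the paper would also capture dependencies between positions, which this sketch ignores by treating every site independently.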
Riesselman sees at least a couple of ways his research could help people. First, it could help doctors determine whether unusual mutations found in patients' DNA are damaging. "We can use this model to say, have we ever seen this mutation before in nature? By doing that we can help understand if this mutation is going to be good or bad."
Second, it could accelerate the engineering of microorganisms to produce desirable materials, such as drugs or vitamins. Researchers could "use our technique to gather as much information as possible about a protein they desire and then design their experiments with our predictions in mind."
Riesselman focused on the latter idea during his 2016 practicum with Sam Deutsch at the DOE Joint Genome Institute, based at Lawrence Berkeley National Laboratory.
The project focused on microbes that produce two substances: the vitamin thiamine and the antibiotic violacein.
Riesselman analyzed data and modeled chemical pathways the bacteria use to produce the vitamin. Using machine learning algorithms, he focused on narrowing the number of possible experiments needed to find gene sequence changes that boost vitamin synthesis. The results could help cut the cost of developing microbes that maximize production of a desired substance.
The violacein project, meanwhile, tapped more directly into Riesselman's Harvard project. Researchers found a number of unexpected mutations in their engineered microbes. "It was confusing at first," he says. "They just happened to show up in these pathways."
It turned out that as the researchers performed tests, they were selecting from microbes that did or didn't have functional pathways to produce violacein, in essence forcing evolution in the organism. With so many changes, it's difficult to know which are beneficial. The mathematical methods Riesselman develops for his thesis can marry what evolution has done with what scientists have tried in the lab to explain the variation seen in these experiments.
Riesselman is continuing his collaboration with Deutsch's group. The experience contrasted with his largely theoretical thesis research. The practicum "gave me great experience analyzing real data and integrating what I'm going to be doing" after graduation in 2019, when Riesselman expects to get a job in industry.
Image caption: A latent-variable mathematical approach can capture the organization of evolutionarily related protein sequences. In this two-dimensional latent space, proximity reflects groupings of genetically related organisms in a phylum of bacteria (each designated by color) that produces beta-lactamase, an enzyme that affords resistance to some antibiotics. The variation within a single deep mutational scanning experiment occupies only a tiny portion of the sequence space of the entire evolutionary family. Image courtesy of Adam Riesselman and colleagues.