Jordan Hoffmann was studying protein folding when it came time to declare his major at Johns Hopkins University. Instead of biophysics, he chose physics and mathematics and switched to studying planet formation. "In some parts, I found it surprising how much the skills transferred," says Hoffmann, a Department of Energy Computational Science Graduate Fellowship (DOE CSGF) recipient. "In some sense, it's what questions do you want to ask, because the tools are very similar."
Hoffmann had planned to continue focusing on astrophysics after earning his bachelor's degree and moved to Harvard University for doctoral studies with applied mathematician Chris Rycroft. Instead, he found himself pulled into another project that harkened back to his biophysics experience. Now the research has become Hoffmann's thesis topic, engaging both his fascination with interesting questions and his enthusiasm for computation.
It started when Seth Donoughe, a Harvard graduate student working with Cassandra Extavour in the organismic and evolutionary biology department, sought a math and physics collaborator. "He came armed with a laptop and a few videos and asked 'Can anyone help me?'" Hoffmann says.
Donoughe's videos showed microscopic images of developing crickets. He and Extavour are exploring an unusual characteristic of insect reproduction: In nascent embryos, nuclei divide but don't become individual cells. Instead, they spread through the cytoplasm, the cellular material outside the nucleus. Eventually some nuclei collect in one location and develop into an embryo; others make support tissues analogous to a placenta in humans. The researchers want to understand what influences the nuclei to move and what leads some to become part of the embryo, while others don't.
Donoughe and Extavour experiment with a line of Gryllus bimaculatus crickets that has been genetically modified to produce a fluorescent protein, making nuclei easier to see with specialized microscopes. Each embryo is imaged from four different angles every 90 seconds for up to 12 hours, tracking nuclei as they divide from a handful to thousands. Using Harvard's Odyssey computing cluster, the researchers turn the images into a three-dimensional representation at each time point, creating movies of glowing dots on a dark background - nuclei dividing and moving. "You're watching the assembly of life," Hoffmann says.
Each 3-D reconstruction, however, comprises as much as 8 terabytes of data - enough to fill dozens of standard home computer hard drives. To analyze the results, Hoffmann has developed a computational pipeline using machine learning algorithms to identify and track the nuclei as they divide and move. "A lot of the data we have are really the first of their kind," Hoffmann says.
Now the researchers are developing mechanical models to make predictions scientists can test in experiments. The team is preparing a paper on its results.
The project intrigues Hoffmann. "There's something tantalizing about being able to watch something" with the hope of defining fundamental biological principles. "It's also a very fun project because it combines so many different fields."
For his 2016 Lawrence Berkeley National Laboratory practicum, Hoffmann went from studying a single organism to understanding myriad microscopic creatures. His mentor, Zhong Wang at the lab's Joint Genome Institute, studies metagenomes - genetic information for thousands of organisms in a single environment, such as a drop of ocean water.
To understand how microorganisms in these communities are interrelated, Wang and other researchers simultaneously extract and decode all of their DNA. To accelerate the process, the DNA strands are broken into sections and parceled out to sequencing machines, then reassembled. It's like cutting a library's books into their component sentences, mixing the pieces and then reassembling each text, Hoffmann says.
Luckily, there are multiple copies of each genome, Hoffmann says. Assembly programs search for segments that overlap or appear with similar frequencies. "You can start to build a big graph and use tools from graph clustering to try to piece together these metagenomes."
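The overlap-and-cluster idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Spark-based assembler Hoffmann evaluated: the reads are made up, an "overlap" here means an exact match of at least `min_len` letters between the end of one read and the start of another, and simple connected components stand in for the more sophisticated graph-clustering methods real metagenome assemblers use. Read-frequency (coverage) information is ignored entirely.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b,
    provided it is at least min_len letters long; otherwise 0."""
    start = 0
    while True:
        # Look for b's first min_len letters somewhere inside a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the hit only if everything from there to the end of a matches b's start.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def build_overlap_graph(reads: list[str], min_len: int) -> list[tuple[int, int, int]]:
    """Edges (i, j, overlap_length) for every ordered pair of reads that overlap."""
    edges = []
    for i, j in permutations(range(len(reads)), 2):
        olen = overlap(reads[i], reads[j], min_len)
        if olen > 0:
            edges.append((i, j, olen))
    return edges


def connected_components(n: int, edges: list[tuple[int, int, int]]) -> list[list[int]]:
    """Group read indices into clusters with union-find (a stand-in for graph clustering)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j, _ in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    groups: dict[int, list[int]] = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical reads: the first three tile one sequence, the last two another.
    reads = ["ATGCGTAC", "GTACCTTA", "CTTAGGCA", "TTTTCCCCG", "CCCGAAAA"]
    edges = build_overlap_graph(reads, min_len=4)
    print(connected_components(len(reads), edges))  # [[0, 1, 2], [3, 4]]
```

Each cluster corresponds to reads that plausibly came from the same stretch of DNA; a real assembler would then walk the overlaps within each cluster to rebuild longer sequences, and would also weigh how often each segment appears to separate similar genomes.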
On his practicum, Hoffmann helped evaluate a metagenomics analysis program based on the Apache Spark open-source cluster computing framework. It relies on about 20 parameters to tune its operations, making it difficult to find the right combination for the best and fastest assembly. Using Genepool, a computing cluster at Berkeley Lab's National Energy Research Scientific Computing Center, Hoffmann predicted which parameters were the most useful and what performance researchers could expect from the code.
Hoffmann aims to graduate sometime in the next two years. Beyond that, only one thing is sure about his future: "Figuring out what questions I want to ask is something I don't think I'm done with."
Image caption: Using experimental data from the cricket Gryllus bimaculatus, Jordan Hoffmann and colleagues developed a computational model describing nuclei moving inside an egg. The model lets them research questions related to the fate of nuclei, such as what percentage settle in the embryonic region at coalescence. In the image, a region around each nucleus is colored by the fraction of its descendants that end up in a particular region where nuclei will coalesce. Nuclei outside of this region play important roles in making supporting tissue. Credit: Rendering by Jordan Hoffmann and Seth Donoughe with input from Cassandra Extavour and Chris Rycroft.