Kayla McCue

Massachusetts Institute of Technology

Kayla McCue got her first taste of computational biology research at the California Institute of Technology after her sophomore year of high school. She’s still studying the same subject.

The Department of Energy Computational Science Graduate Fellowship (DOE CSGF) recipient has also played softball since she was 7 years old. “I’m apparently a person who sticks with things,” she says.

McCue grew up with scientist parents. Her mother has a biology Ph.D. and taught science at her daughter’s Southern California elementary school. McCue remembers testing her mother’s lesson plans, though she also justified the occasional mess by saying, “I was doing an experiment, Mom. It was for science.”

McCue learned to code during that first Caltech research experience. She explored statistical differences between two high-throughput RNA sequencing data sets to better understand the function of a transcription factor, a protein that regulates which genes in a living cell are transcribed from DNA into RNA. That experience led her to major in applied mathematics at Caltech, with an informal focus on biology.

McCue continued to pursue biology research there and later at Stanford University, gradually moving from experimental wet-lab work toward computational projects. “What I liked a lot about molecular biology is that there are so many things that are going on in your body all the time – that are necessary to keep you functioning – and you have no idea what they are.”

For her doctoral research with Chris Burge at the Massachusetts Institute of Technology, McCue has focused on one way that cells process genetic information: pre-mRNA splicing. Cells transcribe instructions from DNA into precursor messenger RNA (pre-mRNA). Only some portions of these initial transcripts, called exons, are kept in the mature mRNA that encodes proteins, molecules that perform vital cellular tasks. Cells splice out, or remove, the intervening sequences, known as introns, and join the exons together.

The cell’s splicing machinery can first identify the sequence to be removed, a process known as intron definition. Or it can first recognize the portions that will remain in the mRNA, a process known as exon definition, before rearranging to snip out the intron. In a study of fruit flies, a model organism, McCue and her MIT colleagues discovered that both exon-defined and intron-defined splicing can occur quickly and that exon-defined splicing is more accurate. Now she’s developing computational models to understand the differences between these two recognition strategies and the biological reasons they occur.

One factor appears to be the relative lengths of introns and exons. Genes with long introns flanking short exons are expected to use exon definition, whereas genes with short introns and longer exons are expected to use intron definition, McCue says.

By this metric, many organisms use one predominant splicing strategy. Simple, single-celled organisms typically use intron definition, whereas mammals and other complex organisms generally use exon definition. Fruit flies are in the middle: Most of their splicing is intron-defined, but a sub-population of genes uses exon definition.
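The length heuristic above can be illustrated with a short Python sketch. Everything in it is hypothetical: the `InternalExon` type and the `predicted_mode` and `predominant_mode` functions are made up for illustration, and comparing each flanking intron's length directly against the exon's length is a simplifying assumption, not a cutoff taken from McCue's work. Tallying the per-exon calls then shows how a genome could be labeled by its predominant strategy.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class InternalExon:
    """Hypothetical record: an internal exon plus its two flanking intron lengths (nt)."""
    exon_len: int
    upstream_intron_len: int
    downstream_intron_len: int


def predicted_mode(exon: InternalExon) -> str:
    """Crude relative-length version of the heuristic (an assumption, not a published rule).

    Both flanking introns longer than the exon  -> 'exon definition'
    Both flanking introns shorter than the exon -> 'intron definition'
    Anything else                               -> 'ambiguous'
    """
    flanks = (exon.upstream_intron_len, exon.downstream_intron_len)
    if all(f > exon.exon_len for f in flanks):
        return "exon definition"
    if all(f < exon.exon_len for f in flanks):
        return "intron definition"
    return "ambiguous"


def predominant_mode(exons: list[InternalExon]) -> tuple[str, float]:
    """Tally per-exon predictions; return the most common mode and its share of exons."""
    if not exons:
        raise ValueError("need at least one exon")
    counts = Counter(predicted_mode(e) for e in exons)
    mode, n = counts.most_common(1)[0]
    return mode, n / len(exons)


# Made-up lengths for illustration only; these are not real gene annotations.
toy_genome = [
    InternalExon(120, 5000, 8000),  # long introns around a short exon
    InternalExon(150, 60, 70),      # short introns around a longer exon
    InternalExon(300, 55, 65),      # short introns around a longer exon
    InternalExon(200, 62, 4000),    # mixed flanks
]
print(predominant_mode(toy_genome))  # → ('intron definition', 0.5)
```

A real analysis would work from annotated gene models and would likely use absolute length thresholds or richer sequence features; the relative comparison here only mirrors the qualitative rule described in the text.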

McCue wants to explore the role evolution may have played in this process. Exon definition could help account for the varied physical traits observed in more complex organisms, she adds.

During her 2018 practicum at Lawrence Berkeley National Laboratory’s Joint Genome Institute, McCue worked with Zhong Wang on the Genome Constellation project, a tool that creates a fingerprint of a genome and compares it against others to determine their similarities. Understanding these relationships requires comparing many large data sets, a computationally intense challenge that can become unwieldy when considering millions of genomes. McCue worked on a way to use the similarity scores to classify unknown genomes. The team has posted a manuscript describing its work on the preprint server bioRxiv.
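The article doesn't say how Genome Constellation builds its fingerprints or how McCue's classifier works, so the Python sketch below swaps in a generic, well-known technique purely for illustration. Each sequence is fingerprinted as its set of length-k substrings (k-mers), pairs are scored with Jaccard similarity, and an unknown genome gets the label of its most similar reference (a nearest-neighbor rule). The functions `kmer_fingerprint`, `jaccard` and `classify`, the choice of k, and the toy sequences are all hypothetical.

```python
def kmer_fingerprint(seq: str, k: int = 4) -> frozenset[str]:
    """Hypothetical fingerprint: the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Similarity score: shared k-mers divided by all distinct k-mers in either set."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(unknown: str, references: dict[str, str], k: int = 4) -> tuple[str, float]:
    """Nearest-neighbor rule: return the label and score of the most similar reference."""
    if not references:
        raise ValueError("need at least one reference genome")
    query = kmer_fingerprint(unknown, k)
    scores = {
        label: jaccard(query, kmer_fingerprint(seq, k))
        for label, seq in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy "genomes"; real genomes are millions of bases long.
refs = {"A": "ACGTACGTAGGCTA", "B": "TTTTGGGGCCCCAAAA"}
print(classify("ACGTACGTAGGCTT", refs))  # → ('A', 0.8)
```

Comparing exact k-mer sets like this would become unwieldy across millions of genomes, which is the scaling problem the paragraph describes; genome-comparison tools commonly approximate Jaccard similarity with compact MinHash sketches instead.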

The DOE CSGF’s broad coursework and the opportunity to learn from alumni and other fellows at the annual program review have proven invaluable, McCue says. She aims to finish her Ph.D. by the end of 2021. She expects to pursue postdoctoral research in computational biology but is exploring a range of career options.