A Novel Motif Discovery Tool for RNA Secondary Structures

Sarah Middleton, University of Pennsylvania

RNA sequences self-basepair to form diverse secondary structures. An important subset of these structures, called "motifs," recurs across different RNAs and performs specialized functions, such as regulating protein binding, splicing, translation, or sub-cellular localization. Identifying motifs shared between co-regulated transcripts may yield significant insight into their binding partners and mechanism of regulation. However, most methods for structural motif analysis rely on either low-accuracy in silico folding of individual sequences or computationally expensive pairwise alignments, resulting in a trade-off between speed and accuracy that can be problematic for genome-scale data sets.

In this talk, I will describe an alternative approach to RNA structure motif analysis that does not require individual folding or all-vs-all pairwise alignment of the input sequences. Our method is inspired by the idea of an "empirical kernel," where the distance between any two objects is computed within an observation-spanned subspace by comparing each object to a set of empirical examples or models. We validate our approach on known structure families and apply it to identify structure motifs enriched in neuronal RNAs that may play a role in sub-cellular localization.
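
The "empirical kernel" idea can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the method presented in the talk: the scoring function (`toy_score`, a k-mer Jaccard similarity), the reference sequences, and all function names are hypothetical stand-ins for whatever empirical examples or models a real tool would use. Each sequence is scored against every reference, and two sequences are then compared through their score vectors rather than by aligning them to each other.

```python
import math


def kmer_set(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_score(seq, reference, k=3):
    """Hypothetical stand-in for a model score: Jaccard similarity of k-mer sets.

    A real tool would score the sequence against a structure-aware model;
    this placeholder only keeps the example self-contained.
    """
    a, b = kmer_set(seq, k), kmer_set(reference, k)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def empirical_features(seq, references, score=toy_score):
    """Map a sequence to one coordinate per reference example or model."""
    return [score(seq, ref) for ref in references]


def feature_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def pairwise_distances(seqs, references):
    """Distance matrix computed in the space spanned by the references."""
    feats = [empirical_features(s, references) for s in seqs]
    n = len(feats)
    return [[feature_distance(feats[i], feats[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Made-up reference examples and input sequences, for illustration only.
    references = ["GGGAAACCC", "AUAUAUAUA", "GCGCGCGCG"]
    seqs = ["GGGAAACCC", "GGGAAACCU", "AUAUAUAUU"]
    for row in pairwise_distances(seqs, references):
        print(["%.3f" % d for d in row])
```

In this sketch, N input sequences and M references need N x M scoring operations; the remaining pairwise step compares short numeric vectors only, so no sequence is ever aligned directly to another.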

Abstract Author(s): S.A. Middleton, J. Kim