Integrative image analysis of Drosophila in situ hybridization data

Charles Frogner, Massachusetts Institute of Technology

Understanding the spatio-temporal control of gene expression during development is one of the major challenges in developmental biology. Large-scale in situ hybridization screens in model organisms permit the mapping of spatial patterns of expression for thousands of genes at many developmental stages, potentially enabling genome-wide dissection of the regulatory constructs that control tissue-specific gene expression. We are working with a dataset of about 75,000 images of Drosophila embryos, in which about 6,000 genes are profiled at five stage ranges during embryogenesis. Our goals are twofold: first, to combine image analysis of gene expression patterns with sequence analysis of genomic regulatory regions and functional genomic data to explore the regulation of mRNA expression patterns during Drosophila embryogenesis; second, to develop tools for image processing and image labeling that will be widely applicable to problems of this type. We have developed an image-processing pipeline that identifies and registers the embryos in different images and extracts the pixels that correspond to mRNA staining. This pipeline includes a new method for extracting detailed stain patterns from the images, based on a supervised learning approach trained on ubiquitously expressed genes, as well as a novel approach to embryo segmentation that reliably handles common issues such as multiple overlapping embryos. In sum, our methods can automatically extract mRNA expression patterns for many thousands of embryos with minimal human inspection. Furthermore, we have developed a protocol that uses hand-labeled images to characterize how accurately the registration algorithm aligns embryos; we are scaling up this effort using Amazon’s Mechanical Turk and the LabelMe image annotation toolbox. We are also developing a significantly extended version of LabelMe, called LabelLife, to enable faster and more flexible annotation of biological image datasets.
Significant correlations between transcription factor binding data and the extracted gene expression patterns indicate that our image processing approach has the potential to uncover signals of gene regulation in a large image dataset. Building on this result, we are integrating the image data with multiple genomics datasets to uncover the regulatory programs governing spatial patterning of gene expression during Drosophila embryogenesis. We expect that our approach will transfer to many organisms, offering new insights into tissue-specific gene expression.
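
As a rough illustration of the kind of correlation test mentioned above, the following sketch computes a Pearson correlation between a per-gene transcription factor binding score and a per-gene expression value, and estimates its significance with a permutation test. It is not the authors' method: the abstract does not say which statistic or test was used, and the function names, the choice of Pearson correlation, and the permutation scheme are all assumptions made here for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two equal-length 1-D sequences.

    Returns 0.0 if either input has zero variance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0.0:
        return 0.0
    return float((xc * yc).sum() / denom)


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation.

    Hypothetical inputs: x could hold one binding score per gene and
    y the extracted stain intensity of the same genes in one embryo
    region. Shuffling y breaks any gene-wise association with x.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = abs(pearson_r(x, y))
    hits = 0
    for _ in range(n_perm):
        if abs(pearson_r(x, rng.permutation(y))) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

In practice such a test would be repeated across many factors and embryo regions, so some correction for multiple comparisons would also be needed.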

Abstract Author(s): Charlie Frogner, Chris Bristow, Tom Morgan, Stan Nikolov, Anna Ayuso, Tomaso Poggio, Manolis Kellis