Integrative Image Analysis of Drosophila In-situ Hybridization Data

Charles Frogner, Massachusetts Institute of Technology

Understanding the spatio-temporal control of gene expression during development is one of the major challenges in developmental biology, yet image processing and genome analysis have traditionally remained separate. To address this challenge, we have sought to combine image analysis of gene expression patterns with sequence analysis of genomic data to explore the regulation of mRNA expression patterns during Drosophila embryogenesis. We developed an image-processing pipeline and applied it to ~75,000 images of Drosophila embryos encompassing ~6,000 genes profiled at five stage ranges during embryogenesis. The pipeline registers embryos and extracts the pixels that correspond to mRNA staining. To characterize the accuracy of each stage of the pipeline, we developed a protocol that evaluates segmentation against images hand-labeled with contours. To acquire a large corpus of manually labeled images, we coupled the open-source LabelMe image annotation tool with Amazon’s Mechanical Turk. We are additionally developing a significantly extended version of LabelMe, called LabelLife. LabelLife will enable rapid annotation of images, parts of images, and whole image sets with terms taken from controlled vocabularies, and will incorporate machine learning to predict annotations where possible. We further validated our image-processing approach by demonstrating correlations between transcription factor binding data and the extracted gene expression patterns. Building on this result, we are actively pursuing the integration of the image data with multiple genomics datasets to uncover the regulatory programs governing spatio-temporal patterning of gene expression during Drosophila embryogenesis.
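
As an illustration of how a segmentation could be scored against a hand-labeled contour, the sketch below computes the Jaccard index (intersection over union) between two binary masks. The abstract does not state which metric the evaluation protocol uses, so the choice of the Jaccard index is an assumption, and the function name jaccard_index and the toy masks are hypothetical.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels; 1 = inside the embryo or stain region


def jaccard_index(predicted: Mask, labeled: Mask) -> float:
    """Return intersection over union for two binary masks of the same shape.

    Assumption: the hand-labeled contour has already been rasterized into a
    filled binary mask on the same pixel grid as the pipeline output.
    """
    same_rows = len(predicted) == len(labeled)
    if not same_rows or any(len(p) != len(l) for p, l in zip(predicted, labeled)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for pred_row, label_row in zip(predicted, labeled):
        for p, l in zip(pred_row, label_row):
            p, l = bool(p), bool(l)
            intersection += p and l  # pixel marked in both masks
            union += p or l          # pixel marked in at least one mask

    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


# Toy example: 2 pixels overlap, 4 pixels are marked in at least one mask.
pipeline_mask = [[0, 1, 1],
                 [0, 1, 0]]
hand_labeled_mask = [[0, 1, 0],
                     [0, 1, 1]]
score = jaccard_index(pipeline_mask, hand_labeled_mask)
```

A score of 1.0 means the two masks coincide exactly, and a score near 0 means they barely overlap. Averaging such scores over many labeled images is one plausible way to summarize the accuracy of a pipeline stage.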

Abstract Author(s): Charlie Frogner, Chris Bristow, Stanislav Nikolov, Anna Ayuso, Tom Morgan, Lorenzo Rosasco, Tomaso Poggio, Manolis Kellis