Integrating Genetic Data Sets to Predict TF-Gene Interactions

Irene Kaplow, Stanford University

We are working to integrate multiple data types to more accurately identify functional interactions between transcription factors (TFs) and their regulatory targets in mouse. Ideally, we would use ChIP-seq data, which measure TF binding, to identify such regulatory interactions. However, ChIP-seq data are not currently available for most cell types or most organisms. The ImmGen Consortium recently released a genome-wide expression data set from over 200 blood cell types in mouse. In addition, the mouseENCODE Consortium recently released DNase hypersensitivity data that reveal where chromatin is open in multiple blood cell types. In this project, we are combining DNase hypersensitivity data with gene expression profiles across multiple cell types to gain a better understanding of regulatory relationships. In particular, correlated expression profiles of a TF-target pair can be informative of a regulatory interaction; however, distinguishing informative from spurious correlations is a major challenge. To this end, we are using DNase hypersensitivity data to construct priors over possible interactions. We are developing a linear model that uses TF expression to predict the expression of a group of co-expressed genes (a gene module) and uses the prior information to help determine the likelihood of each regulatory interaction. We hope to discover novel TF-gene regulatory relationships and to gain a better understanding of how to interpret DNase hypersensitivity data. Future work involves integrating other data sets, such as ChIP-seq data from a few blood cell types in mouse and possibly chromatin modification data, and applying this method to other organisms for which gene expression and DNase hypersensitivity data are available.
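
As a rough illustration of how a prior can enter a linear model of module expression, the sketch below fits a ridge regression in which each TF's penalty is scaled by the inverse of its prior probability, so that TFs with little prior support are shrunk harder toward zero. This is only one possible formulation and is not taken from the abstract: the weighted-ridge objective, the function name fit_prior_weighted_ridge, the parameter lam, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def fit_prior_weighted_ridge(tf_expr, module_expr, prior, lam=1.0):
    """Hypothetical sketch: ridge regression with a prior-scaled penalty.

    tf_expr:     (n_cell_types, n_tfs) TF expression matrix
    module_expr: (n_cell_types,) mean expression of one gene module
    prior:       (n_tfs,) prior probability that each TF regulates the module
                 (assumed here to be derived from DNase hypersensitivity data)
    lam:         overall regularization strength (assumed hyperparameter)
    """
    tf_expr = np.asarray(tf_expr, dtype=float)
    module_expr = np.asarray(module_expr, dtype=float)
    # Clip priors away from zero so the penalty stays finite.
    prior = np.clip(np.asarray(prior, dtype=float), 1e-3, 1.0)

    # Center the data so that no intercept term is needed.
    X = tf_expr - tf_expr.mean(axis=0)
    y = module_expr - module_expr.mean()

    # Minimize ||y - Xw||^2 + lam * sum_j w_j^2 / prior_j.
    # A low prior gives a large penalty and hence a weight near zero.
    penalty = np.diag(lam / prior)
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


# Synthetic example: TF 0 drives the module and has a high prior.
rng = np.random.default_rng(0)
n_cell_types, n_tfs = 50, 5
tf_expr = rng.normal(size=(n_cell_types, n_tfs))
module_expr = 2.0 * tf_expr[:, 0] + 0.1 * rng.normal(size=n_cell_types)
prior = np.array([0.9, 0.05, 0.05, 0.05, 0.05])

weights = fit_prior_weighted_ridge(tf_expr, module_expr, prior, lam=1.0)
print(np.round(weights, 3))
```

In this formulation a correlated TF with weak prior support can still receive a nonzero weight if the expression evidence is strong enough, which is one way to trade off the two data types; other choices, such as a prior-weighted sparse penalty or a fully Bayesian model, would be equally plausible.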

Abstract Author(s): Irene Kaplow, Sara Mostafavi, and Daphne Koller