Reference: Keles S, et al. (2004) Regulatory motif finding by logic regression. Bioinformatics 20(16):2799-811


Abstract


MOTIVATION: Multiple transcription factors coordinately control transcriptional regulation of genes in eukaryotes. Although many computational methods consider the identification of individual transcription factor binding sites (TFBSs), very few focus on the interactions between these sites. We consider finding TFBSs and their context specific interactions using microarray gene expression data. We devise a hybrid approach called LogicMotif composed of a TFBS identification method combined with the new regression methodology logic regression. LogicMotif has two steps: First, potential binding sites are identified from transcription control regions of genes of interest. Various available methods can be used in this step when the genes of interest can be divided into groups such as up- and down-regulated. For this step, we also develop a simple univariate regression and extension method MFURE to extract candidate TFBSs from a large number of genes in the availability of microarray gene expression data. MFURE provides an alternative method for this step when partitioning of the genes into disjoint groups is not preferred. This first step aims to identify individual sites within gene groups of interest or sites that are correlated with the gene expression outcome. In the second step, logic regression is used to build a predictive model of outcome of interest (either gene expression or up- and down-regulation) using these potential sites. This 2-fold approach creates a rich diverse set of potential binding sites in the first step and builds regression or classification models in the second step using logic regression that is particularly good at identifying complex interactions.

RESULTS: LogicMotif is applied to two publicly available datasets. A genome-wide gene expression data set of Saccharomyces cerevisiae is used for validation. The regression models obtained are interpretable and the biological implications are in agreement with the known results. This analysis suggests that LogicMotif provides biologically more reasonable regression models than previous analysis of this dataset with standard linear regression methods. Another dataset of S. cerevisiae illustrates the use of LogicMotif in classification questions by building a model that discriminates between up- and down-regulated genes in iron copper deficiency. LogicMotif identifies an inductive and two repressor motifs in this dataset. The inductive motif matches the binding site of the transcription factor Aft1p that has a key role in regulation of the uptake process. One of the novel repressor sites is highly present in transcription control regions of FeS genes. This site could represent a TFBS for an unknown transcription factor involved in repression of genes encoding FeS proteins in iron deficiency. We establish the robustness of the method to the type of outcome variable used by considering both continuous and binary outcome variables for this dataset. Our results indicate that logic regression used in combination with cluster/group operating binding site identification methods or with our proposed method MFURE is a powerful and flexible alternative to linear regression based motif finding methods.

AVAILABILITY: Source code for logic regression is freely available as a package of the R programming language by Ruczinski et al. (2003) and can be downloaded at http://bear.fhcrc.org/~ingor/logic/download/download.html. An R package for MFURE is available at http://www.stat.berkeley.edu/~sunduz/software.html.

Reference Type
Journal Article
Authors
Keles S, van der Laan MJ, Vulpe C
