Reference: Lin JR, et al. (2012) Minimalist ensemble algorithms for genome-wide protein localization prediction. BMC Bioinformatics 13(1):157


Abstract

BACKGROUND: Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms.

RESULTS: This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors.

CONCLUSIONS: We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgibin/predict.cgi.

Reference Type
Journal Article
Authors
Lin JR, Mondal AM, Liu R, Hu J