TY - JOUR
T1 - Genome-wide discovery of missing genes in biological pathways of prokaryotes
AU - Chen, Yong
AU - Mao, Fenglou
AU - Li, Guojun
AU - Xu, Ying
N1 - Funding Information:
This work is supported by the National Science Foundation (NSF/DBI-0542119, NSF/DBI-0542119004, NSF/DEB-0830024, NSF/DBI-0821263, DOE/4000063512), the National Institutes of Health (1R01GM075331 and 1R01GM081682) and a Distinguished Scholar grant from the Georgia Cancer Coalition. This work is also supported in part by grants (60673059, 60373025 and 10926027) from the National Science Foundation of China, the Taishan Scholar Fund from Shandong Province, and the State Scholarship Fund of China (20073020). We also acknowledge financial support from the China Postdoctoral Science Foundation (20090450396), the Scientist Research Fund of Shandong Province (BS2009SW044), and the Doctoral Research Fund from the University of Jinan (XBS0914). Finally, we thank all the CSBL colleagues for their comments on this work, and Greg Vatcher for help in correcting language errors. The BioEnergy Science Center is a U.S. Department of Energy Bioenergy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science. This study was supported in part by funds, resources, and technical expertise provided by the University of Georgia Research Computing Center, a partnership between the Office of the Vice President for Research and the Office of the Chief Information Officer. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. This article has been published as part of BMC Bioinformatics Volume 12 Supplement 1, 2011: Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S1.
PY - 2011/2/25
Y1 - 2011/2/25
N2 - Background: Reconstruction of biological pathways is typically done by mapping well-characterized pathways of model organisms onto a target genome through orthologous gene mapping. A limitation of such pathway-mapping approaches is that the mapped pathway models are constrained by the composition of the template pathways; e.g., some genes in a target pathway may not have corresponding genes in the template pathways, the so-called "missing gene" problem. Methods: We present a novel pathway-expansion method for identifying additional genes that are possibly involved in a target pathway after pathway mapping, both to fill holes caused by missing genes and to expand the mapped pathway model. The basic idea of the algorithm is to identify genes in the target genome whose homologous genes share common operons with homologs of any mapped pathway genes in some reference genome, and to add such genes to the target pathway if their functions are consistent with the cellular function of the target pathway. Results: We have implemented this idea using a graph-theoretic approach and demonstrated the effectiveness of the algorithm on known pathways of E. coli in the KEGG database. On all KEGG pathways containing at least 5 genes, our method achieves an average positive predictive value (PPV) of 60%, and performance improves as more seed genes are added. Analysis shows that our method is highly robust. Conclusions: An effective method is presented to find missing genes in biological pathways of prokaryotes, which achieves high prediction reliability on E. coli at the genome level. Numerous missing genes are found to be related to known E. coli pathways; these can be further validated through biological experiments. Overall, this method is robust and can be used for functional inference.
AB - Background: Reconstruction of biological pathways is typically done by mapping well-characterized pathways of model organisms onto a target genome through orthologous gene mapping. A limitation of such pathway-mapping approaches is that the mapped pathway models are constrained by the composition of the template pathways; e.g., some genes in a target pathway may not have corresponding genes in the template pathways, the so-called "missing gene" problem. Methods: We present a novel pathway-expansion method for identifying additional genes that are possibly involved in a target pathway after pathway mapping, both to fill holes caused by missing genes and to expand the mapped pathway model. The basic idea of the algorithm is to identify genes in the target genome whose homologous genes share common operons with homologs of any mapped pathway genes in some reference genome, and to add such genes to the target pathway if their functions are consistent with the cellular function of the target pathway. Results: We have implemented this idea using a graph-theoretic approach and demonstrated the effectiveness of the algorithm on known pathways of E. coli in the KEGG database. On all KEGG pathways containing at least 5 genes, our method achieves an average positive predictive value (PPV) of 60%, and performance improves as more seed genes are added. Analysis shows that our method is highly robust. Conclusions: An effective method is presented to find missing genes in biological pathways of prokaryotes, which achieves high prediction reliability on E. coli at the genome level. Numerous missing genes are found to be related to known E. coli pathways; these can be further validated through biological experiments. Overall, this method is robust and can be used for functional inference.
UR - http://www.scopus.com/inward/record.url?scp=79951526289&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79951526289&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-12-S1-S1
DO - 10.1186/1471-2105-12-S1-S1
M3 - Article
C2 - 21342538
AN - SCOPUS:79951526289
SN - 1471-2105
VL - 12
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - SUPPL. 1
M1 - S1
ER -