TY - JOUR
T1 - Comparative study of network-based prioritization of protein domains associated with human complex diseases
AU - Zhang, Wangshu
AU - Chen, Yong
AU - Jiang, Rui
N1 - Funding Information:
This work was partly supported by the National Natural Science Foundation of China (Grant Nos. 60805010, 60928007, 60934004, and 10926027), Tsinghua National Laboratory for Information Science and Technology (TNLIST) Cross-discipline Foundation, the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 200800031009), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, China Postdoctoral Science Foundation (No. 20090450396), the Scientist Research Fund of Shandong Province (No. BS2009SW044), and the Doctor Research Fund from University of Jinan (No. XBS0914).
PY - 2010
Y1 - 2010
N2 - Domains are the basic structural and functional units of proteins. Exploring associations between protein domains and human inherited diseases will therefore improve our understanding of the pathogenesis of human complex diseases and benefit the prevention, diagnosis, and treatment of these diseases. Based on the assumption that deleterious nonsynonymous single nucleotide polymorphisms (nsSNPs) underlying human complex diseases may change the structures of protein domains, affect the functions of the corresponding proteins, and ultimately result in these diseases, we compile a dataset that contains 1174 associations between 433 protein domains and 848 human disease phenotypes. With this dataset, we compare two approaches (guilt-by-association and correlation coefficient) that use a domain-domain interaction network and a phenotype similarity network to prioritize associations between candidate domains and human disease phenotypes. We implement these methods with three distance measures (direct neighbor, shortest path with Gaussian kernel, and diffusion kernel), demonstrate their effectiveness using three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and whole-genome scan), and evaluate their performance in terms of three criteria (mean rank ratio, precision, and AUC score). Results show that both methods can effectively rank domains that are associated with human diseases at the top of the candidate list, while the correlation coefficient approach achieves slightly higher performance in most cases. Finally, taking advantage of the fact that the correlation coefficient method does not require known disease-domain associations, we calculate a genome-wide landscape of associations between 4036 protein domains and 5080 human disease phenotypes using this method and offer a freely accessible web interface for this landscape.
AB - Domains are the basic structural and functional units of proteins. Exploring associations between protein domains and human inherited diseases will therefore improve our understanding of the pathogenesis of human complex diseases and benefit the prevention, diagnosis, and treatment of these diseases. Based on the assumption that deleterious nonsynonymous single nucleotide polymorphisms (nsSNPs) underlying human complex diseases may change the structures of protein domains, affect the functions of the corresponding proteins, and ultimately result in these diseases, we compile a dataset that contains 1174 associations between 433 protein domains and 848 human disease phenotypes. With this dataset, we compare two approaches (guilt-by-association and correlation coefficient) that use a domain-domain interaction network and a phenotype similarity network to prioritize associations between candidate domains and human disease phenotypes. We implement these methods with three distance measures (direct neighbor, shortest path with Gaussian kernel, and diffusion kernel), demonstrate their effectiveness using three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and whole-genome scan), and evaluate their performance in terms of three criteria (mean rank ratio, precision, and AUC score). Results show that both methods can effectively rank domains that are associated with human diseases at the top of the candidate list, while the correlation coefficient approach achieves slightly higher performance in most cases. Finally, taking advantage of the fact that the correlation coefficient method does not require known disease-domain associations, we calculate a genome-wide landscape of associations between 4036 protein domains and 5080 human disease phenotypes using this method and offer a freely accessible web interface for this landscape.
UR - http://www.scopus.com/inward/record.url?scp=77953357801&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77953357801&partnerID=8YFLogxK
U2 - 10.1007/s11460-010-0018-x
DO - 10.1007/s11460-010-0018-x
M3 - Article
AN - SCOPUS:77953357801
SN - 1673-3460
VL - 5
SP - 107
EP - 118
JO - Frontiers of Electrical and Electronic Engineering in China
JF - Frontiers of Electrical and Electronic Engineering in China
IS - 2
ER -