TY - GEN
T1 - Exploring matrix factorization techniques for significant genes identification of microarray dataset
AU - Kong, Wei
AU - Mou, Xiaoyang
AU - Hu, Xiaohua
PY - 2010
Y1 - 2010
AB - Unsupervised machine learning approaches are efficient tools for analyzing data from DNA microarray technology, which can measure the expression levels of hundreds of thousands of genes in a single experiment. In our study, two unsupervised knowledge-based matrix factorization methods, independent component analysis (ICA) and nonnegative matrix factorization (NMF), are explored to identify significant genes and related pathways in a microarray gene expression dataset. The advantage of these two approaches is that they can act as biclustering methods, clustering genes and conditions simultaneously. Furthermore, they can group genes into different categories for identifying related diagnostic pathways and regulatory networks. The two methods differ in their assumptions: ICA assumes statistical independence of the expression modes, while NMF imposes nonnegativity constraints to generate localized gene expression profiles. By combining the significant genes identified by both ICA and NMF, the simulation results show that the approach is highly effective in finding underlying biological processes and related pathways in Alzheimer's disease (AD), as well as the activation patterns associated with AD phenotypes.
UR - http://www.scopus.com/inward/record.url?scp=79952421689&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952421689&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2010.5706599
DO - 10.1109/BIBM.2010.5706599
M3 - Conference contribution
AN - SCOPUS:79952421689
SN - 9781424483075
T3 - Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
SP - 401
EP - 405
BT - Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
T2 - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
Y2 - 18 December 2010 through 21 December 2010
ER -