TY - GEN
T1 - Clustering gene expression data using probabilistic non-negative matrix factorization
AU - Bayar, Belhassen
AU - Bouaynaya, Nidhal
AU - Shterenberg, Roman
PY - 2011
Y1 - 2011
N2 - Non-negative matrix factorization (NMF) has proven to be a useful decomposition for multivariate data. Specifically, NMF appears to have advantages over other clustering methods, such as hierarchical clustering, for identification of distinct molecular patterns in gene expression profiles. The NMF algorithm, however, is deterministic. In particular, it does not take into account the noisy nature of the measured genomic signals. In this paper, we extend the NMF algorithm to the probabilistic case, where the data is viewed as a stochastic process. We show that the probabilistic NMF can be viewed as a weighted regularized matrix factorization problem, and derive the corresponding update rules. Our simulation results show that the probabilistic non-negative matrix factorization (PNMF) algorithm is more accurate and more robust than its deterministic homologue in clustering cancer subtypes in a leukemia microarray dataset.
UR - http://www.scopus.com/inward/record.url?scp=84863673831&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863673831&partnerID=8YFLogxK
U2 - 10.1109/gensips.2011.6169465
DO - 10.1109/gensips.2011.6169465
M3 - Conference contribution
AN - SCOPUS:84863673831
SN - 9781467304900
T3 - Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics
SP - 143
EP - 146
BT - Proceedings 2011 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS'11
PB - IEEE Computer Society
T2 - 2011 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS'11
Y2 - 4 December 2011 through 6 December 2011
ER -