TY - JOUR
T1 - Probabilistic non-negative matrix factorization
T2 - Theory and application to microarray data analysis
AU - Bayar, Belhassen
AU - Bouaynaya, Nidhal
AU - Shterenberg, Roman
N1 - Funding Information:
This project is supported by Award Number R01GM096191 from the National Institute of General Medical Sciences (NIH/NIGMS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health.
PY - 2014/2
Y1 - 2014/2
N2 - Non-negative matrix factorization (NMF) has proven to be a useful decomposition technique for multivariate data, where the non-negativity constraint is necessary to have a meaningful physical interpretation. NMF reduces the dimensionality of non-negative data by decomposing it into two smaller non-negative factors with physical interpretation for class discovery. The NMF algorithm, however, assumes a deterministic framework. In particular, the effect of the data noise on the stability of the factorization and the convergence of the algorithm are unknown. Collected data, on the other hand, is stochastic in nature due to measurement noise and sometimes inherent variability in the physical process. This paper presents new theoretical and applied developments to the problem of non-negative matrix factorization (NMF). First, we generalize the deterministic NMF algorithm to include a general class of update rules that converges towards an optimal non-negative factorization. Second, we extend the NMF framework to the probabilistic case (PNMF). We show that the Maximum a posteriori (MAP) estimate of the non-negative factors is the solution to a weighted regularized non-negative matrix factorization problem. We subsequently derive update rules that converge towards an optimal solution. Third, we apply the PNMF to cluster and classify DNA microarrays data. The proposed PNMF is shown to outperform the deterministic NMF and the sparse NMF algorithms in clustering stability and classification accuracy.
AB - Non-negative matrix factorization (NMF) has proven to be a useful decomposition technique for multivariate data, where the non-negativity constraint is necessary to have a meaningful physical interpretation. NMF reduces the dimensionality of non-negative data by decomposing it into two smaller non-negative factors with physical interpretation for class discovery. The NMF algorithm, however, assumes a deterministic framework. In particular, the effect of the data noise on the stability of the factorization and the convergence of the algorithm are unknown. Collected data, on the other hand, is stochastic in nature due to measurement noise and sometimes inherent variability in the physical process. This paper presents new theoretical and applied developments to the problem of non-negative matrix factorization (NMF). First, we generalize the deterministic NMF algorithm to include a general class of update rules that converges towards an optimal non-negative factorization. Second, we extend the NMF framework to the probabilistic case (PNMF). We show that the Maximum a posteriori (MAP) estimate of the non-negative factors is the solution to a weighted regularized non-negative matrix factorization problem. We subsequently derive update rules that converge towards an optimal solution. Third, we apply the PNMF to cluster and classify DNA microarrays data. The proposed PNMF is shown to outperform the deterministic NMF and the sparse NMF algorithms in clustering stability and classification accuracy.
UR - http://www.scopus.com/inward/record.url?scp=84894903821&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894903821&partnerID=8YFLogxK
U2 - 10.1142/S0219720014500012
DO - 10.1142/S0219720014500012
M3 - Article
C2 - 24467759
AN - SCOPUS:84894903821
SN - 0219-7200
VL - 12
JO - Journal of Bioinformatics and Computational Biology
JF - Journal of Bioinformatics and Computational Biology
IS - 1
M1 - 1450001
ER -