# Non-negative matrix factorization

## General

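• The input data matrix X must be non-negative, and the factors W and H are constrained to be non-negative too
• $X_{m,n} \approx W_{m,k} H_{k,n}$ for $k$ latent variables (components), with $k \ll \min(m, n)$
• W ($m \times k$) is a tall matrix and H ($k \times n$) is a wide matrix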
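• For image analysis with one image per column of X, the columns of W are the "basis images", analogous to topic centroids
• Each item's label (e.g. its cluster or topic) is read from H, typically as the component with the largest coefficient in that item's column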
• Works well with images since pixels are always non-negative
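
As a minimal sketch of fitting $X \approx WH$, the NumPy code below uses Lee and Seung's multiplicative update rules for the squared Frobenius loss $\lVert X - WH \rVert_F^2$. This is one common fitting method, swapped in here for illustration, not necessarily the one these notes assume. The function name `nmf`, its parameters, and the synthetic data are hypothetical; items are stored as columns of X, so item j's label is read from column j of H.

```python
import numpy as np


def nmf(X, k, n_iter=1000, eps=1e-10, seed=0):
    """Approximate a non-negative X (m x n) by W (m x k) @ H (k x n).

    Lee-Seung multiplicative updates for the loss ||X - WH||_F^2. W and H
    start positive and are only ever multiplied by non-negative ratios,
    so they stay non-negative.
    """
    if np.any(X < 0):
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic data: each column of X is one item (e.g. a flattened image)
# built from 2 hypothetical non-negative "basis images" of 20 pixels.
rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 30))  # 20 x 30, non-negative

W, H = nmf(X, k=2)
print(W.shape, H.shape)  # (20, 2) (2, 30)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
labels = H.argmax(axis=0)  # one label per item: its dominant component
```

Because every update multiplies by a non-negative ratio, non-negativity never has to be enforced by clipping; the small `eps` only guards against division by zero.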

## Compare vs PCA

• PCA components and scores can take negative values, so they cannot be read as additive parts the way NMF factors can
• Here items (e.g. documents) are in the rows of the data matrix
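• In topic modeling, topics are linear combinations of words and documents are linear combinations of topics; NMF keeps all of these weights non-negative
• NMF factors tend to be sparse; non-smooth NMF (nsNMF) is a variant that explicitly increases this sparsity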
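
To illustrate the topic-modeling reading and the sign difference from PCA, the sketch below uses a hypothetical 6 × 6 document-term count matrix with documents in the rows, so $X \approx WH$ gives rows of H as topics over words and rows of W as per-document topic weights. It uses the same multiplicative-update NMF as an assumed solver choice and computes PCA directly from NumPy's SVD of the column-centred matrix; the vocabulary and counts are made up.

```python
import numpy as np

# Hypothetical toy corpus: 6 documents (rows) x 6 words (columns), raw counts.
vocab = ["goal", "match", "team", "vote", "party", "election"]
X = np.array(
    [
        [3, 2, 4, 0, 0, 0],
        [2, 3, 3, 0, 1, 0],
        [4, 1, 2, 0, 0, 0],
        [0, 0, 0, 3, 2, 4],
        [0, 1, 0, 2, 4, 3],
        [0, 0, 0, 4, 3, 2],
    ],
    dtype=float,
)


def nmf(X, k, n_iter=1000, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates; W and H stay non-negative."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Rows of H are topics (non-negative weights over words); row d of W holds
# document d's topic weights, so document d is approximated by W[d] @ H.
W, H = nmf(X, k=2)
for t, row in enumerate(H):
    top_words = [vocab[i] for i in np.argsort(row)[::-1][:3]]
    print(f"topic {t}: {top_words}")

# PCA via SVD of the column-centred matrix: the principal axes (rows of Vt)
# mix positive and negative word weights.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_components = Vt[:2]
print("most negative PCA weight:", pca_components.min())
```

On this block-structured toy data the two NMF topics should separate the sports words from the politics words, while the PCA axes contrast the two groups with opposite signs; exact values depend on the random initialization.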

## Compare vs K-Means

• K-means performs poorly on unbalanced problems, e.g. clusters of very different sizes
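• K-means uses K centroids $\mu_i$ and chooses the centroids and cluster assignments to minimize the within-cluster sum of squares $\sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$, where $C_i$ is the set of items assigned to cluster $i$
• The cluster assignments form a class-membership matrix that is one-hot encoded: each item belongs to exactly one cluster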
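
The one-hot membership matrix lets K-means be written as a matrix factorization: with the centroids $\mu_i$ as the columns of $M$ and the memberships as the columns of $Z$, the K-means cost equals $\lVert X - MZ \rVert_F^2$. That is the same form as the NMF objective, except that each column of $Z$ must be one-hot instead of an arbitrary non-negative vector. The minimal Lloyd's-algorithm sketch below checks that identity numerically; the function names `assign` and `kmeans`, the blob data, and the items-as-columns layout are illustrative assumptions.

```python
import numpy as np


def assign(X, M):
    """Index of the nearest centroid (column of M) for each item (column of X)."""
    d2 = ((X[:, None, :] - M[:, :, None]) ** 2).sum(axis=0)  # K x n squared distances
    return d2.argmin(axis=0)


def kmeans(X, K, n_iter=100, seed=0):
    """Lloyd's algorithm; returns centroids M (m x K) and one-hot Z (K x n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    M = X[:, rng.choice(n, size=K, replace=False)].copy()  # start at K data points
    for _ in range(n_iter):
        labels = assign(X, M)
        for i in range(K):
            if np.any(labels == i):  # leave an empty cluster's centroid unchanged
                M[:, i] = X[:, labels == i].mean(axis=1)
    labels = assign(X, M)
    Z = np.zeros((K, n))
    Z[labels, np.arange(n)] = 1.0  # exactly one 1 per column
    return M, Z


# Two hypothetical 2-D blobs; each column of X is one item.
rng = np.random.default_rng(2)
X = np.hstack([rng.normal(0.0, 0.3, (2, 20)), rng.normal(5.0, 0.3, (2, 20))])
M, Z = kmeans(X, K=2)

# The K-means cost written two ways: as a factorization residual and as
# the sum of squared distances from each item to its assigned centroid.
cost_factorization = np.linalg.norm(X - M @ Z) ** 2
cost_sum = sum(
    np.sum((X[:, j] - M[:, Z[:, j].argmax()]) ** 2) for j in range(X.shape[1])
)
```

Column j of `M @ Z` is exactly the centroid assigned to item j, which is why the two costs agree for any one-hot `Z`.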