Non-negative matrix factorization

From Colettapedia

General

Original data must be non-negative
$X_{m,n}\approx W_{m,k}H_{k,n}$ for $k \ll \min(m,n)$ latent variables
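- W is the tall matrix ($m \times k$), H is the wide matrix ($k \times n$)
- For image analysis, the columns of W are the "basis images", analogous to topic centroids
Labels are read from the H matrix, which holds each item's weights on the k latent variables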
Works well with images since pixels are always non-negative
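
A minimal sketch of the factorization above, assuming NumPy. The notes do not name a fitting algorithm; this uses the Lee and Seung multiplicative updates for the squared-error objective as one common choice. The function name `nmf`, the toy data, and the rule of labeling each item by its largest coefficient in H are illustrative assumptions, not taken from the notes.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative X (m x n) into W (m x k) and H (k x n)."""
    if np.any(X < 0):
        raise ValueError("X must be non-negative")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: every factor stays non-negative
        # because we only multiply by non-negative ratios.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (assumption): 6 features x 8 items built from 2 non-negative parts.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 8))

W, H = nmf(X, k=2)
labels = H.argmax(axis=0)  # assumed labeling rule: dominant component per column of H
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Here the columns of W play the role of the basis images and each column of H gives one item's weights on them.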

Compare vs PCA

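PCA components and scores can have negative values; NMF factors cannot
Items (samples) are in the rows of the data matrix
Topics are linear combinations of words, documents are linear combinations of topics
NMF factors tend to be sparse (non-smooth NMF enforces sparsity explicitly)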
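
To illustrate the sign difference, here is a small sketch, assuming NumPy and random toy data, that runs PCA through an SVD of the mean-centered matrix. Even though the input is non-negative, both the principal directions and the item scores contain negative entries. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 5))            # non-negative toy data, items in the rows

# PCA via SVD of the mean-centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                # top 2 principal directions (2 x 5)
scores = Xc @ components.T         # item coordinates in that basis (20 x 2)

# Centered scores average to zero, so some must be negative;
# orthogonal directions cannot all be entrywise non-negative here.
has_negative_components = bool((components < 0).any())
has_negative_scores = bool((scores < 0).any())
```

Mixed signs mean PCA parts can cancel each other, whereas NMF can only add parts together, which is what makes its factors easier to read as topics or basis images.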

Compare vs K-Means

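K-means performs poorly on unbalanced problems
K-means fits K centroids $\mu_i$ by minimizing the cost function $\sum_{i=1}^{K}\sum_{x\in C_i}\|x-\mu_i\|^2$, where $C_i$ is the set of items assigned to centroid $i$
The class membership matrix is one-hot encoded: each item belongs to exactly one centroid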
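
The connection to matrix factorization can be sketched as follows, assuming NumPy, items in the rows, and a toy data set of two well-separated blobs. With a one-hot membership matrix `Z` and centroid matrix `mu`, the K-means cost equals the squared Frobenius norm of `X - Z @ mu`, so K-means is a factorization whose membership factor is restricted to one-hot rows, while NMF only requires non-negativity. The names `Z`, `mu`, and `assign` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (assumption): two tight blobs, 5 items each, items in the rows.
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)),
               rng.normal(3.0, 0.1, (5, 2))])
K = 2
mu = X[[0, 5]].copy()              # initial centroids, one taken from each blob

for _ in range(10):                # Lloyd's algorithm
    # Squared distance from every item to every centroid, shape (10, K).
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    assign = d.argmin(axis=1)      # nearest centroid per item
    mu = np.array([X[assign == i].mean(axis=0) for i in range(K)])

Z = np.eye(K)[assign]              # one-hot class membership matrix (10 x K)
cost = ((X - mu[assign]) ** 2).sum()
cost_as_factorization = np.linalg.norm(X - Z @ mu) ** 2
```

Both cost values agree because `Z @ mu` simply repeats each item's assigned centroid.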
