Non-negative matrix factorization

From Colettapedia
Revision as of 18:33, 5 January 2021 by Colettace

General

  • Original data must be non-negative
  • $X_{m,n} \approx W_{m,k} H_{k,n}$ for k latent variables, with k << min(m,n)
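    • W is a tall matrix (m × k); H is a wide matrix (k × n)
    • For image analysis, the columns of W are the "basis images", analogous to topic centroids
  • An item's label is based on the H matrix, e.g. the latent variable with the largest weight in that item's column of H
  • Works well with images since pixel intensities are always non-negative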
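
The factorization above can be sketched with NumPy. This is a minimal illustration, not a reference implementation: it assumes the Frobenius-norm objective and the classic multiplicative update rules, and the function name nmf_multiplicative, the toy data, and the argmax labeling rule are all assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative X (m x n) as W (m x k) @ H (k x n)."""
    if np.any(X < 0):
        raise ValueError("X must be non-negative")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Random positive starting point (assumption: any positive init works for a demo)
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplying by non-negative ratios keeps W and H non-negative
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: 50 items stored as columns, each a non-negative mix of 3 hidden bases
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 50))

W, H = nmf_multiplicative(X, k=3)
labels = H.argmax(axis=0)  # one label per item (column of X), read from H
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(W.shape, H.shape, round(float(rel_err), 3))
```

Here the items are the columns of X, so each column of H holds the weights of one item and each column of W is one basis vector.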

Compare vs PCA

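  • PCA components and scores can have negative values; NMF constrains W and H to be non-negative
  • Items are in the rows
  • In NMF, topics are additive (non-negative) linear combinations of words, and documents are additive linear combinations of topics
  • NMF factors tend to be sparse (non-smooth NMF is a variant that promotes sparsity)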
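
As a quick check of the sign difference, the sketch below runs PCA by SVD on non-negative data and confirms that negative values show up in both the components and the scores. The random data and variable names are assumptions for illustration, and items are placed in the rows as noted in the list above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 20))           # non-negative data: 50 items (rows), 20 features

Xc = X - X.mean(axis=0)            # PCA centers each feature first
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt[:2]                # top-2 principal axes, shape (2, 20)
scores = Xc @ components.T         # item coordinates on those axes, shape (50, 2)

# Even though every entry of X is >= 0, PCA output mixes signs
has_negative_components = bool((components < 0).any())
has_negative_scores = bool((scores < 0).any())
print(has_negative_components, has_negative_scores)
```

Because the data are centered, each column of scores sums to zero, so negative scores are unavoidable. An NMF factor, by contrast, can be read directly as an additive part.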


Compare vs K-Means

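  • K-means performs poorly on unbalanced problems (clusters of very different sizes)
  • K-means fits K centroids mu_i by minimizing a cost function: the sum of squared distances from each item to its assigned centroid
  • The class membership matrix is one-hot encoded (each item belongs to exactly one cluster), whereas the columns of H in NMF hold non-negative weights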
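
To make the comparison concrete, K-means can be written in the same X ≈ C M form as NMF, where C holds the centroids and M is the one-hot membership matrix. The sketch below assumes plain Lloyd's algorithm with items stored as columns. The helper names assign and kmeans and the two-blob toy data are hypothetical.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid for each column of X."""
    # Squared distances, shape (k, n): centroid j versus item i
    d2 = ((X[:, None, :] - centroids[:, :, None]) ** 2).sum(axis=0)
    return d2.argmin(axis=0)

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm on the columns of X (m x n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    centroids = X[:, rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            members = X[:, labels == j]
            if members.shape[1] > 0:      # keep the old centroid if a cluster empties
                centroids[:, j] = members.mean(axis=1)
    return centroids, assign(X, centroids)

# Toy data: two blobs in 2-D, items stored as columns (same layout as X ≈ W H)
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0.0, 0.3, (2, 30)), rng.normal(3.0, 0.3, (2, 10))])

k = 2
C, labels = kmeans(X, k)
M = np.eye(k)[:, labels]                  # one-hot membership matrix, shape (k, n)

cost_factorized = np.linalg.norm(X - C @ M) ** 2
cost_direct = ((X - C[:, labels]) ** 2).sum()
print(M.shape, np.isclose(cost_factorized, cost_direct))
```

Each column of M has a single 1, so C @ M replaces every item with its centroid. Relaxing M from one-hot to arbitrary non-negative weights (and constraining C to be non-negative) gives the NMF setting.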