AMMAI reading and notes: [3/11] Probabilistic latent semantic indexing, T. Hofmann

[3/11] Probabilistic latent semantic indexing, T. Hofmann

A. Summary (mostly from the abstract)

Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data.

Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words.

In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model.
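To make "proper generative data model" concrete, here is a sketch of the aspect model as it is usually written. The symbols d (document), w (word), and z (latent class) are my notation for these notes and do not appear in the summary above.

```latex
% Joint probability of a (document, word) pair, mixing over latent classes z
P(d, w) = \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z)

% Equivalent asymmetric form: pick a document, then a class, then a word
P(d, w) = P(d) \sum_{z} P(z \mid d)\, P(w \mid z)

% Matrix view, for comparison with the SVD in LSI:
% \hat{P} = U \Sigma V^{\top}, with
% U = [P(d_i \mid z_k)], \quad \Sigma = \mathrm{diag}(P(z_k)), \quad V = [P(w_j \mid z_k)]
```

Unlike the SVD factors, every factor here is a non-negative, normalized probability distribution, which is what makes the decomposition interpretable as a generative model.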

Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI.

In particular, the combination of models with different dimensionalities has proven to be advantageous.

B. Note

LSA
  document->concept->word
  SVD (singular value decomposition)
pLSA
  EM algorithm
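
The two notes above can be sketched in NumPy as follows. This is a minimal illustration under my own assumptions, not the paper's implementation: the function names `lsa` and `plsa` are hypothetical, the EM shown is plain EM rather than the generalized variant mentioned in the summary, and it uses the asymmetric parameterization P(w|d) = sum over z of P(w|z) P(z|d).

```python
import numpy as np

def lsa(counts, k):
    """LSA sketch: rank-k truncated SVD of a (docs x words) count matrix."""
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

def plsa(counts, n_topics, n_iter=50, seed=0):
    """pLSA sketch: plain EM on a (docs x words) count matrix.

    Returns P(z|d) with shape (docs, topics) and P(w|z) with shape
    (topics, words). Assumes every document has at least one word.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initial distributions, each row normalized to sum to 1
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z|d,w) proportional to P(z|d) * P(w|z),
        # stored with shape (docs, topics, words)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + eps

        # M-step: re-estimate from expected counts n(d,w) * P(z|d,w)
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z
```

For example, calling `plsa(counts, n_topics=2)` on a small count matrix returns two row-stochastic matrices, while `lsa(counts, 2)` returns real-valued factors that may contain negative entries. The dense (docs, topics, words) posterior array is only practical for toy data; a real corpus would need a sparse formulation.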

