AMMAI reading and notes: [3/11] Latent Dirichlet allocation, D. Blei, A. Ng, and M. Jordan.

A. Summary (mostly taken from the abstract)

This paper describes latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics.

Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document.

The authors present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. They report results in document modeling, text classification, and collaborative filtering, comparing against a mixture of unigrams model and the probabilistic LSI model.

B. Notes

Part 1: the generative process for a corpus

LDA assumes the following generative process for each document w in a corpus D:
1. Choose N ~ Poisson(ξ).
2. Choose θ ~ Dir(α).
3. For each of the N words w_n:
(a) Choose a topic z_n ~ Multinomial(θ).
(b) Choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n.
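The generative process above can be sketched in a few lines of plain Python. This is a minimal illustration under my own assumptions, not the authors' code: the function names (`sample_poisson`, `sample_dirichlet`, `generate_document`) and the toy α, β, ξ values are hypothetical, and the Dirichlet draw uses the standard trick of normalizing independent Gamma(α_i, 1) samples.

```python
import math
import random


def sample_poisson(xi, rng):
    """Draw N ~ Poisson(xi) with Knuth's multiplication method (fine for small xi)."""
    threshold = math.exp(-xi)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dir(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, beta, xi, rng):
    """Generate one document; beta[i][j] is p(word j | topic i)."""
    n_words = sample_poisson(xi, rng)                    # 1. N ~ Poisson(xi)
    theta = sample_dirichlet(alpha, rng)                 # 2. theta ~ Dir(alpha)
    topics = range(len(alpha))
    vocab = range(len(beta[0]))
    z, w = [], []
    for _ in range(n_words):                             # 3. for each of the N words
        z_n = rng.choices(topics, weights=theta)[0]      # (a) z_n ~ Multinomial(theta)
        w_n = rng.choices(vocab, weights=beta[z_n])[0]   # (b) w_n ~ p(w_n | z_n, beta)
        z.append(z_n)
        w.append(w_n)
    return theta, z, w


# Hypothetical toy setting: k = 2 topics over a 4-word vocabulary.
rng = random.Random(0)
alpha = [0.5, 0.5]
beta = [[0.7, 0.2, 0.1, 0.0],   # topic 0 favours word 0
        [0.0, 0.1, 0.2, 0.7]]   # topic 1 favours word 3
theta, z, w = generate_document(alpha, beta, xi=8.0, rng=rng)
print("theta =", [round(t, 2) for t in theta], "words =", w)
```

Because θ is drawn once per document while a fresh z_n is drawn for every word, one document can mix several topics; this is the key difference from the mixture of unigrams model, which picks a single topic for the whole document.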

Part 2: parameter estimation

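The parameters α and β are estimated by maximizing the marginal likelihood p(D | α,β) rather than the posterior p(α,β | D) (empirical Bayes). The two are not equal: by Bayes' theorem p(α,β | D) ∝ p(D | α,β) p(α,β), so they have the same maximizer when the prior p(α,β) is flat.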
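Written out (my own transcription, for M documents w_1..w_M), Bayes' theorem links the two objectives, and the likelihood being maximized factorizes over documents. The last line is the per-document marginal from the paper; the sum over z_n inside the integral over θ is what couples θ and β.

```latex
\begin{aligned}
p(\alpha,\beta \mid D) &\propto p(D \mid \alpha,\beta)\, p(\alpha,\beta) \\
\log p(D \mid \alpha,\beta) &= \sum_{d=1}^{M} \log p(\mathbf{w}_d \mid \alpha,\beta) \\
p(\mathbf{w} \mid \alpha,\beta) &= \int p(\theta \mid \alpha) \Big( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \Big)\, d\theta
\end{aligned}
```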

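However, p(w | α,β) is intractable to compute (θ and β are coupled inside the integral), and so is the posterior p(θ, z | w, α,β). The authors therefore approximate the posterior with a simpler variational distribution q(θ, z | γ, φ), where γ is a Dirichlet parameter and φ_1..φ_N are multinomial parameters. Fitting γ and φ for each document gives a lower bound on log p(w | α,β), and a variational EM algorithm alternates between fitting γ, φ (E-step) and re-estimating α, β (M-step).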
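The per-document E-step can be sketched as coordinate ascent on γ and φ, using the updates φ_ni ∝ β_{i,w_n} exp(Ψ(γ_i)) and γ_i = α_i + Σ_n φ_ni, where Ψ is the digamma function. This is a minimal plain-Python sketch, not the authors' implementation: the function names, the toy β, the stopping rule (`max_iter`, `tol`) and the hand-rolled `digamma` (Python's `math` module has none) are my own assumptions, and the M-step that re-estimates α and β is left out.

```python
import math


def digamma(x):
    """Psi(x) for x > 0: shift x up with Psi(x) = Psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def variational_e_step(words, alpha, beta, max_iter=100, tol=1e-6):
    """Fit the variational parameters (gamma, phi) for one document.

    words: word ids w_1..w_N; alpha: length-k Dirichlet parameter;
    beta[i][j]: p(word j | topic i). Assumes every observed word has
    nonzero probability under at least one topic.
    """
    k, n_words = len(alpha), len(words)
    phi = [[1.0 / k] * k for _ in range(n_words)]      # init phi_ni = 1/k
    gamma = [a + n_words / k for a in alpha]           # init gamma_i = alpha_i + N/k
    for _ in range(max_iter):
        # exp(Psi(gamma_i)); the shared factor exp(-Psi(sum(gamma))) cancels when phi_n is normalized
        exp_psi = [math.exp(digamma(g)) for g in gamma]
        for n, w_n in enumerate(words):
            # phi_ni proportional to beta[i][w_n] * exp(Psi(gamma_i))
            unnorm = [beta[i][w_n] * exp_psi[i] for i in range(k)]
            total = sum(unnorm)
            phi[n] = [u / total for u in unnorm]
        # gamma_i = alpha_i + sum_n phi_ni
        new_gamma = [alpha[i] + sum(row[i] for row in phi) for i in range(k)]
        converged = max(abs(a - b) for a, b in zip(new_gamma, gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    return gamma, phi


# Hypothetical toy topics over a 4-word vocabulary.
beta = [[0.7, 0.2, 0.1, 0.0],
        [0.0, 0.1, 0.2, 0.7]]
gamma, phi = variational_e_step([0, 0, 1, 3], alpha=[0.5, 0.5], beta=beta)
print([round(g, 3) for g in gamma])
```

In this toy run topic 0 ends up with the larger γ because three of the four words lean toward it; γ (normalized) then serves as the document's topic-proportion representation.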

Posted in L3, lecture notes, must read
