토픽모델링, LDA(Latent Dirichlet Allocation)

April 4, 2019    Bayesian NLP

  • LDA(Latent Dirichlet Allocaton)의 posterior proability와 아래와 같이 collapsed gibbs sampling에 의해 변형된 posterior proability의 유도 과정을 살펴보겠습니다.
\[\begin{align*} P(\boldsymbol{Z}, \boldsymbol{W};\alpha,\beta) & \propto { \color{blue} \frac{ \ { n_{\hat{d},(\cdot)}^{\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k}{\sum_{k=1}^K { n_{\hat{d},(\cdot)}^{\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k } \ }\times { \color{green} \frac{ \ {n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v}{\sum_{v=1}^V { n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v } } \\ \\ & \propto {\color{blue} \text{문서에 대한 토픽 가중치} \ } \times {\color{green}\text{토픽에 대한 단어 가중치} \ } \end{align*}\]
  • collapsed gibbs sampling에 대한 내용은 여기에 참조되어 있습니다.


LDA

  • $K$: 토픽 수

  • $D$: 문서 수

  • $N$: 특정 문서의 단어 수

  • $w_{d,n}$: 특정 문서 d에서 n번째 단어로 관측된 값

  • $z_{d,n}$ topic: 특정 문서 d에서 n번째 단어 대한 topic assignment

    • $ z_{d,n}$ 는 Multinomial($\theta_d$)에서 샘플링 된 것입니다. ($\theta_d$: Multinomial분포의 모수)
    • Multinomial($\theta_d$)는 한번만 시행(샘플)된 것으로 Categorical($\theta_d$)와 동일합니다.
    • $\theta_d =[\theta_{d,1}, \theta_{d,2}, …,\theta_{d,K}] \in \mathbb{R}^K, (\sum_k \theta_{d,k}=1, \theta_{d,k} \geq0 , \ k: \text{topic index})$는 $d$번째 문서의 $\text{Dirichlet}_d(\alpha)$에서 샘플링된 것입니다.
    • $\alpha= [\alpha_1,\alpha_2, …, \alpha_K]$, symmetry할 경우 $\alpha_1=\alpha_2=\cdot\cdot\cdot= \alpha_K$
  • $w_{d,n} $ word: 특정 문서 d에서 n번째 단어

    • $ w_{d,n}$ 는 Multinomial($\phi_k$)에서 샘플링 된 것입니다. ($\phi_k$: Multinomial분포의 모수)
    • Multinomial($\phi_k$)는 한번만 시행(샘플)된 것으로 Categorical($\phi_k$)와 동일합니다.
    • $\phi_k =[\phi_{k,1},\phi_{k,2},…,\phi_{k,V}] \in \mathbb{R}^V, (\sum_i\phi_{k,v}=1, \phi_{v,i} \geq0 , \ v: \text{Vocabulary index})$는 $k$번째 토픽의 $\text{Dirichlet}_k(\beta)$서 샘플링된 것입니다.
    • $\beta= [\beta_1,\beta_2, …, \beta_V]$, symmetry할 경우 $\beta_1=\beta_2=\cdot\cdot\cdot= \beta_V$


  • Posterior Proability
\[\begin{align*} P(\boldsymbol{W}, \boldsymbol{Z}, \boldsymbol{\theta}, \boldsymbol{\phi};\alpha,\beta) = \prod_{i=k}^K P(\phi_k;\beta) \prod_{d=1}^D P(\theta_d;\alpha) \prod_{n=1}^N P(z_{d,n}\mid\theta_d)P(w_{d,n}\mid \phi_{z_{d,n} \ }) \\ \end{align*}\]
  • $\boldsymbol{Z}$에 대해서 collapsed gibbs sampler 진행 ($\boldsymbol{\theta}, \boldsymbol{\phi} $에 대해서 collapsed)

  • $\boldsymbol{Z}$를 알게되면 $\boldsymbol{\theta}, \boldsymbol{\phi}$ 도 알 수 있음


\[\begin{align*} P(\boldsymbol{Z}, \boldsymbol{W};\alpha,\beta) &= \int_{\boldsymbol{\theta} \ } \int_{\boldsymbol{\phi} \ } P(\boldsymbol{W}, \boldsymbol{Z}, \boldsymbol{\theta}, \boldsymbol{\phi};\alpha,\beta) \, d\boldsymbol{\phi} \, d\boldsymbol{\theta} \\ &= {\color{blue}[\int_{\boldsymbol{\phi} \ } \prod_{k=1}^K P(\phi_k;\beta) \prod_{d=1}^D \prod_{n=1}^N P(w_{d,n}\mid\phi_{z_{d,n} \ }) \, d\boldsymbol{\phi}] } {\color{green}[ \int_{\boldsymbol{\theta} \ } \prod_{d=1}^D P(\theta_d;\alpha) \prod_{n=1}^N P(z_{d,n}\mid\theta_d) \, d\boldsymbol{\theta}]} \quad \cdot\cdot\cdot \quad (1) \end{align*}\]
  • \(\boldsymbol{\theta} = \{\theta\}^{D}_{d=1} , \ \boldsymbol{\phi} = \{\phi\}^{K}_{k=1}\)는 서로 독립적인 관계를 가지고 있기 때문에, 분리시켜서 생각해 볼수 있습니다.


한 문서에 대한 토픽 분포 ($\boldsymbol{\theta}$)

\[\begin{align*} \int_{\boldsymbol{\theta} \ } \prod_{d=1}^D P(\theta_d;\alpha) \prod_{n=1}^N P(z_{d,n}\mid\theta_d) \, d\boldsymbol{\theta} &= \int_{\theta_d} \prod_{d=1}^D P(\theta_d;\alpha) \prod_{n=1}^N P(z_{d,n}\mid\theta_d) \, d\theta_d \\ \end{align*}\]


  • 특정 한 문서에 대해서만 생각하다면,
\[\begin{align} &\int_{\theta_d} P(\theta_d;\alpha) \prod_{n=1}^N P(z_{d,n}\mid\theta_d) \, d\theta_d \\ &= \int_{\theta_d} \frac{\Gamma\left (\sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K \theta_{d,k}^{\alpha_k - 1} \prod_{n=1}^N P(z_{d,n}\mid\theta_d) \, d\theta_d, \quad \quad \cdot\cdot\cdot \quad (2) \\ &\because f(\theta_1, \cdots, \theta_K; \alpha_1, \cdots, \alpha_K) = \frac{\Gamma\left (\sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^k \theta_k ^{\alpha_k - 1}, \quad f:\text{ dirichlet pdf} \\ \end{align}\]


  • 다음으로, $P(z_{d,n} \mid \theta_d)$의 pdf를 구해보겠습니다.
  • $n^{k}_{d,(\cdot)}$ : $d$번째 문서에서 $k^{th}$ topic으로 할당된 단어들의 수,
    • $(\cdot)$: $d$번째 문서에서 $k^{th}$ topic으로 할당된 모든 단어 인덱스
\[\begin{align} \prod_{n=1}^N P(z_{d,n}\mid\theta_d) & = \prod^{N}_{n=1} \prod^{K}_{k=1}\theta_{d,k}^{[z_{d,n}=k]}, \quad [z_{d,n}=k] \text{ is } 1 \ \ \text{if} \ \ z_{d,n}=k ,\ 0 \ \text{ otherwise} \\ & \because f(z_{d,n} \mid \theta_d) = \prod^{K}_{k=1} \theta^{[z_{d,n}=k]}_i, \quad z_{d,n} \sim \text{Categorical}(\theta_d), \quad f: \text{Categorical pdf}\\ & = \prod^{K}_{k=1}\theta_{d,k}^{n_{d,(\cdot)}^k} , \quad \cdot\cdot\cdot \quad (3) \\ \\ \end{align}\]


  • 식(2)에 식(3)결과를 대입하면,

    \[\begin{align*} &= \int_{\theta_d} \frac{\Gamma\left (\sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K \theta_{d,k}^{\alpha_k - 1} \prod^{K}_{k=1}\theta_{d,k}^{n_{d,(\cdot)}^k}\, d\theta_d \\ &= \int_{\theta_d} \frac{\Gamma\left (\sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K \theta_{d,k}^{n_{d,(\cdot)}^k+\alpha_k - 1}\, d\theta_d \quad \cdot\cdot\cdot \quad (4) \\ \end{align*}\]


  • 식(4)의 constant term을 새로운 dirichlet($n_{d,(\cdot)}^k+\alpha_k$)를 따르도록 변형해 줍니다.

    • $\theta_{d,k} \sim Dir(n_{d,(\cdot)}^k+\alpha_k)$
    \[\begin{align*} &=\frac{\Gamma\left (\sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)}\cdot \frac{\prod_{k=1}^K \Gamma(n_{d,(\cdot)}^k+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{d,(\cdot)}^k+\alpha_k \right)} \int_{\theta_d} \frac{\Gamma\left (\sum_{k=1}^K n_{d,(\cdot)}^k+\alpha_k \right)}{\prod_{k=1}^K \Gamma(n_{d,(\cdot)}^k+\alpha_k)} \prod_{k=1}^K\theta_{d,k}^{n_{d,(\cdot)}^k+\alpha_k - 1}\, d\theta_d\\ \end{align*}\]


  • 그리고 $\theta $에 대한 dirichlet확률분포의 합은 1이 되는 것을 이용합니다.
\[\begin{align*} &=\frac{\Gamma\left (\sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)}\cdot \frac{\prod_{k=1}^K \Gamma(n_{d,(\cdot)}^k+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{d,(\cdot)}^k+\alpha_k \right)} \quad \cdot\cdot\cdot (5) &\because \int_{\theta_d} \frac{\Gamma\left (\sum_{k=1}^K n_{d,(\cdot)}^k+\alpha_k \right)}{\prod_{k=1}^K \Gamma(n_{d,(\cdot)}^k+\alpha_k)} \prod_{k=1}^K\theta_{d,k}^{n_{d,(\cdot)}^k+\alpha_k - 1}\, d\theta_d =1\\ \end{align*}\]


  • 식(5)는 한 문서에 대한 결과로, 원래 식에 따라 모든 문서에 적용하면,
\[\prod_{d=1}^D \frac{\Gamma\left (\sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)}\cdot \frac{\prod_{k=1}^K \Gamma(n_{d,(\cdot)}^k+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{d,(\cdot)}^k+\alpha_k \right)} \quad \cdot\cdot\cdot (6)\]


한 토픽에 대한 단어의 분포($\boldsymbol{\phi}$)

\[\begin{align*} &\int_{\boldsymbol{\phi} \ } \prod_{k=1}^K P(\phi_k;\beta) \prod_{d=1}^D \prod_{n=1}^N P(w_{d,n}\mid\phi_{z_{d,n} \ }) \, d\boldsymbol{\phi}\\ &= \prod_{k=1}^K \int_{\phi_i}P(\phi_k;\beta) \prod_{d=1}^D \prod_{n=1}^N P(w_{d,n}\mid\phi_{z_{d,n} \ }) \, d\phi_i\\ \end{align*}\]


  • 특정 한 토픽에 대해서만 생각하다면,
\[\begin{align*} & \int_{\phi_k}P(\phi_k;\beta) \prod_{d=1}^D \prod_{n=1}^N P(w_{d,n}\mid\phi_{z_{d,n} \ }) \, d\phi_k\\ &=\int_{\phi_k}\frac{\Gamma\left (\sum_{v=1}^V \beta_v \right)}{\prod_{v=1}^V \Gamma(\beta_v)} \prod^{V}_{v=1} \phi ^{\beta_v-1}_{k,v}\prod_{d=1}^D \prod_{n=1}^N P(w_{d,n}\mid\phi_{z_{d,n} \ }) \, d\phi_k \quad \quad \cdot\cdot\cdot \quad (7) \\ &\because f(\phi_1, \cdots, \phi_V; \beta_1, \cdots, \beta_V) = \frac{\Gamma\left (\sum_{v=1}^V \beta_v \right)}{\prod_{v=1}^V \Gamma(\beta_v)} \prod_{v=1}^V \phi_{k,v} ^{\beta_v - 1}, \quad f:\text{ dirichlet pdf} \\ \end{align*}\]


  • 다음으로, $P(w_{d,n} \mid \phi_{z_{d,n} \ })$의 pdf를 구해보겠습니다.
  • $n^{k}_{(\cdot),v}$ : $v$번째 단어가 $k^{th}$ topic으로 할당된 수
    • $(\cdot)$: 모든 문서 인덱스
\[\begin{align} \prod_{d=1}^D \prod_{n=1}^N P(w_{d,n}\mid\phi_{z_{d,n} \ }) & = \prod^{D}_{d=1} \prod^{N}_{n=1}\prod^{V}_{v=1}\phi_{z_{d,n},v}^{[w_{d,n}=v]} , \quad [w_{d,n}=v] \text{ is } 1 \ \ \text{if} \ \ w_{d,n}=v ,\ 0 \ \text{ otherwise} \\ & \because f(w_{d,n} \mid \phi_{z_{d,n} \ }) = \prod^{V}_{v=1} \phi^{[w_{d,n}=v]}_{z_{d,n},v}, \quad w_{d,n} \sim \text{Categorical}(\phi_{z_{d,n} \ }), \quad f: \text{Categorical pdf}\\ & = \prod_{d=1}^D \prod_{n=1}^N\phi_{ \ {z_{d,n} \ },v}^{n_{(\cdot),v}^k} \\ & = \prod_{v=1}^V \phi_{ \ {z_{d,n} \ },v}^{n_{(\cdot),v}^k} , \quad \cdot\cdot\cdot \quad (8) \\ \\ \end{align}\]


  • 식(7)에 식(8)결과를 대입하면,
\[\begin{align} &=\int_{\phi_k}\frac{\Gamma\left (\sum_{v=1}^V \beta_v \right)}{\prod_{v=1}^V \Gamma(\beta_v)} \prod^{V}_{v=1} \phi ^{\beta_v-1}_{k,v}\prod_{v=1}^V \phi_{ \ {z_{d,n} \ },v}^{n_{(\cdot),v}^k} \, \, d\phi_k \\ &=\int_{\phi_k}\frac{\Gamma\left (\sum_{v=1}^V \beta_v \right)}{\prod_{v=1}^V \Gamma(\beta_v)} \prod^{V}_{v=1} \phi ^{\beta_v-1}_{k,v}\prod_{v=1}^V \phi_{ \ {k},v}^{n_{(\cdot),v}^k} \, \, d\phi_k \\ &=\int_{\phi_k}\frac{\Gamma\left (\sum_{v=1}^V \beta_v \right)}{\prod_{v=1}^V \Gamma(\beta_v)} \prod^{V}_{v=1} \phi ^{n_{(\cdot),v}^k+\beta_v-1}_{k,v} \, d\phi_k \quad \cdot\cdot\cdot \quad (9) \\ \end{align}\]


  • 식(9)의 constant term을 새로운 dirichlet($n_{(\cdot),v}^k+\beta_k$)를 따르도록 변형해 줍니다.
    • $\phi_{k,v} \sim Dir(n_{(\cdot),v}^k+\beta_v)$
\[\begin{align*} &=\frac{\Gamma\left (\sum_{v=1}^V \beta_v \right)}{\prod_{v=1}^V \Gamma(\beta_v)}\cdot \frac{\prod_{v=1}^V \Gamma(n_{(\cdot),v}^k+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),v}^k+\beta_v \right)} \int_{\phi_k}\frac{\Gamma\left (\sum_{v=1}^V n_{(\cdot),v}^k+\beta_v \right)}{\prod_{k=1}^K \Gamma(n_{(\cdot),v}^k+\beta_v)}\prod_{v=1}^V\phi_{k,v}^{n_{(\cdot),v}^k+\beta_v - 1}\, d\phi_k\\ \end{align*}\]


  • 그리고 $\phi$ 에 대한 dirichlet확률분포의 합은 1이 되는 것을 이용합니다.
\[=\frac{\Gamma\left (\sum_{v=1}^V \beta_v \right)}{\prod_{v=1}^V \Gamma(\beta_v)}\cdot \frac{\prod_{v=1}^V \Gamma(n_{(\cdot),v}^k+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),v}^k+\beta_v \right)} \quad \cdot\cdot\cdot (10)\quad \quad \because \int_{\phi_k}\frac{\Gamma\left (\sum_{v=1}^V n_{(\cdot),v}^k+\beta_v \right)}{\prod_{k=1}^K \Gamma(n_{(\cdot),v}^k+\beta_v)}\prod_{v=1}^V\phi_{k,v}^{n_{(\cdot),v}^k+\beta_v - 1}\, d\phi_k=1\\\]


  • 식(10)는 한 토픽에 대한 결과로, 원래 식에 따라 모든 토픽에 적용하면,
\[\prod^{K}_{k=1}\frac{\Gamma\left (\sum_{v=1}^V \beta_v \right)}{\prod_{v=1}^V \Gamma(\beta_v)}\cdot \frac{\prod_{v=1}^V \Gamma(n_{(\cdot),v}^k+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),v}^k+\beta_v \right)} \quad \cdot\cdot\cdot (11)\]


  • 최종적으로, 식 (1)을 (6)과 (11)로 정리하면,
\[\begin{align*} P(\boldsymbol{Z}, \boldsymbol{W};\alpha,\beta) &= \int_{\boldsymbol{\theta} \ } \int_{\boldsymbol{\phi} \ } P(\boldsymbol{W}, \boldsymbol{Z}, \boldsymbol{\theta}, \boldsymbol{\phi};\alpha,\beta) \, d\boldsymbol{\phi} \, d\boldsymbol{\theta} \\ &= [\int_{\boldsymbol{\phi} \ } \prod_{k=1}^K P(\phi_k;\beta) \prod_{d=1}^D \prod_{n=1}^N P(w_{d,n}\mid\phi_{z_{d,n} \ }) \, d\boldsymbol{\phi}] [ \int_{\boldsymbol{\theta} \ } \prod_{d=1}^D P(\theta_d;\alpha) \prod_{n=1}^N P(z_{d,n}\mid\theta_d) \, d\boldsymbol{\theta}] \\ &= \prod_{d=1}^D \frac{\Gamma\left (\sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)}\cdot \frac{\prod_{k=1}^K \Gamma(n_{d,(\cdot)}^k+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{d,(\cdot)}^k+\alpha_k \right)} \times\prod^{K}_{k=1}\frac{\Gamma\left (\sum_{v=1}^V \beta_v \right)}{\prod_{v=1}^V \Gamma(\beta_v)}\cdot \frac{\prod_{v=1}^V \Gamma(n_{(\cdot),v}^k+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),v}^k+\beta_v \right)} \\ \end{align*}\]


  • prior(a constant)는 topic assignments에 영향을 주지 않는다.

    • prior에만 영향을 받는 부분 제거
    \[\definecolor{red}{RGB}{255,0,0} \definecolor{green}{RGB}{5,196,2} \definecolor{blue}{RGB}{0,0,255} \definecolor{energy}{RGB}{114,0,172} \definecolor{freq}{RGB}{45,177,93} \definecolor{spin}{RGB}{251,0,29} \definecolor{signal}{RGB}{18,110,213} \definecolor{circle}{RGB}{217,86,16} \definecolor{average}{RGB}{203,23,206} \begin{align*} &= \prod_{d=1}^D {\color{average}{\frac{\Gamma\left (\sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)} \ }}\cdot \frac{\prod_{k=1}^K \Gamma(n_{d,(\cdot)}^k+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{d,(\cdot)}^k+\alpha_k \right)} \times\prod^{K}_{k=1}{\color{average}{\frac{\Gamma\left (\sum_{v=1}^V \beta_v \right)}{\prod_{v=1}^V \Gamma(\beta_v)} \ }}\cdot \frac{\prod_{v=1}^V \Gamma(n_{(\cdot),v}^k+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),v}^k+\beta_v \right)} \\ &\propto \prod_{d=1}^D \frac{\prod_{k=1}^K \Gamma(n_{d,(\cdot)}^k+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{d,(\cdot)}^k+\alpha_k \right)} \times\prod^{K}_{k=1} \frac{\prod_{v=1}^V \Gamma(n_{(\cdot),v}^k+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),v}^k+\beta_v \right)} \\ \end{align*}\]


  • collpased gibbs sampling 특성에 따라 목적식을 간단히 해보겠습니다.

    • 특정 하나의 단어의 토픽 assignment $z_{\hat{d},\hat{n} \ }$를 제외하고, 나머지 토픽 assignment 단어 $z_{-(\hat{d},\hat{n})}$ 들을 구분해 보겠습니다(gibbs sampling의 conditional probability)
    • 식 (12)의 $\prod^{D}_{d=1}$은 $\hat{d}$번째 문서만 고려되기 때문에 지워져도 됩니다.
    • 식 (12)의 $\prod^{V}_{v=1}$은 vocabulary에서 $\hat{v}$번째 단어만 고려되기 때문에 지워져도 됩니다.
    • 식 (13)에서 앞으로 $\hat{k}$ 번째 topic assignment을 하기 위해 그렇지 않은 나머지 $k\neq\hat{k}$ topic assignment를 구분해줍니다.
    • $\color{blue}n_{d,(\cdot)}^{k-(d,n)}$ : $d$번째 문서에서 $n$번째 단어 한 개를 제외하고, $d$번째 문서의 단어들이 $k$ 번째 토픽으로 할당된 갯수를 의미합니다.
    • $\color{freq}n_{(\cdot),v}^{k-(d,n)}$ : $d$번째 문서에서 $n$번째 단어 한 개를 제외하고, 전체 corpus에서 $v$번째 단어들이 $k$ 번째 토픽으로 할당된 갯수를 의미합니다.
    • 식 (14)에서 $n^{(\hat{k})} = n^{(-\hat{k})} +1$ 의 성질을 이용해 변형해 줍니다.
    • 식 (15)에서 $\Gamma( n ) = (n-1)!$ 의 감마함수의 성질을 이용해 줍니다.
    • 식 (16)에서 현재 step에서 알고자하는 변수는 $\hat{k}$ 이므로 관련된 부분만 남길 수 있습니다.
\[\begin{align*} P(\boldsymbol{Z}, \boldsymbol{W};\alpha,\beta) &= P(z_{\hat{d},\hat{n} \ }=\hat{k}, \ z_{-(\hat{d},\hat{n})},\boldsymbol{W};\alpha,\beta )\\ &\propto {\color{red}\prod_{d=1}^D} \frac{\prod_{k=1}^K \Gamma(n_{\hat{d},(\cdot)}^k+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{\hat{d},(\cdot)}^k+\alpha_k \right)} \times\prod^{K}_{k=1} \frac{ \ {\color{red}\prod_{v=1}^V} \Gamma(n_{(\cdot),\hat{v} \ }^k+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),\hat{v} \ }^k+\beta_v \right)} \cdot \cdot \cdot (12)\\ &\propto \frac{\prod_{k=1}^K \Gamma(n_{\hat{d},(\cdot)}^k+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{\hat{d},(\cdot)}^k+\alpha_k \right)} \times\prod^{K}_{k=1} \frac{ \Gamma(n_{(\cdot),\hat{v} \ }^k+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),\hat{v} \ }^k+\beta_v \right)} \\ \\ &\propto \frac{\Gamma({\color{blue} n_{\hat{d},(\cdot)}^{\hat{k} \ }}+\alpha_k)}{\Gamma\left (\sum_{k=1}^K {\color{blue}n_{\hat{d},(\cdot)}^{\hat{k} \ }}+\alpha_k \right)} \frac{\prod_{k=1,k \neq \hat{k} \ }^K \Gamma(n_{\hat{d},(\cdot)}^{k-(\hat{d},\hat{n})}+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{\hat{d},(\cdot)}^{k-(\hat{d},\hat{n})}+\alpha_k \right)} \\ & \quad\quad \times \frac{\Gamma({\color{freq}n_{(\cdot),\hat{v} \ }^{\hat{k} \ }}+\beta_v)}{\Gamma\left (\sum_{v=1}^V {\color{freq} n_{(\cdot),\hat{v} \ }^{\hat{k} \ }}+\beta_v \right)} \prod^{K}_{k=1, k \neq \hat{k} \ } \frac{ \Gamma(n_{(\cdot),\hat{v} \ }^{k-(\hat{d},\hat{n})}+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),\hat{v} \ }^{k-(\hat{d},\hat{n})}+\beta_v \right)} \cdot \cdot \cdot (13) \\ \\ &= \frac{\Gamma({ \color{blue}n_{\hat{d},(\cdot)}^{\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k+{\color{blue}1})}{\Gamma\left (\sum_{k=1}^K {\color{blue} n_{\hat{d},(\cdot)}^{\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k +{\color{blue}1}\right)} \frac{\prod_{k=1,k \neq \hat{k} \ }^K \Gamma(n_{\hat{d},(\cdot)}^{k-(\hat{d},\hat{n})}+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{\hat{d},(\cdot)}^{k-(\hat{d},\hat{n})}+\alpha_k \right)} \quad \because n^{\hat{k} \ } = n^{\hat{k}-(\hat{d},\hat{n})} +1\\ & \quad\quad \times \frac{\Gamma({\color{freq}n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v+{\color{freq}1})}{\Gamma\left (\sum_{v=1}^V {\color{freq} n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v +{\color{freq}1} \right)} \prod^{K}_{k=1, k \neq \hat{k} \ } \frac{ \Gamma(n_{(\cdot),\hat{v} \ }^{k-(\hat{d},\hat{n})}+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),\hat{v} \ }^{k-(\hat{d},\hat{n})}+\beta_v \right)} \cdot \cdot \cdot (14) \\ \\ &= \frac{ \ { n_{\hat{d},(\cdot)}^{\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k}{\sum_{k=1}^K { n_{\hat{d},(\cdot)}^{(\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k } {\color{blue}\frac{\Gamma({ n_{\hat{d},(\cdot)}^{\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k)}{\Gamma\left (\sum_{k=1}^K { n_{\hat{d},(\cdot)}^{\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k\right)} \frac{\prod_{k=1,k \neq \hat{k} \ }^K \Gamma(n_{\hat{d},(\cdot)}^{k-(\hat{d},\hat{n})}+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{\hat{d},(\cdot)}^{k-(\hat{d},\hat{n})}+\alpha_k \right)} }\quad \because \Gamma( n+1 ) = \Gamma(n) \times n \\ & \quad\quad \times {\frac{ \ {n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v}{\sum_{v=1}^V { n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v } \color{freq} \frac{\Gamma({n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v)}{\Gamma\left (\sum_{v=1}^V { n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v \right)} \prod^{K}_{k=1, k \neq \hat{k} \ } \frac{ \Gamma(n_{(\cdot),\hat{v} \ }^{k-(\hat{d},\hat{n})}+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),\hat{v} \ }^{k-(\hat{d},\hat{n})}+\beta_v \right)} }\cdot \cdot \cdot (15) \\ \\ &= \frac{ \ { n_{\hat{d},(\cdot)}^{\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k}{\sum_{k=1}^K { n_{\hat{d},(\cdot)}^{\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k } {\color{blue} \frac{\prod_{k=1}^K \Gamma(n_{\hat{d},(\cdot)}^{k-(\hat{d},\hat{n})}+\alpha_k)}{\Gamma\left (\sum_{k=1}^K n_{\hat{d},(\cdot)}^{k-(\hat{d},\hat{n})}+\alpha_k \right)} }\\ & \quad\quad \times {\frac{ \ {n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v}{\sum_{v=1}^V { n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v } \color{freq} \prod^{K}_{k=1} \frac{ \Gamma(n_{(\cdot),\hat{v} \ }^{k-(\hat{d},\hat{n})}+\beta_v)}{\Gamma\left (\sum_{v=1}^V n_{(\cdot),\hat{v} \ }^{k-(\hat{d},\hat{n})}+\beta_v \right)} }\cdot \cdot \cdot (16) \\ \\ & \propto \frac{ \ { n_{\hat{d},(\cdot)}^{\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k}{\sum_{k=1}^K { n_{\hat{d},(\cdot)}^{\hat{k}-(\hat{d},\hat{n})} \ }+\alpha_k } \times {\frac{ \ {n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v}{\sum_{v=1}^V { n_{(\cdot),\hat{v} \ }^{\hat{k}-(\hat{d},\hat{n})} \ }+\beta_v } } \\ \end{align*}\]


Reference

https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation


DSBA