GAN (Generative Adversarial Network)

April 23, 2019    Domain Adaptation

GAN

  • Discriminative model (supervised): learns the relationship between ($\mathbf{x}$, $y$) in order to predict $y$
  • Generative model (unsupervised): learns a latent random variable ($\mathbf{z}$) that captures the manifold of the data $\mathbf{x}$, then models the target distribution of interest (e.g. images $\mathbf{x}$)

  • GAN is a special case of a generative model


Loss Function


\[\begin{align} \definecolor{red}{RGB}{255,0,0} \definecolor{blue}{RGB}{0,0,255} \definecolor{freq}{RGB}{45,177,93} \definecolor{average}{RGB}{203,23,206} {\color{red} \min_G} {\color{blue} \max_D }V({\color{blue}D},{\color{red}G}) & = E_{\mathbf{x}\sim p_{data}(\mathbf{x})}{\color{blue}[}\log {\color{blue}D}(\mathbf{x}){\color{blue}]} +E_{\mathbf{z}\sim p_{\mathbf{z}}(\mathbf{z})}{\color{red} [}{\color{blue} [}\log(1-{\color{blue}D}({\color{red}G}(\mathbf{z}))){\color{blue} ]}{\color{red} ]} \\ & = \int_\mathbf{x} p_{data}(\mathbf{x}) \log {\color{blue}D}(\mathbf{x}) \, d\mathbf{x} + \int_\mathbf{z} p_{\mathbf{z}}(\mathbf{z}) {\color{red} [}{\color{blue} [}\log(1-{\color{blue}D}({\color{red}G}(\mathbf{z}))){\color{blue} ]}{\color{red} ]} \, d\mathbf{z} \end{align}\]
  • $p_{data}(\mathbf{x})$: the probability distribution of the data $\mathbf{x}$
  • $p_{\mathbf{z}}(\mathbf{z})$: the probability distribution of the latent random variable $\mathbf{z}$
  • $\mathbf{x}\sim p_{data}(\mathbf{x})$: $\mathbf{x}$ sampled from the distribution $p_{data}(\mathbf{x})$

  • $\mathbf{z}\sim p_{\mathbf{z}}(\mathbf{z})$: $\mathbf{z}$ sampled from the distribution $p_{\mathbf{z}}(\mathbf{z})$

  • ${\color{blue}D}(\mathbf{x})$: the probability that the input is judged to be a real image $(0 \leq D(\mathbf{x})\leq 1)$

  • ${\color{red}G}(\mathbf{z})$: the fake image generated by passing $\mathbf{z}$ (a latent random variable) through the decoder (${G}$)
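
The loss snippets in the next two sections assume `generator` and `discriminator` networks plus placeholders `X` and `Z`. A minimal TF 1.x sketch of that setup; the layer sizes, single hidden layer, and names (`n_noise`, `n_hidden`, `n_input`) are illustrative assumptions, not the post's actual model:

    import tensorflow as tf  # TF 1.x, matching the tf.log-style snippets below

    n_noise, n_hidden, n_input = 128, 256, 784  # e.g. flattened 28x28 images

    X = tf.placeholder(tf.float32, [None, n_input])  # real images x ~ p_data(x)
    Z = tf.placeholder(tf.float32, [None, n_noise])  # latent codes z ~ p_z(z)

    def generator(z):
        # G(z): decode a latent code into a fake image
        with tf.variable_scope('G', reuse=tf.AUTO_REUSE):
            h = tf.layers.dense(z, n_hidden, activation=tf.nn.relu)
            return tf.layers.dense(h, n_input, activation=tf.nn.sigmoid)

    def discriminator(x):
        # D(x): probability in (0, 1) that x is a real image
        with tf.variable_scope('D', reuse=tf.AUTO_REUSE):
            h = tf.layers.dense(x, n_hidden, activation=tf.nn.relu)
            return tf.layers.dense(h, 1, activation=tf.nn.sigmoid)

    G = generator(Z)  # fake images G(z)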


  • Discriminator Optimizer
    • $\max_{\color{blue} D} \log {\color{blue}D}(\mathbf{x})$: maximize the probability that the discriminator judges a real image as real (when ${\color{blue}D}(\mathbf{x}) \rightarrow 1$)
      • In Least Squares GAN, which keeps samples from drifting far from the real-image distribution, this term becomes $({\color{blue}D}(\mathbf{x}) - 1)^2$
    • $\max_{\color{blue}D} \log(1-{\color{blue}D}(G(\mathbf{z})))$: minimize the probability that the generated fake image ($G(\mathbf{z})$) is judged as real (when ${\color{blue}D}(G(\mathbf{z})) \rightarrow 0$)
      • In Least Squares GAN this term becomes ${\color{blue}D}(G(\mathbf{z}))^2$
      # D ascends log D(x) + log(1 - D(G(z))); minimize the negation instead
      loss_D = -tf.reduce_mean(tf.log(discriminator(X)) + tf.log(1 - discriminator(G)))
    


  • Generator Optimizer
    • The discriminator's parameters are held fixed
    • $\min_{\color{red}G} \log(1-D({\color{red}G}(\mathbf{z})))$: maximize the probability that the generated fake image (${\color{red}G}(\mathbf{z})$) is judged as real (when $D({\color{red}G}(\mathbf{z})) \rightarrow 1$)
    • In practice $D({\color{red}G}(\mathbf{z}))$ starts near 0 (early fake images are almost never judged real), and the gradient of the objective $\log(1- D({\color{red}G}(\mathbf{z})))$ is weak in that region, so learning is slow
    • To get a larger gradient, $\min_{\color{red}G} \log(1-D({\color{red}G}(\mathbf{z})))$ is reformulated as an objective whose target for $D({\color{red}G}(\mathbf{z}))$ is $\color{freq}1$, i.e. $\min_{\color{red}G} -\log D({\color{red}G}(\mathbf{z}))$ (see the cross-entropy identity below; a numerically stable sketch of both losses follows it)
      • In Least Squares GAN this becomes $(D({\color{red}G}(\mathbf{z}))-1)^2$
      # G descends -log D(G(z)), the non-saturating form of the original objective
      loss_G = -tf.reduce_mean(tf.log(discriminator(G)))
    
\[\begin{align} E_{\mathbf{z}\sim p_{\mathbf{z}}(\mathbf{z})}[CE(y,\hat{y})] &= E_{\mathbf{z}\sim p_{\mathbf{z}}(\mathbf{z})}{\color{red}[}CE({\color{freq}1},D({\color{red}G}(\mathbf{z}))){\color{red}]} \\ & = E_{\mathbf{z}\sim p_{\mathbf{z}}(\mathbf{z})}{\color{red}[}- \log D({\color{red}G}(\mathbf{z})){\color{red}]} \end{align}\]
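Because `tf.log(discriminator(...))` diverges when the sigmoid saturates near 0, both losses are often rewritten with `tf.nn.sigmoid_cross_entropy_with_logits` on the discriminator's pre-sigmoid outputs. A sketch under that assumption, where `D_real_logit` and `D_fake_logit` are hypothetical tensors from a `discriminator` variant that returns logits for `X` and `G`:

    # D: real images carry target 1, fake images carry target 0
    loss_D = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            logits=D_real_logit, labels=tf.ones_like(D_real_logit)) +
        tf.nn.sigmoid_cross_entropy_with_logits(
            logits=D_fake_logit, labels=tf.zeros_like(D_fake_logit)))

    # G: the non-saturating CE(1, D(G(z))) identity shown above
    loss_G = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            logits=D_fake_logit, labels=tf.ones_like(D_fake_logit)))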
  • Experiment
    • With a 4-ConvLayer network, using the Adam optimizer (beta1=0.5, beta2=0.999) was experimentally found to give good performance
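
A sketch of that optimizer setup in the same TF 1.x style; the learning rate (2e-4) and the variable-scope names ('D', 'G', taken from the network sketch above) are assumptions, not values stated in the post:

    # Update each network only through its own variables
    D_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='D')
    G_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='G')

    # beta1=0.5 (instead of the default 0.9), per the experiment note above
    train_D = tf.train.AdamOptimizer(2e-4, beta1=0.5, beta2=0.999).minimize(loss_D, var_list=D_vars)
    train_G = tf.train.AdamOptimizer(2e-4, beta1=0.5, beta2=0.999).minimize(loss_G, var_list=G_vars)

In a tf.Session, the two ops are typically run alternately, one train_D step and one train_G step per minibatch of real images and sampled noise.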


Does an extremum exist for the optimization?

\[\begin{align} {\color{red} \min_G} {\color{blue} \max_D }V({\color{blue}D},{\color{red}G}) & = E_{\mathbf{x}\sim p_{data}(\mathbf{x})}{\color{blue}[}\log {\color{blue}D}(\mathbf{x}){\color{blue}]} +E_{\mathbf{z}\sim p_{\mathbf{z}}(\mathbf{z})}{\color{red} [}{\color{blue} [}\log(1-{\color{blue}D}({\color{red}G}(\mathbf{z}))){\color{blue} ]}{\color{red} ]} \\ & = \int_\mathbf{x} p_{data}(\mathbf{x}) \log {\color{blue}D}(\mathbf{x}) d\mathbf{x} + \int_\mathbf{z} p_{\mathbf{z}}(\mathbf{z}) {\color{red} [}{\color{blue} [}\log(1-{\color{blue}D}({\color{red}G}(\mathbf{z}))){\color{blue} ]}{\color{red} ]} d\mathbf{z} \\ & = \int_\mathbf{x} p_{data}(\mathbf{x}) \log {\color{blue}D}(\mathbf{x}) d\mathbf{x} + \int_\mathbf{x} p_{\mathbf{g}}(\mathbf{x}){\color{blue} [}\log(1-{\color{blue}D}(\mathbf{x})){\color{blue} ]} d\mathbf{x} ,\quad \because {\color{red}G} \text{ fixed} \\ 0 & = \frac{d}{d{\color{blue}D}}[p_{data}(\mathbf{x}) \log {\color{blue}D}(\mathbf{x})+p_{\mathbf{g}}(\mathbf{x}){\color{blue} [}\log(1-{\color{blue}D}(\mathbf{x})){\color{blue} ]}], \quad \because \text{maximize the integrand pointwise in } \mathbf{x} \\ 0 & = \frac{p_{data}(\mathbf{x})}{D(\mathbf{x})} - \frac{p_{\mathbf{g}}(\mathbf{x})}{1-D(\mathbf{x})}\\ 0 & = \frac{p_{data}(\mathbf{x})(1-D(\mathbf{x}))-p_{\mathbf{g}}(\mathbf{x})D(\mathbf{x})}{D(\mathbf{x})(1-D(\mathbf{x}))} \\ 0 & = p_{data}(\mathbf{x})(1-D(\mathbf{x}))-p_{\mathbf{g}}(\mathbf{x})D(\mathbf{x}) \\ D^*(\mathbf{x}) & = \frac{p_{data}(\mathbf{x})}{p_{data}(\mathbf{x})+p_{\mathbf{g}}(\mathbf{x})}, \quad \therefore \text{a critical point exists} \end{align}\]
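To confirm this critical point is indeed a maximum of $V$ with respect to ${\color{blue}D}$ (a step the derivation above leaves implicit), check the second derivative of the integrand:

\[\frac{d^2}{d{\color{blue}D}^2}\left[p_{data}(\mathbf{x}) \log {\color{blue}D}(\mathbf{x})+p_{\mathbf{g}}(\mathbf{x})\log(1-{\color{blue}D}(\mathbf{x}))\right] = -\frac{p_{data}(\mathbf{x})}{D(\mathbf{x})^2} - \frac{p_{\mathbf{g}}(\mathbf{x})}{(1-D(\mathbf{x}))^2} < 0\]

Both terms are negative for $0 < D(\mathbf{x}) < 1$, so $D^*(\mathbf{x})$ maximizes $V$ for the fixed ${\color{red}G}$.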


Does a global maximum/minimum exist for the optimization?

\[\begin{align} {\color{red} \min_G} {\color{blue} \max_D }V({\color{blue}D},{\color{red}G}) & =\int_\mathbf{x} p_{data}(\mathbf{x}) \log {\color{blue}D}(\mathbf{x}) d\mathbf{x} + \int_\mathbf{x} p_{\mathbf{g}}(\mathbf{x}){\color{blue} [}\log(1-{\color{blue}D}(\mathbf{x})){\color{blue} ]} d\mathbf{x} ,\quad \because {\color{red}G} \text{ fixed}\\ & = \int_\mathbf{x} p_{data}(\mathbf{x}) \log \frac{p_{data}(\mathbf{x})}{p_{data}(\mathbf{x})+p_{\mathbf{g}}(\mathbf{x})} d\mathbf{x} + \int_\mathbf{x} p_{\mathbf{g}}(\mathbf{x}){\color{blue} [}\log( \frac{p_{\mathbf{g}}(\mathbf{x})}{p_{data}(\mathbf{x})+p_{\mathbf{g}}(\mathbf{x})} ){\color{blue} ]} d\mathbf{x}, \quad \because {\color{blue}D} = D^* \\ & ={\color{average}-\log4} + \int_\mathbf{x} p_{data}(\mathbf{x}) \log \frac{p_{data}(\mathbf{x})}{\frac{p_{data}(\mathbf{x})+p_{\mathbf{g}}(\mathbf{x})}{\color{average}2}} d\mathbf{x} + \int_\mathbf{x} p_{\mathbf{g}}(\mathbf{x}){\color{blue} [}\log( \frac{p_{\mathbf{g}}(\mathbf{x})}{\frac{p_{data}(\mathbf{x})+p_{\mathbf{g}}(\mathbf{x})}{\color{average}2}} ){\color{blue} ]} d\mathbf{x} \\ & = -\log4 + KL(p_{data}(\mathbf{x})||\frac{p_{data}(\mathbf{x})+p_{\mathbf{g}}(\mathbf{x})}{2}) + KL(p_{\mathbf{g}}(\mathbf{x})||\frac{p_{data}(\mathbf{x})+p_{\mathbf{g}}(\mathbf{x})}{2})\\ &= -\log4 + 2 \times JSD(p_{data}(\mathbf{x})|| p_{\mathbf{g}}(\mathbf{x})), \quad \because JSD \geq 0 \text{ with equality iff } p_{\mathbf{g}} = p_{data}, \therefore \text{the global minimum } -\log 4 \text{ exists} \end{align}\]
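As a sanity check, plugging $p_{\mathbf{g}} = p_{data}$ into the optimal discriminator recovers the minimum value directly:

\[D^*(\mathbf{x}) = \frac{p_{data}(\mathbf{x})}{p_{data}(\mathbf{x})+p_{data}(\mathbf{x})} = \frac{1}{2}, \qquad V(D^*,{\color{red}G}^*) = \int_\mathbf{x} p_{data}(\mathbf{x})\log\frac{1}{2} \, d\mathbf{x} + \int_\mathbf{x} p_{data}(\mathbf{x})\log\frac{1}{2} \, d\mathbf{x} = -\log 4\]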

