Policy gradient methods[1] have two components: the probability $\pi_\theta(a_t|s_t)$ and the reward $R_t$. The goal is to maximize the reward $R_t$ by increasing the probability of the action that leads to it from the given state. Different algorithms tweak the reward weight in the objective to reduce its variance. For example, REINFORCE subtracts a baseline (a moving average of the reward) from the reward; PPO uses an advantage, which subtracts a learned value function from the reward; GRPO also uses an advantage, but approximates the value function with a Monte Carlo estimate over a group of rollouts.
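To make the contrast concrete, here is a minimal NumPy sketch of the three reward weights; the function names and the group-normalization details are illustrative assumptions rather than the exact formulations from the respective papers (PPO, for instance, normally combines its advantage with GAE and a clipped surrogate, both omitted here).

```python
import numpy as np

def reinforce_weight(rewards, baseline):
    # REINFORCE: subtract a baseline (e.g. a moving average of past rewards)
    # from the return; the gradient keeps its mean but loses variance.
    return np.asarray(rewards) - baseline

def ppo_weight(rewards, values):
    # PPO-style advantage: subtract a learned value estimate V(s_t)
    # from the observed return (GAE and clipping omitted for brevity).
    return np.asarray(rewards) - np.asarray(values)

def grpo_weight(group_rewards):
    # GRPO: approximate the value function with Monte Carlo statistics
    # of a group of rollouts sampled from the same prompt/state.
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Whichever weight an algorithm picks, the policy-gradient loss is then
#   loss = -(weight * log pi_theta(a_t | s_t)).mean()
```

The common idea is to subtract a quantity that does not depend on the sampled action, which leaves the expected gradient essentially unchanged while shrinking the variance of the weight that multiplies $\log\pi_\theta(a_t|s_t)$.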
Most problems that use reinforcement learning design rewards so that a good result receives a positive reward while a bad result receives a negative one. This doesn't seem...
Diffusion models, a class of generative models, boil down to an MSE loss between the added noise and the network's predicted noise. Why would learning to predict the noise help with image generation (their most common use)? How do we arrive at an MSE? This post dives deep into the math to answer these questions.
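As a preview of where the math lands, the objective in its most common DDPM-style form (Ho et al., 2020) can be sketched as below; the symbols $x_t$, $\bar\alpha_t$, and $\epsilon_\theta$ follow that paper's notation and are stated here as an assumption, not as this post's derivation:

$$
L_{\text{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon\sim\mathcal{N}(0,\mathbf{I})}\Big[\big\|\epsilon - \epsilon_\theta(x_t, t)\big\|^2\Big], \qquad x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon
$$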
Background
One way to interpret diffusion models is as a continuous VAE (Variational Autoencoder). A VAE computes a lower bound on the log-likelihood of the real data samples, $\log p_\theta(x)$, by approximating the unknown posterior $p_\theta(z|x)$ with a learnable one, $q_\phi(z|x)$ [Fig. 1]:
$$
\begin{aligned}
-\log p_\theta(x) &\le -\log p_\theta(x) + D_{\mathrm{KL}}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big) && \text{[KL is always non-negative; RHS is the negative ELBO]} \\
&= -\log p_\theta(x) + \int q_\phi(z|x)\,\log\frac{q_\phi(z|x)}{p_\theta(z|x)}\,dz && \text{[definition of KL]} \\
&= -\log p_\theta(x) + \int q_\phi(z|x)\,\log\frac{q_\phi(z|x)\,p_\theta(x)}{p_\theta(z,x)}\,dz && \text{[conditional to joint]} \\
&= -\log p_\theta(x) + \int q_\phi(z|x)\left(\log p_\theta(x) + \log\frac{q_\phi(z|x)}{p_\theta(z,x)}\right)dz \\
&= -\log p_\theta(x) + \log p_\theta(x) + \int q_\phi(z|x)\,\log\frac{q_\phi(z|x)}{p_\theta(x|z)\,p_\theta(z)}\,dz && \text{[$\log p_\theta(x)$ is independent of $z$; joint to conditional]} \\
&= \mathbb{E}_{z\sim q_\phi(z|x)}\!\left[\log\frac{q_\phi(z|x)}{p_\theta(z)} - \log p_\theta(x|z)\right] && \text{[definition of $\mathbb{E}$ for a continuous variable $z$]} \\
&= -\mathbb{E}_{z\sim q_\phi(z|x)}\big[\log p_\theta(x|z)\big] + D_{\mathrm{KL}}\big(q_\phi(z|x)\,\|\,p_\theta(z)\big) && \text{[tractable]}
\end{aligned}
$$
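As a sanity check on the final tractable form, here is a minimal PyTorch-style sketch that implements its two terms, a reconstruction term and a KL term; the Gaussian encoder, Bernoulli decoder, unit-Gaussian prior, and layer sizes are illustrative assumptions, not part of the derivation above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    # Illustrative sizes only: x in R^784 (e.g. flattened 28x28 images), z in R^16.
    def __init__(self, x_dim=784, z_dim=16, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterized sample z ~ q_phi(z|x)
        return self.dec(z), mu, logvar

def vae_loss(x, x_logits, mu, logvar):
    # Term 1: -E_{z~q_phi(z|x)}[log p_theta(x|z)]; with a Bernoulli decoder this is BCE.
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    # Term 2: D_KL(q_phi(z|x) || p_theta(z)), closed form for a Gaussian q and an N(0, I) prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

The KL term has a closed form only because both $q_\phi(z|x)$ and the prior $p_\theta(z)$ are chosen to be Gaussian; that choice is what makes the bound "tractable" in the last line above.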
The process of encoding ($q_\phi(z|x)$) and decoding...