Deep Learning
Fitting a ''density'' function to discrete data can produce degenerate, arbitrarily tall peaks at the data points.
For dequantization, add uniform noise to the data to get a more stable density function.
<math>
\begin{aligned}
\log \int p(x_i + \delta) p(\delta)\,d\delta &= \log E_{\delta}[p(x_i + \delta)]\\
&\geq E_{\delta}[\log p(x_i + \delta)]\\
&\approx \log p(x_i + \delta), \quad \delta \sim p(\delta)
\end{aligned}
</math>
The inequality is Jensen's inequality, and the last step is a single-sample Monte Carlo estimate, so training on dequantized data maximizes a lower bound on the true log-likelihood.
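A minimal sketch of uniform dequantization and the one-sample bound, assuming 8-bit integer data and using a toy isotropic-Gaussian log-density as a stand-in for a trained flow model:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def dequantize(x):
    """Add delta ~ Uniform[0, 1) to each discrete value so every integer
    level is spread over a unit-width bin."""
    return x + rng.uniform(0.0, 1.0, size=x.shape)

def log_likelihood_lower_bound(x, log_p):
    """One-sample Monte Carlo estimate of E_delta[log p(x + delta)],
    which lower-bounds log E_delta[p(x + delta)] by Jensen's inequality."""
    return log_p(dequantize(x))

# Toy stand-in for a trained flow's log-density (assumption, not a real model).
d = 4
log_p = lambda z: -0.5 * z @ z - 0.5 * d * np.log(2 * np.pi)

x = np.array([12.0, 200.0, 37.0, 95.0])  # discrete 8-bit "pixel" values
print(log_likelihood_lower_bound(x, log_p))
</syntaxhighlight>
In practice the trained model's log-density replaces the toy <code>log_p</code>; averaging over several noise samples per data point reduces the variance of the estimate.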
* We have exact likelihood estimates.
** We can use them for out-of-distribution anomaly detection.
However, in practice, a model trained on CIFAR assigns higher likelihood to MNIST than to CIFAR itself.
This behavior is not specific to flow-based models.
Suppose <math>P_{\theta}</math> is <math>N(0, I_d)</math>.
A typical sample has <math>\Vert x_i \Vert^2 = O(d)</math>, i.e. it lies near the shell of radius <math>\sqrt{d}</math>.
Consider <math>x^{test} = 0</math>; then <math>P_{\theta}(x^{test}) > P_{\theta}(x_i)</math>, since the Gaussian density is maximized at the origin even though the origin is an atypical point. High density therefore does not imply that a point is in-distribution.
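A quick numerical check of this counterexample, with an illustrative dimension <math>d = 1000</math> (the choice of <math>d</math> is arbitrary; any large value shows the effect):
<syntaxhighlight lang="python">
import numpy as np

d = 1000  # illustrative dimension (assumption)
rng = np.random.default_rng(0)

def log_p(x):
    """Log-density of the standard Gaussian N(0, I_d)."""
    return -0.5 * x @ x - 0.5 * d * np.log(2 * np.pi)

x_train = rng.standard_normal(d)  # typical sample: ||x||^2 is close to d
x_test = np.zeros(d)              # the origin, an atypical point

print(log_p(x_train))  # roughly -0.5*d*(1 + log(2*pi)), about -1419
print(log_p(x_test))   # -0.5*d*log(2*pi), about -919: strictly larger
</syntaxhighlight>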
==Misc==