Deep Learning: Difference between revisions
\nabla_{\theta} J(\theta) &= \nabla_{\theta} E[R(\tau)] \\
&= \nabla_{\theta} \int P_{\theta}(\tau) R(\tau) d\tau \\
&= \int \nabla_{\theta} P_{\theta}(\tau) R(\tau) d\tau \\
&= \int P_{\theta}(\tau) \nabla_{\theta} \log P_\theta(\tau) R(\tau) d\tau \\
&= E[\nabla_{\theta} \log P_\theta(\tau) R(\tau)]
\end{aligned}
</math>
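The derivation relies on the identity <math>\nabla_{\theta} P_{\theta}(\tau) = P_{\theta}(\tau) \nabla_{\theta} \log P_{\theta}(\tau)</math>, which turns the gradient of an integral into an expectation that can be estimated from samples of τ. The following is a minimal Python sketch, not the article's own code. It assumes a one-step "trajectory" in which τ is a single action drawn from a softmax policy with logits θ, together with a hand-picked reward table; all function names and numbers are hypothetical. It evaluates the integral form exactly (a finite sum over actions), compares it with finite differences of J(θ), and forms the Monte Carlo estimate of the final expectation.

```python
import math
import random

# Illustrative assumption: a one-step "trajectory" tau is a single action a,
# drawn from a softmax policy P_theta(a) with logits theta; R(a) is a fixed table.


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def objective(theta, rewards):
    """J(theta) = E[R(tau)] = sum_a P_theta(a) R(a)."""
    return sum(p * r for p, r in zip(softmax(theta), rewards))


def grad_log_prob(probs, action):
    """Softmax score function: d/d theta_k log P_theta(a) = 1[k == a] - P_theta(k)."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def score_function_gradient_exact(theta, rewards):
    """int P_theta grad log P_theta R dtau, evaluated as a finite sum over actions."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for a, (p, r) in enumerate(zip(probs, rewards)):
        for k, g in enumerate(grad_log_prob(probs, a)):
            grad[k] += p * g * r
    return grad


def score_function_gradient_mc(theta, rewards, num_samples, rng):
    """Monte Carlo estimate of E[grad log P_theta(tau) R(tau)] from sampled actions."""
    probs = softmax(theta)
    actions = rng.choices(range(len(theta)), weights=probs, k=num_samples)
    grad = [0.0] * len(theta)
    for a in actions:
        for k, g in enumerate(grad_log_prob(probs, a)):
            grad[k] += g * rewards[a]
    return [g / num_samples for g in grad]


def finite_difference_gradient(theta, rewards, eps=1e-6):
    """Central differences of J(theta); used only as an independent reference."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[k] += eps
        down[k] -= eps
        grad.append((objective(up, rewards) - objective(down, rewards)) / (2 * eps))
    return grad


theta = [0.2, -0.5, 1.0]      # hypothetical policy logits
rewards = [1.0, 3.0, -2.0]    # hypothetical reward R(a) for each action
exact = score_function_gradient_exact(theta, rewards)
reference = finite_difference_gradient(theta, rewards)
estimate = score_function_gradient_mc(theta, rewards, 100_000, random.Random(0))
print("score-function (exact sum):", exact)
print("finite differences:        ", reference)
print("Monte Carlo (100k samples):", estimate)
```

The exact sum and the finite differences should agree to numerical precision, and the Monte Carlo average should approach both as the sample count grows. The sampled estimator needs only draws of τ and the gradient of <math>\log P_{\theta}(\tau)</math>. It never differentiates R(τ), which is why the derivation moves the gradient onto the log-probability.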