Deep Learning: Difference between revisions

Line 2,142:
\nabla_{\theta} J(\theta) &= \nabla_{\theta} E[R(\tau)] \\
&= \nabla_{\theta} \int P_{\theta}(\tau) R(\tau) d\tau \\
&= \int \nabla_{\theta} P_{\theta}(\tau) R(\tau) d\tau \\
&= \int P_{\theta}(\tau) \nabla_{\theta} \log P_\theta(\tau) R(\tau) d\tau\\
&= E[\nabla_{\theta} \log P_\theta(\tau) R(\tau)]
\end{aligned}
</math>
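The step that moves <math>P_{\theta}(\tau)</math> back in front of the gradient relies on the identity <math>\nabla_{\theta} P_{\theta}(\tau) = P_{\theta}(\tau) \nabla_{\theta} \log P_{\theta}(\tau)</math>, which turns the integral into an expectation that can be estimated from samples. The following is a minimal sketch of that idea, not a reference implementation. It assumes a toy setting that is not part of the derivation above: each "trajectory" is a single action drawn from a softmax policy with logits <code>theta</code>, and the names <code>softmax</code>, <code>exact_gradient</code>, <code>score_function_gradient</code> and the reward values are illustrative. It compares a Monte Carlo average of <math>\nabla_{\theta} \log P_{\theta}(\tau) R(\tau)</math> with the gradient computed in closed form.

```python
import math
import random


def softmax(theta):
    """Return softmax probabilities for a list of logits."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def exact_gradient(theta, rewards):
    """Closed-form gradient of J(theta) = sum_a pi(a) R(a) for a softmax policy.

    For softmax logits, dJ/dtheta_k = pi_k * (R_k - J).
    """
    pi = softmax(theta)
    j = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - j) for p, r in zip(pi, rewards)]


def score_function_gradient(theta, rewards, n_samples, rng):
    """Monte Carlo estimate of E[grad log pi(a) * R(a)].

    For softmax logits, d log pi(a) / d theta_i = 1[i == a] - pi_i.
    """
    pi = softmax(theta)
    k = len(theta)
    grad = [0.0] * k
    for _ in range(n_samples):
        a = rng.choices(range(k), weights=pi)[0]  # sample one "trajectory"
        for i in range(k):
            grad_log = (1.0 if i == a else 0.0) - pi[i]
            grad[i] += grad_log * rewards[a]
    return [g / n_samples for g in grad]


# Illustrative values only (assumptions, not taken from the article).
theta = [0.1, -0.3, 0.5]
rewards = [1.0, 2.0, 5.0]
rng = random.Random(0)

exact = exact_gradient(theta, rewards)
estimate = score_function_gradient(theta, rewards, 200_000, rng)
print("exact:   ", exact)
print("estimate:", estimate)
```

With enough samples the two printed vectors should agree closely. The estimator needs only samples from the policy and the gradient of its log-probability, which is why this form is convenient when <math>R(\tau)</math> itself cannot be differentiated.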