[Finn & Levine, ICML]


===Issues with Policy Gradient===
;High variance of gradient estimation


;Solutions
\end{aligned}
</math>
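A standard solution is to subtract a baseline <math>b</math> from the return: the gradient estimate stays unbiased while its variance shrinks. Below is a toy numpy sketch of that effect, assuming a one-parameter Gaussian policy and the reward <math>r(a) = a</math> (both illustrative choices, not from the lecture):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 3.0, 1.0  # mean and std of the Gaussian policy

def grad_samples(n, baseline=0.0):
    """Per-sample score-function estimates of d/dtheta E[r(a)]."""
    a = rng.normal(theta, sigma, size=n)   # a ~ pi_theta = N(theta, sigma^2)
    score = (a - theta) / sigma**2         # grad_theta log pi_theta(a)
    reward = a                             # toy reward r(a) = a
    return score * (reward - baseline)

g_plain = grad_samples(100_000)                 # no baseline
g_base = grad_samples(100_000, baseline=theta)  # baseline b = E[r] = theta
print(g_plain.mean(), g_base.mean())  # both near 1: the baseline adds no bias
print(g_plain.var(), g_base.var())    # variance drops from ~11 to ~2
</syntaxhighlight>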
;Some parameters can change <math>\pi_{\theta}</math> more than others, so it is hard to choose a single fixed learning rate.
Use the natural policy gradient: <math>\theta' \leftarrow \theta - \eta F^{-1}\nabla_{\theta} L(\theta)</math>, where <math>F = E\left[\nabla_\theta \log \pi_\theta(a|s) \, \nabla_\theta \log \pi_\theta(a|s)^T\right]</math> is the Fisher information matrix, which rescales the step along each parameter direction.
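A minimal numpy sketch of this update, assuming per-sample score vectors <math>\nabla_\theta \log \pi_\theta(a|s)</math> are available to estimate <math>F</math> (the function and variable names are illustrative, not from the lecture):
<syntaxhighlight lang="python">
import numpy as np

def natural_gradient_step(theta, scores, grad_L, lr=0.01, damping=1e-3):
    """One natural-gradient update: theta' = theta - lr * F^{-1} grad_L.

    scores: (N, d) per-sample score vectors grad_theta log pi_theta(a|s).
    grad_L: (d,) vanilla gradient of the loss L(theta).
    """
    N, d = scores.shape
    # Empirical Fisher information matrix: F = E[score score^T].
    F = scores.T @ scores / N
    # Damping keeps F invertible when N < d or the scores are degenerate.
    F += damping * np.eye(d)
    # Solve F x = grad_L rather than forming F^{-1} explicitly.
    step = np.linalg.solve(F, grad_L)
    return theta - lr * step

# Toy usage with random data (d = 5 parameters, N = 100 samples).
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
scores = rng.normal(size=(100, 5))
grad_L = rng.normal(size=5)
theta_new = natural_gradient_step(theta, scores, grad_L)
</syntaxhighlight>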
===Actor-critic algorithms===
Have an actor <math>\pi_{\theta}</math> (the policy) and a critic <math>V_{\phi}</math> (or <math>Q_{\phi}</math>) that estimates the value function. The critic's estimate of the advantage <math>Q - V</math> replaces the raw return in the gradient estimate:
<math>\nabla_{\theta} J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \nabla_{\theta} \log \pi_{\theta}(a_t^{(i)} | s_t^{(i)}) \left(Q(s_t^{(i)}, a_t^{(i)}) - V(s_t^{(i)})\right)</math>
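A short PyTorch sketch of this estimator for a discrete-action policy (the network sizes, the <code>actor_critic_loss</code> helper, and the toy batch are all illustrative assumptions): the advantage <math>Q - V</math> weights the log-probability term and is detached, so actor gradients do not flow into the critic through it.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

obs_dim, n_actions = 4, 2

# Actor pi_theta: maps states to action logits.
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
# Critic V_phi: maps states to scalar value estimates.
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))

def actor_critic_loss(states, actions, q_estimates):
    """Surrogate loss whose gradient matches the estimator above.

    states:      (N*T, obs_dim) visited states s_t.
    actions:     (N*T,) actions a_t taken in those states.
    q_estimates: (N*T,) estimates of Q(s_t, a_t), e.g. bootstrapped returns.
    """
    values = critic(states).squeeze(-1)             # V(s_t)
    advantages = (q_estimates - values).detach()    # A = Q - V, no grad here
    log_probs = torch.distributions.Categorical(
        logits=actor(states)).log_prob(actions)     # log pi_theta(a_t | s_t)
    policy_loss = -(log_probs * advantages).mean()  # minimizing this ascends J
    value_loss = (values - q_estimates).pow(2).mean()  # regress V toward Q
    return policy_loss + 0.5 * value_loss

# Toy batch of fake data just to show the call.
N = 32
states = torch.randn(N, obs_dim)
actions = torch.randint(n_actions, (N,))
q_estimates = torch.randn(N)
loss = actor_critic_loss(states, actions, q_estimates)
loss.backward()
</syntaxhighlight>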


==Misc==