[Finn & Levin, ICML]
===Issues with Policy Gradient===
;High variance of gradient estimation
;Solutions
\end{aligned}
</math>
;Some parameters can change <math>\pi_{\theta}</math> more than others, so it is hard to choose a single fixed learning rate.
Use the natural policy gradient: <math>\theta' \leftarrow \theta - \eta F^{-1}\nabla L(\theta)</math>, where <math>F</math> is the Fisher information matrix of <math>\pi_{\theta}</math>.
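A minimal sketch of one natural gradient step under the assumption that the Fisher matrix <math>F</math> has already been estimated (the function name and toy 2-parameter example below are illustrative, not from the source):

```python
import numpy as np

def natural_gradient_step(theta, grad_L, fisher, lr=0.1):
    """One natural policy gradient update: theta' = theta - lr * F^{-1} grad_L.

    fisher is an estimate of the Fisher information matrix of pi_theta;
    solving F x = grad_L avoids forming an explicit inverse.
    """
    step = np.linalg.solve(fisher, grad_L)  # x = F^{-1} grad_L
    return theta - lr * step

# Toy example: with an identity Fisher matrix this reduces to
# ordinary gradient descent.
theta = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])
theta_new = natural_gradient_step(theta, grad, np.eye(2), lr=0.1)
```

Solving the linear system rather than inverting <math>F</math> is the standard numerical choice; in practice <math>F</math> is itself estimated from samples.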
===Actor-critic algorithms===
Have an actor <math>\pi_{\theta}</math> and a critic <math>V_{\phi}</math> (or <math>Q</math>).
<math>\nabla_{\theta} J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \nabla_{\theta} \log \pi_{\theta}(a_t^{(i)} | s_t^{(i)}) \left(Q(s_t^{(i)}, a_t^{(i)}) - V(s_t^{(i)})\right)</math>
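A sketch of this estimator for a simplified setting: a state-independent softmax policy over a few discrete actions, with `Q` and `V` passed in as stand-ins for learned critics. All names here are hypothetical illustrations, not an implementation from the source.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def grad_log_softmax(theta, a):
    """grad_theta log pi_theta(a) for a softmax policy with one logit
    per action: the j-th component is 1{j == a} - pi_j."""
    pi = softmax(theta)
    g = -pi
    g[a] += 1.0
    return g

def actor_critic_gradient(theta, trajectories, Q, V):
    """Sample estimate of grad J(theta), weighting each grad-log-prob
    by the advantage Q(s, a) - V(s) from the critic.

    trajectories: list of [(s, a), ...] pairs; Q, V: callables giving
    critic estimates (stand-ins for learned networks).
    """
    grad = np.zeros_like(theta)
    for traj in trajectories:          # outer sum over i = 1..N
        for s, a in traj:              # inner sum over t = 1..T
            grad += grad_log_softmax(theta, a) * (Q(s, a) - V(s))
    return grad / len(trajectories)    # the 1/N factor

# Toy usage: 3 actions, constant critic values
theta = np.zeros(3)
trajs = [[(0, 1), (0, 2)]]
g = actor_critic_gradient(theta, trajs, Q=lambda s, a: 1.0, V=lambda s: 0.0)
```

Subtracting <math>V(s)</math> as a baseline is what reduces the variance relative to using <math>Q</math> (or raw returns) alone; it leaves the expected gradient unchanged.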
==Misc==