** <math>V_{\pi} = R + \gamma P_{\pi} V_{\pi}</math>
** <math>V_{\pi} = (I - \gamma P_{\pi})^{-1}R</math>
** <math>P_{\pi}</math> is a stochastic matrix. It is not symmetric, so its eigenvalues can be complex.
All eigenvalues satisfy <math>\Vert \lambda_i \Vert \leq 1</math>, with one eigenvalue of norm exactly 1.
Since <math>\gamma < 1</math>, every eigenvalue of <math>\gamma P_{\pi}</math> has norm at most <math>\gamma < 1</math>, so <math>I - \gamma P_{\pi}</math> is always invertible.
However, solving for <math>V_{\pi}</math> this way is computationally expensive, since matrix inversion scales as <math>O(|S|^3)</math>.
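As a sketch, the direct solve can be done with a linear-system solver rather than an explicit inverse. The 3-state transition matrix, reward vector, and discount factor below are made-up values for illustration:

```python
import numpy as np

# Hypothetical 3-state example: P_pi is the policy-induced stochastic
# transition matrix (rows sum to 1), R the expected-reward vector,
# gamma the discount factor. All numbers are illustrative.
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
R = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Solve (I - gamma * P_pi) V = R directly; cost scales as O(|S|^3).
V_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, R)
```

The result satisfies the Bellman equation <math>V_{\pi} = R + \gamma P_{\pi} V_{\pi}</math> up to floating-point error.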
;Value-Iteration (Bellman Recursion)
<math>v_{\pi} = L_{\pi} v_{\pi}</math>, so <math>v_{\pi}</math> is a fixed point of <math>L_{\pi}</math>.
;Claim: <math>L_{\pi}</math> is a contraction.
Proof: Take <math>v_1, v_2 \in \mathbb{R}^d</math>. | |||
<math> | |||
\begin{aligned} | |||
L_{\pi}v_1 - L_{\pi}v_2 &= (R + \gamma P_{\pi}v_1) - (R + \gamma P_{\pi}v_2) \\ | |||
&= \gamma P_{\pi}(v_1 - v_2) | |||
\end{aligned} | |||
</math> | |||
<math>
\begin{aligned}
\Vert L_{\pi}v_1 - L_{\pi}v_2 \Vert &= \Vert \gamma P_{\pi}(v_1 - v_2) \Vert\\
& \leq \gamma \Vert P_{\pi} \Vert \Vert v_1 - v_2 \Vert \\
& = \gamma \Vert v_1 - v_2 \Vert
\end{aligned}
</math>
since <math>\Vert P_{\pi} \Vert = 1</math> in the <math>\infty</math>-norm (the rows of a stochastic matrix sum to 1). With <math>\gamma < 1</math>, <math>L_{\pi}</math> is a contraction with modulus <math>\gamma</math>.
By Banach's Fixed Point Theorem, we can converge to the fixed point iteratively by repeatedly applying <math>L_{\pi}</math>. | |||
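The iteration above can be sketched as follows. The 3-state transition matrix, reward vector, and discount factor are made-up values for illustration:

```python
import numpy as np

# Hypothetical 3-state MDP (illustrative values): stochastic transition
# matrix P_pi, reward vector R, discount gamma < 1.
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
R = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Repeatedly apply the Bellman operator L_pi(v) = R + gamma * P_pi @ v.
# Banach's theorem guarantees convergence at geometric rate gamma.
v = np.zeros(3)
for _ in range(500):
    v_next = R + gamma * P_pi @ v
    if np.max(np.abs(v_next - v)) < 1e-12:
        v = v_next
        break
    v = v_next
```

Each sweep costs <math>O(|S|^2)</math>, so value iteration avoids the <math>O(|S|^3)</math> direct solve at the price of running until the tolerance is met.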
==Misc==