** <math>V_{\pi} = R + \gamma P_{\pi} V_{\pi}</math>
** <math>V_{\pi} = (I - \gamma P_{\pi})^{-1}R</math>
** <math>P_{\pi}</math> is a stochastic matrix. It is not symmetric, so its eigenvalues can be complex.
 
All eigenvalues satisfy <math>\Vert \lambda_i \Vert \leq 1</math>, with at least one eigenvalue of modulus exactly 1.
Since <math>\gamma < 1</math>, every eigenvalue of <math>\gamma P_{\pi}</math> has modulus at most <math>\gamma < 1</math>, which implies <math>I - \gamma P_{\pi}</math> is always invertible.
However, computing the inverse is computationally expensive since it scales as <math>O(|S|^3)</math>.
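Below is a minimal NumPy sketch of this direct solve on a small made-up 3-state MDP; the transition matrix <code>P_pi</code> and rewards <code>R</code> are illustrative values, not taken from these notes.

<syntaxhighlight lang="python">
import numpy as np

gamma = 0.9                               # discount factor, gamma < 1
P_pi = np.array([[0.5, 0.5, 0.0],         # row-stochastic transition matrix
                 [0.1, 0.6, 0.3],         # under a fixed policy pi;
                 [0.0, 0.2, 0.8]])        # each row sums to 1
R = np.array([1.0, 0.0, 2.0])             # expected immediate reward per state

# Every eigenvalue of P_pi has modulus <= 1 (max modulus is exactly 1):
print(np.max(np.abs(np.linalg.eigvals(P_pi))))   # ~1.0

# So eigenvalues of gamma * P_pi have modulus <= gamma < 1,
# and I - gamma * P_pi is invertible.
V_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, R)   # O(|S|^3) direct solve
print(V_pi)
</syntaxhighlight>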


;Value-Iteration (Bellman Recursion)
Define the Bellman operator <math>L_{\pi} v = R + \gamma P_{\pi} v</math>. Then <math>v_{\pi} = L_{\pi} v_{\pi}</math>, so <math>v_{\pi}</math> is a fixed point of <math>L_{\pi}</math>.


;Claim: <math>L_{\pi}</math> is a contraction.
Proof: Take <math>v_1, v_2 \in \mathbb{R}^{|S|}</math>.
<math>
\begin{aligned}
L_{\pi}v_1 - L_{\pi}v_2 &= (R + \gamma P_{\pi}v_1) - (R + \gamma P_{\pi}v_2) \\
&= \gamma P_{\pi}(v_1 - v_2)
\end{aligned}
</math> 
<math>
\begin{aligned}
\Vert L_{\pi}v_1 - L_{\pi}v_2 \Vert &= \Vert \gamma P_{\pi}(v_1 - v_2) \Vert\\
& \leq \gamma \Vert P_{\pi} \Vert \Vert v_1 - v_2 \Vert \\
& = \gamma \Vert v_1 - v_2 \Vert
\end{aligned}
</math>
In the <math>\infty</math>-norm, <math>\Vert P_{\pi} \Vert = 1</math> since each row of a stochastic matrix sums to 1. Because <math>\gamma < 1</math>, <math>L_{\pi}</math> is a contraction with modulus <math>\gamma</math>.
By Banach's Fixed Point Theorem, <math>L_{\pi}</math> has a unique fixed point, and we converge to it by repeatedly applying <math>L_{\pi}</math> from any starting point.
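A matching sketch of value iteration, reusing the same illustrative <code>gamma</code>, <code>P_pi</code>, and <code>R</code> as above:

<syntaxhighlight lang="python">
import numpy as np

gamma = 0.9
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
R = np.array([1.0, 0.0, 2.0])

v = np.zeros(3)                                # arbitrary starting point
for _ in range(1000):
    v_next = R + gamma * P_pi @ v              # one application of L_pi
    if np.max(np.abs(v_next - v)) < 1e-10:     # sup-norm gap contracts by gamma
        v = v_next
        break
    v = v_next

print(v)   # agrees with np.linalg.solve(np.eye(3) - gamma * P_pi, R)
</syntaxhighlight>

Each sweep costs only <math>O(|S|^2)</math>, which is why iterating <math>L_{\pi}</math> is preferred over the direct <math>O(|S|^3)</math> inverse for large state spaces.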


==Misc==