** <math>V_{\pi} = R + \gamma P_{\pi} V_{\pi}</math>
** <math>V_{\pi} = (I - \gamma P_{\pi})^{-1}R</math>
** <math>P_{\pi}</math> is a stochastic matrix. It is not symmetric, so its eigenvalues can be complex.
All eigenvalues satisfy <math>\Vert \lambda_i \Vert \leq 1</math>, with one eigenvalue of norm exactly 1.
Since <math>\gamma < 1</math>, every eigenvalue of <math>\gamma P_{\pi}</math> has norm at most <math>\gamma < 1</math>, so <math>I - \gamma P_{\pi}</math> is always invertible.
However, solving for <math>V_{\pi}</math> this way is computationally expensive, since matrix inversion scales as <math>O(|S|^3)</math>.
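As a sketch, the direct solve can be done with a linear-system solver rather than an explicit inverse. The 3-state transition matrix, reward vector, and discount factor below are made-up values for illustration:

```python
import numpy as np

# Hypothetical 3-state example: P_pi is the policy-induced stochastic
# transition matrix (rows sum to 1), R the expected-reward vector,
# gamma the discount factor. All numbers are illustrative.
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
R = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Solve (I - gamma * P_pi) V = R directly; cost scales as O(|S|^3).
V_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, R)
```

The result satisfies the Bellman equation <math>V_{\pi} = R + \gamma P_{\pi} V_{\pi}</math> up to floating-point error.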
;Value-Iteration (Bellman Recursion)
<math>v_{\pi} = L_{\pi} v_{\pi}</math>, so <math>v_{\pi}</math> is a fixed point of <math>L_{\pi}</math>.
;Claim: <math>L_{\pi}</math> is a contraction.
Proof: Take <math>v_1, v_2 \in \mathbb{R}^d</math>. | |||
<math> | |||
\begin{aligned} | |||
L_{\pi}v_1 - L_{\pi}v_2 &= (R + \gamma P_{\pi}v_1) - (R + \gamma P_{\pi}v_2) \\ | |||
&= \gamma P_{\pi}(v_1 - v_2) | |||
\end{aligned} | |||
</math> | |||
<math>
\begin{aligned}
\Vert L_{\pi}v_1 - L_{\pi}v_2 \Vert &= \Vert \gamma P_{\pi}(v_1 - v_2) \Vert\\
& \leq \gamma \Vert P_{\pi} \Vert \Vert v_1 - v_2 \Vert \\
& = \gamma \Vert v_1 - v_2 \Vert
\end{aligned}
</math>
since <math>\Vert P_{\pi} \Vert = 1</math> in the <math>\infty</math>-norm (the rows of a stochastic matrix sum to 1). With <math>\gamma < 1</math>, <math>L_{\pi}</math> is a contraction with modulus <math>\gamma</math>.
By Banach's Fixed Point Theorem, we can converge to the fixed point iteratively by repeatedly applying <math>L_{\pi}</math>. | |||
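The iteration above can be sketched as follows. The 3-state transition matrix, reward vector, and discount factor are made-up values for illustration:

```python
import numpy as np

# Hypothetical 3-state MDP (illustrative values): stochastic transition
# matrix P_pi, reward vector R, discount gamma < 1.
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
R = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Repeatedly apply the Bellman operator L_pi(v) = R + gamma * P_pi @ v.
# Banach's theorem guarantees convergence at geometric rate gamma.
v = np.zeros(3)
for _ in range(500):
    v_next = R + gamma * P_pi @ v
    if np.max(np.abs(v_next - v)) < 1e-12:
        v = v_next
        break
    v = v_next
```

Each sweep costs <math>O(|S|^2)</math>, so value iteration avoids the <math>O(|S|^3)</math> direct solve at the price of running until the tolerance is met.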
==Misc==