Line 2,027: | Line 2,027: | ||
;Value-Iteration (Bellman Recursion) | ;Value-Iteration (Bellman Recursion) | ||
<math>V_{\pi} = R + \gamma P_{\pi} V_{\pi}</math> | <math>V_{\pi} = R + \gamma P_{\pi} V_{\pi}</math> | ||
Define an operator <math>L_\pi}v = R + \gamma P_{\pi}v</math> | Define an operator <math>L_{\pi}v = R + \gamma P_{\pi}v</math> | ||
<math>v_{\pi} = L_{\pi} v_{\pi}</math> so <math>v_{\pi}</math> is a fixed point of <math>L_{\pi}</math>. | <math>v_{\pi} = L_{\pi} v_{\pi}</math> so <math>v_{\pi}</math> is a fixed point of <math>L_{\pi}</math>. | ||
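The fixed-point view suggests a simple way to compute the value of a policy: start from any vector and apply the operator repeatedly until the result stops changing. The sketch below is a minimal illustration, not part of the original notes. It assumes a finite state space, a discount factor below 1 (so that the operator is a contraction and the iteration converges), and rewards and transition probabilities already restricted to the fixed policy. The function names `apply_L` and `evaluate_policy` and the two-state example are hypothetical.

```python
def apply_L(v, R, P, gamma):
    """Apply the operator L_pi v = R + gamma * P_pi v once.

    R[s] is the reward in state s under the policy and P[s][t] is the
    probability of moving from s to t under the policy (assumed inputs).
    """
    n = len(v)
    return [R[s] + gamma * sum(P[s][t] * v[t] for t in range(n)) for s in range(n)]


def evaluate_policy(R, P, gamma, tol=1e-10, max_iter=10_000):
    """Iterate v <- L_pi v from zero until successive iterates differ by less than tol."""
    v = [0.0] * len(R)
    for _ in range(max_iter):
        v_next = apply_L(v, R, P, gamma)
        if max(abs(a - b) for a, b in zip(v_next, v)) < tol:
            return v_next
        v = v_next
    return v


# Hypothetical two-state chain: state 1 is absorbing with zero reward.
R = [1.0, 0.0]
P = [[0.5, 0.5],
     [0.0, 1.0]]
gamma = 0.9

v_pi = evaluate_policy(R, P, gamma)
print(v_pi)
```

For this example the fixed point can be checked by hand: the absorbing state satisfies v = 0.9 v, so its value is 0, and the first state satisfies v = 1 + 0.45 v, so its value is 1 / 0.55. Applying `apply_L` to the returned vector should leave it unchanged up to the tolerance.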