Deep Learning

;Value-Iteration (Bellman Recursion)
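<math>v_{\pi} = R + \gamma P_{\pi} v_{\pi}</math>, where <math>R</math> is the vector of expected immediate rewards, <math>P_{\pi}</math> is the state-transition matrix under policy <math>\pi</math>, and <math>\gamma \in [0, 1)</math> is the discount factor.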
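Because the Bellman equation is linear in the value vector, a small finite MDP can also be solved in closed form as <math>(I - \gamma P_{\pi})^{-1} R</math>; the matrix is invertible whenever <math>\gamma < 1</math> and <math>P_{\pi}</math> is row-stochastic. The sketch below assumes a hypothetical 2-state example (the transition matrix, rewards, and the helper name <code>solve_bellman_linear</code> are illustrative, not from the source) and solves the system with plain Gaussian elimination.

```python
def solve_bellman_linear(P, R, gamma):
    """Solve v = R + gamma * P v, i.e. (I - gamma * P) v = R (hypothetical helper)."""
    n = len(R)
    # Augmented matrix [I - gamma * P | R].
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] + [float(R[i])]
         for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    v = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * v[j] for j in range(i + 1, n))
        v[i] = (A[i][n] - s) / A[i][i]
    return v


# Hypothetical 2-state example: transitions under pi and expected rewards.
P_pi = [[0.5, 0.5], [0.2, 0.8]]
R = [1.0, 0.0]
v = solve_bellman_linear(P_pi, R, 0.9)
print(v)  # approximately [3.8356, 2.4658]
```

The direct solve costs <math>O(n^3)</math> in the number of states, which is why iterative methods are preferred for large state spaces.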
Define the operator <math>L_{\pi}v = R + \gamma P_{\pi}v</math>.
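<math>v_{\pi} = L_{\pi} v_{\pi}</math>, so <math>v_{\pi}</math> is a fixed point of <math>L_{\pi}</math>. Since <math>L_{\pi}</math> is a <math>\gamma</math>-contraction in the max norm for <math>\gamma < 1</math>, repeatedly applying it from any initial <math>v</math> converges to this unique fixed point.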
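The fixed-point view suggests computing <math>v_{\pi}</math> by applying <math>L_{\pi}</math> repeatedly. The following is a minimal sketch assuming a finite MDP given as plain Python lists; the function name <code>evaluate_policy_iteratively</code>, the tolerance, and the 2-state example are hypothetical choices for illustration.

```python
def evaluate_policy_iteratively(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Apply L_pi v = R + gamma * P v until successive iterates differ by < tol (max norm)."""
    n = len(R)
    v = [0.0] * n  # any start works because L_pi is a gamma-contraction
    for _ in range(max_iters):
        # One application of the operator L_pi.
        v_new = [R[i] + gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        delta = max(abs(a - b) for a, b in zip(v_new, v))
        v = v_new
        if delta < tol:
            break
    return v


# Hypothetical 2-state example: transitions under pi and expected rewards.
P_pi = [[0.5, 0.5], [0.2, 0.8]]
R = [1.0, 0.0]
v = evaluate_policy_iteratively(P_pi, R, 0.9)
print(v)  # close to [3.8356, 2.4658]
```

By the contraction property, once successive iterates differ by <math>\delta</math> in the max norm, the current iterate is within <math>\gamma\delta / (1 - \gamma)</math> of <math>v_{\pi}</math>, so the stopping tolerance directly bounds the final error.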