===Recurrent Neural Networks (RNNs)===
Hidden state: <math>h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)</math>
Prediction at t: <math>y_t = W_{hy} h_t</math>
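
A minimal numpy sketch of this recurrence (the sizes, initialization scale, and input loop are assumptions for illustration; the names <code>W_hh</code>, <code>W_xh</code>, <code>W_hy</code> mirror the equations above):

<syntaxhighlight lang="python">
import numpy as np

H, D = 4, 3  # hidden and input sizes (assumed)
rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(H, H))  # hidden-to-hidden
W_xh = rng.normal(scale=0.1, size=(H, D))  # input-to-hidden
W_hy = rng.normal(scale=0.1, size=(1, H))  # hidden-to-output

def rnn_step(h_prev, x_t):
    """One recurrence: h_t = tanh(W_hh h_{t-1} + W_xh x_t); y_t = W_hy h_t."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

# Unroll over a short sequence, starting from h_0 = 0.
h = np.zeros(H)
for t in range(5):
    h, y = rnn_step(h, rng.normal(size=D))
</syntaxhighlight>

The same weight matrices are reused at every time step; only the hidden state changes.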
 
;Backpropagation through time
Backpropagating the loss from step <math>t</math> back to step <math>t-k</math> multiplies the gradient by <math>W_{hh}^T</math> (and by the derivative of tanh, which is at most 1) <math>k</math> times, so the gradient norm scales roughly like the <math>k</math>-th power of the largest singular value of <math>W_{hh}</math>.
If the largest singular value of <math>W_{hh}</math> is less than 1, the gradient vanishes.
If it is greater than 1, the gradient explodes.
In practice the gradient typically vanishes, because <math>W_{hh}</math> is initialized with small weights and the tanh derivative shrinks it further.
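
A toy check of this scaling (a sketch that ignores the tanh derivative, so backprop reduces to repeated multiplication by <math>W_{hh}^T</math>; a scaled orthogonal matrix is used so every singular value equals the target):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # orthogonal: all singular values are 1

for target_sigma in (0.9, 1.1):
    W_hh = target_sigma * Q  # every singular value is now target_sigma
    g = rng.normal(size=8)   # gradient arriving at the last time step
    for _ in range(50):      # 50 steps of backprop through time
        g = W_hh.T @ g
    # norm scales as target_sigma**50: roughly 1e-2 for 0.9, roughly 3e2 for 1.1
    print(target_sigma, np.linalg.norm(g))
</syntaxhighlight>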
 
===Long Short-Term Memory (LSTM)===
An LSTM cell computes three gates and a candidate cell value at each step (a minimal step implementation is sketched after this list):
* Input gate: <math>i_t = \sigma(W_{xi}x_t + W_{hi}h_{t-1} + b_i)</math>
* Forget gate: <math>f_t = \sigma(W_{xf}x_t + W_{hf}h_{t-1} + b_f)</math>
* Output gate: <math>o_t = \sigma(W_{xo}x_t + W_{ho}h_{t-1} + b_o)</math>
* Candidate cell: <math>\tilde{c}_t = \tanh(W_{xc}x_t + W_{hc}h_{t-1} + b_c)</math>
* Cell state: <math>c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t</math>
* Hidden state: <math>h_t = o_t \odot \tanh(c_t)</math>
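
A minimal sketch of one LSTM step following these equations (the sizes, initialization, and the helper <code>init</code> are assumptions for illustration):

<syntaxhighlight lang="python">
import numpy as np

H, D = 4, 3  # hidden and input sizes (assumed)
rng = np.random.default_rng(0)

def init(shape):
    return rng.normal(scale=0.1, size=shape)

W_xi, W_hi, b_i = init((H, D)), init((H, H)), np.zeros(H)  # input gate
W_xf, W_hf, b_f = init((H, D)), init((H, H)), np.zeros(H)  # forget gate
W_xo, W_ho, b_o = init((H, D)), init((H, H)), np.zeros(H)  # output gate
W_xc, W_hc, b_c = init((H, D)), init((H, H)), np.zeros(H)  # candidate cell

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t):
    i_t = sigmoid(W_xi @ x_t + W_hi @ h_prev + b_i)       # input gate
    f_t = sigmoid(W_xf @ x_t + W_hf @ h_prev + b_f)       # forget gate
    o_t = sigmoid(W_xo @ x_t + W_ho @ h_prev + b_o)       # output gate
    c_tilde = np.tanh(W_xc @ x_t + W_hc @ h_prev + b_c)   # candidate cell
    c_t = f_t * c_prev + i_t * c_tilde  # elementwise (odot) cell update
    h_t = o_t * np.tanh(c_t)            # hidden state
    return h_t, c_t

h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step(h, c, rng.normal(size=D))
</syntaxhighlight>

Because the cell state is updated additively (<math>c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t</math>) rather than through repeated matrix multiplication, gradients flowing along <math>c_t</math> vanish far less readily than in a vanilla RNN.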


==Misc==