===Recurrent Neural Networks (RNNs)===
Hidden state: <math>h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)</math>

Prediction at t: <math>y_t = W_{hy} h_t</math>
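
A minimal NumPy sketch of one forward step. The sizes and the small random initialization are illustrative; the names <code>W_hh</code>, <code>W_xh</code>, <code>W_hy</code> mirror the equations above.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h, d_y = 4, 8, 3                  # illustrative sizes
W_hh = 0.1 * rng.normal(size=(d_h, d_h))
W_xh = 0.1 * rng.normal(size=(d_h, d_x))
W_hy = 0.1 * rng.normal(size=(d_y, d_h))

def rnn_step(h_prev, x):
    """One vanilla-RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t)."""
    h = np.tanh(W_hh @ h_prev + W_xh @ x)
    y = W_hy @ h                         # prediction at time t
    return h, y

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_x)):     # unroll over a length-5 sequence
    h, y = rnn_step(h, x)
</syntaxhighlight>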
;Backpropagation through time
If the largest singular value of <math>W_{hh}</math> is less than 1, the gradient through many time steps vanishes; if it is greater than 1, the gradient can explode. In practice the gradient typically vanishes, because <math>W_{hh}</math> is usually initialized with small weights and the <math>\tanh</math> derivative is at most 1.
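
A rough numerical illustration of this claim. Backpropagating through one step multiplies the gradient by <math>W_{hh}^\top \operatorname{diag}(1 - h_t^2)</math>, so over <math>T</math> steps its norm tends to shrink or grow with the singular values of <math>W_{hh}</math>. For simplicity this sketch uses a scaled orthogonal <math>W_{hh}</math> (every singular value equals <code>scale</code>); the hidden states are generic stand-ins, not a real forward pass.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
d_h, T = 8, 50

def final_grad_norm(scale):
    """Norm of dL/dh_0 after backpropagating through T tanh-RNN steps."""
    # Scaled orthogonal matrix: all singular values of W_hh equal `scale`.
    W_hh, _ = np.linalg.qr(rng.normal(size=(d_h, d_h)))
    W_hh *= scale
    g = np.ones(d_h)                      # stand-in for dL/dh_T
    for _ in range(T):
        h = np.tanh(rng.normal(size=d_h)) # generic hidden state h_t
        # dL/dh_{t-1} = W_hh^T (dL/dh_t * tanh'(a_t)), tanh' = 1 - h_t^2
        g = W_hh.T @ (g * (1.0 - h**2))
    return np.linalg.norm(g)

print(final_grad_norm(0.9))  # tiny: gradient vanishes (tanh' <= 1 helps it along)
print(final_grad_norm(2.0))  # huge: gradient explodes
</syntaxhighlight>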
===Long Short-Term Memory (LSTM)===
An LSTM maintains a cell state alongside the hidden state, updated through several gates (a minimal sketch follows the list):
* Input gate: <math>i_t = \sigma(W_{xi}x_t + W_{hi}h_{t-1} + b_i)</math>
* Forget gate: <math>f_t = \sigma(W_{xf}x_t + W_{hf}h_{t-1} + b_f)</math>
* Output gate: <math>o_t = \sigma(W_{xo}x_t + W_{ho}h_{t-1} + b_o)</math>
* Candidate cell state: <math>\tilde{c}_t = \tanh(W_{xc}x_t + W_{hc}h_{t-1} + b_c)</math>
* Cell state: <math>c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t</math>
* Hidden state: <math>h_t = o_t \odot \tanh(c_t)</math>
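
A minimal NumPy sketch of one LSTM step implementing the equations above. Weight shapes and initialization are illustrative; the candidate weights follow the same naming pattern as the gates.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h = 4, 8                          # illustrative sizes
# One (W_x, W_h, b) triple per gate/candidate: input, forget, output, cell.
params = {k: (0.1 * rng.normal(size=(d_h, d_x)),
              0.1 * rng.normal(size=(d_h, d_h)),
              np.zeros(d_h)) for k in "ifoc"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x):
    """One LSTM step following the gate equations above."""
    def lin(k):
        W_x, W_h, b = params[k]
        return W_x @ x + W_h @ h_prev + b
    i = sigmoid(lin("i"))                # input gate
    f = sigmoid(lin("f"))                # forget gate
    o = sigmoid(lin("o"))                # output gate
    c_tilde = np.tanh(lin("c"))          # candidate cell state
    c = f * c_prev + i * c_tilde         # cell state
    h = o * np.tanh(c)                   # hidden state
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(5, d_x)):      # unroll over a length-5 sequence
    h, c = lstm_step(h, c, x)
</syntaxhighlight>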
==Misc==