Hyperparameters

Hyperparameters are parameters of your network that you have to tune yourself rather than learn from data. They include the learning rate, the number of nodes, the kernel size, the batch size, and the choice of optimizer. Notes on the most common ones are below.

===Kernel Size===
===Stride===
==Activation functions==
Historically, people used the Sigmoid activation because it is smooth and constrains the range of the output.
However, since AlexNet, people have found that ReLU works better and is faster to compute for both the forward and backward passes.
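For reference, <math>\operatorname{ReLU}(x) = \max(0, x)</math> while <math>\sigma(x) = \tfrac{1}{1 + e^{-x}}</math>. The ReLU forward pass is a single comparison and its derivative is either 0 or 1; the Sigmoid requires an exponential, and its derivative <math>\sigma(x)(1 - \sigma(x))</math> saturates toward 0 for large <math>|x|</math>, which slows gradient-based learning.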
Typically, you should not include an activation or normalization at the output layer; treat the outputs directly as logits.
If necessary, you can add a Sigmoid to constrain the range of the output.
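As a minimal sketch of this convention (assuming PyTorch; the layer sizes are arbitrary), the network below uses ReLU on its hidden layers and no activation at the output, so it returns raw logits. A loss such as <code>CrossEntropyLoss</code> expects logits and applies the softmax internally, so adding one at the output would be redundant and numerically less stable.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# ReLU on hidden layers; the final Linear layer has no activation,
# so the model's outputs are raw logits.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),  # logits: no activation or normalization here
)

# CrossEntropyLoss applies log-softmax internally, so it must be fed logits.
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 784)               # dummy batch of 32 flattened inputs
targets = torch.randint(0, 10, (32,))  # dummy class labels
logits = model(x)
loss = criterion(logits, targets)
loss.backward()
</syntaxhighlight>

If the task requires bounded outputs (e.g., values in [0, 1]), a final Sigmoid can be appended as noted above.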