Hyperparameters

Hyperparameters are parameters of your network that you have to tune yourself rather than learn from data. They include the learning rate, the number of nodes, the kernel size, the batch size, and the choice of optimizer. Notes on the most common ones are below.

===Kernel Size===
===Stride===
==Activation functions==
Historically, people used the Sigmoid activation because it is smooth and constrains the range of the output.
However, since AlexNet, people have found that ReLU works better and is faster to compute for both the forward and backward passes.
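For reference, <math>\operatorname{ReLU}(x) = \max(0, x)</math> while <math>\sigma(x) = \tfrac{1}{1 + e^{-x}}</math>. The ReLU forward pass is a single comparison and its derivative is either 0 or 1; the Sigmoid requires an exponential, and its derivative <math>\sigma(x)(1 - \sigma(x))</math> saturates toward 0 for large <math>|x|</math>, which slows gradient-based learning.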
Typically, you should not include an activation or normalization at the output layer; treat the outputs directly as logits.
If necessary, you can add a Sigmoid to constrain the range of the output.
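As a minimal sketch of this convention (assuming PyTorch; the layer sizes are arbitrary), the network below uses ReLU on its hidden layers and no activation at the output, so it returns raw logits. A loss such as <code>CrossEntropyLoss</code> expects logits and applies the softmax internally, so adding one at the output would be redundant and numerically less stable.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# ReLU on hidden layers; the final Linear layer has no activation,
# so the model's outputs are raw logits.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),  # logits: no activation or normalization here
)

# CrossEntropyLoss applies log-softmax internally, so it must be fed logits.
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 784)               # dummy batch of 32 flattened inputs
targets = torch.randint(0, 10, (32,))  # dummy class labels
logits = model(x)
loss = criterion(logits, targets)
loss.backward()
</syntaxhighlight>

If the task requires bounded outputs (e.g., values in [0, 1]), a final Sigmoid can be appended as noted above.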