Deep Learning: Difference between revisions
Revision as of 15:58, 1 September 2020
Notes for CMSC 828W: Foundations of Deep Learning (Fall 2020) taught by Soheil Feizi
My notes are intended to be a concise reference for myself, not a comprehensive replacement for lecture.
Basics
A refresher of Machine Learning and Supervised Learning.
Empirical risk minimization (ERM)
Minimize the average loss over your data: \(\displaystyle \min_{W} \frac{1}{N} \sum_{i=1}^{N} l(f_{W}(x_i), y_i)\)
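As a minimal sketch of the ERM objective, the snippet below averages a per-example loss over a toy dataset. The names `empirical_risk`, `linear_model`, and `quadratic_loss`, as well as the data, are hypothetical choices for illustration and do not come from the lecture.

```python
import numpy as np

def empirical_risk(f, loss, W, X, Y):
    """Average of loss(f(W, x_i), y_i) over the N examples in (X, Y)."""
    return float(np.mean([loss(f(W, x), y) for x, y in zip(X, Y)]))

# Hypothetical model and loss, chosen only to make the sketch concrete.
def linear_model(W, x):
    return W @ x

def quadratic_loss(pred, y):
    return 0.5 * np.sum((pred - y) ** 2)

# Toy data generated by y = 2*x1 + 3*x2.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([2.0, 3.0, 5.0])

W_good = np.array([2.0, 3.0])   # fits the data exactly
W_bad = np.array([0.0, 0.0])    # predicts 0 everywhere

risk_good = empirical_risk(linear_model, quadratic_loss, W_good, X, Y)
risk_bad = empirical_risk(linear_model, quadratic_loss, W_bad, X, Y)
```

ERM then amounts to searching for the `W` with the smallest empirical risk; here `W_good` attains zero risk while `W_bad` does not.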
Loss functions
For regression, can use quadratic loss: \(\displaystyle l(f_W(x), y) = \frac{1}{2}\Vert f_W(x)-y \Vert^2\)
For 2-way classification with labels \(y \in \{-1, +1\}\), can use hinge-loss: \(\displaystyle l(f_W(x), y) = \max(0, 1-yf_W(x))\)
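The quadratic and hinge losses above can be sketched in a few lines of NumPy. The function names are hypothetical, and the hinge loss assumes labels in {-1, +1} and a scalar score.

```python
import numpy as np

def quadratic_loss(pred, y):
    """0.5 * ||pred - y||^2 for a scalar or vector prediction."""
    pred = np.asarray(pred, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.5 * float(np.sum((pred - y) ** 2))

def hinge_loss(score, y):
    """max(0, 1 - y * score); assumes y is -1 or +1 and score is a scalar."""
    return max(0.0, 1.0 - y * score)

q = quadratic_loss([1.0, 2.0], [1.0, 4.0])   # 0.5 * (0^2 + 2^2)
h_confident = hinge_loss(2.0, +1)            # correct side, margin >= 1
h_inside = hinge_loss(0.5, +1)               # correct side, inside the margin
h_wrong = hinge_loss(0.5, -1)                # wrong side
```

The hinge loss is zero only when the score has the correct sign and a margin of at least 1, which is why `h_inside` is still penalized.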
For multi-way classification, can use cross-entropy loss. The binary case, with labels \(y_i \in \{0, 1\}\) and the sigmoid \(g\), is:
\(\displaystyle g(z)=\frac{1}{1+e^{-z}}\)
\(\displaystyle \min_{W} \left[-\sum_{i=1}^{N}\left[y_i\log(g(f_W(x_i))) + (1-y_i)\log(1-g(f_W(x_i)))\right] \right]\)
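A minimal sketch of the sigmoid and the summed cross-entropy objective, assuming binary labels in {0, 1} and raw scores \(f_W(x_i)\) as input. The clipping constant `eps` is an added numerical safeguard and is not part of the formula.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(scores, labels, eps=1e-12):
    """Summed cross-entropy between sigmoid(scores) and labels in {0, 1}."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Clip probabilities away from 0 and 1 so log() stays finite (assumption).
    p = np.clip(sigmoid(scores), eps, 1.0 - eps)
    return float(-np.sum(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))

# With score 0 the model predicts probability 0.5 for each example.
loss_uncertain = binary_cross_entropy([0.0, 0.0], [1, 0])
# Large scores with the correct sign give a loss close to 0.
loss_confident = binary_cross_entropy([10.0, -10.0], [1, 0])
```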
Nonlinear functions
Given a nonlinear activation function \(\phi(\cdot)\), \(\phi(w^t x + b)\) is a nonlinear function of \(x\).
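To make this concrete, here is a small sketch of a single unit that applies an activation to an affine map. ReLU is used as an example activation; the notes do not fix a particular \(\phi\), so that choice and the function names are assumptions.

```python
import numpy as np

def relu(z):
    """Example activation: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def neuron(x, w, b, phi=relu):
    """Compute phi(w^T x + b) for an input vector x."""
    return phi(np.dot(w, x) + b)

w = np.array([1.0, 0.0])
b = 0.0
x1 = np.array([1.0, 0.0])
x2 = np.array([-1.0, 0.0])

# A linear map would satisfy f(x1 + x2) == f(x1) + f(x2); this unit does not.
out_sum = neuron(x1 + x2, w, b)
sum_out = neuron(x1, w, b) + neuron(x2, w, b)
```

Here `out_sum` and `sum_out` differ, which shows that the unit is not additive and therefore not linear.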
Models
Multi-layer perceptron (MLP): Fully-connected feed-forward network.
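A fully-connected feed-forward pass can be sketched as repeated affine maps followed by an activation. The layer sizes, the ReLU activation, the Gaussian initialization, and the names `init_mlp` and `mlp_forward` are all illustrative assumptions rather than anything specified in the notes.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def init_mlp(sizes, seed=0):
    """Create (W, b) pairs for consecutive layer sizes, e.g. [4, 8, 3]."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def mlp_forward(params, x):
    """Apply each hidden layer as relu(W h + b); the last layer stays linear."""
    h = x
    for W, b in params[:-1]:
        h = relu(W @ h + b)
    W, b = params[-1]
    return W @ h + b

params = init_mlp([4, 8, 3])          # 4 inputs, one hidden layer of 8, 3 outputs
out = mlp_forward(params, np.ones(4))
```

Every unit in a layer is connected to every unit in the previous layer, which is what the dense weight matrices `W` encode.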