Deep Learning
Notes for CMSC 828W: Foundations of Deep Learning (Fall 2020) taught by Soheil Feizi
My notes are intended to be a concise reference for myself, not a comprehensive replacement for lecture.
Basics
A refresher on machine learning and supervised learning.
Empirical risk minimization (ERM)
Minimize the loss function over your data: \(\displaystyle \min_{W} \frac{1}{N} \sum_{i=1}^{N} l(f_{W}(x_i), y_i)\)
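As a quick illustration (not from the lecture), here is a minimal NumPy sketch of the ERM objective; the names empirical_risk, f, and loss are made up for this example:

import numpy as np

def empirical_risk(W, X, Y, f, loss):
    # (1/N) * sum_i l(f_W(x_i), y_i)
    return np.mean([loss(f(W, x), y) for x, y in zip(X, Y)])

# ERM picks the W that minimizes empirical_risk(W, X, Y, f, loss),
# typically via (stochastic) gradient descent on W.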
Loss functions
For regression, can use quadratic loss: \(\displaystyle l(f_W(x), y) = \frac{1}{2}\Vert f_W(x)-y \Vert^2\)
For 2-way classification, can use hinge-loss: \(\displaystyle l(f_W(x), y) = \max(0, 1-yf_W(x))\)
For multi-way classification, can use cross-entropy loss:
\(\displaystyle g(z)=\frac{1}{1+e^{-z}}\)
\(\displaystyle -\sum_{i=1}^{N}\left[y_i\log(g(f_W(x_i))) + (1-y_i)\log(1-g(f_W(x_i)))\right]\)
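A hedged NumPy sketch of the three losses above (quadratic, hinge, and sigmoid cross-entropy). Here z stands for the model output f_W(x); labels are assumed to be in {-1, +1} for the hinge loss and in {0, 1} for the cross-entropy, conventions not stated explicitly in the notes:

import numpy as np

def quadratic_loss(pred, y):
    # l(f_W(x), y) = 1/2 * ||f_W(x) - y||^2
    return 0.5 * np.sum((pred - y) ** 2)

def hinge_loss(z, y):
    # l(f_W(x), y) = max(0, 1 - y * f_W(x)), with y in {-1, +1}
    return max(0.0, 1.0 - y * z)

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_loss(zs, ys):
    # -sum_i [ y_i log g(f_W(x_i)) + (1 - y_i) log(1 - g(f_W(x_i))) ], y_i in {0, 1}
    g = sigmoid(np.asarray(zs, dtype=float))
    ys = np.asarray(ys, dtype=float)
    return -np.sum(ys * np.log(g) + (1 - ys) * np.log(1 - g))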