Computer Vision

Notes from the Udacity Computer Vision Course taught by Georgia Tech professors.

Hough Transform

The Hough transform is a voting technique for detecting structures in images, such as lines, circles, and arbitrary shapes.
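
To make the voting idea concrete, here is a minimal sketch for the line case, assuming the common (rho, theta) parameterization rho = x cos(theta) + y sin(theta). The function name `hough_lines` and the parameter `n_theta` are hypothetical names chosen for this illustration; they do not come from the course material.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate votes in (rho, theta) space for each nonzero pixel of a binary edge map.

    Illustrative sketch only: integer rho bins and 1-degree theta steps are assumptions.
    """
    h, w = edges.shape
    thetas = np.deg2rad(np.arange(n_theta) * (180.0 / n_theta))  # angles in [0, 180) degrees
    diag = int(np.ceil(np.hypot(h, w)))   # largest possible |rho|
    rhos = np.arange(-diag, diag + 1)     # one accumulator row per integer rho
    acc = np.zeros((len(rhos), n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        # Each edge pixel votes once per theta, for the line through it at that angle.
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rho + diag, np.arange(n_theta)] += 1
    return acc, rhos, thetas
```

Peaks in `acc` correspond to lines supported by many edge pixels. For example, a horizontal line at y = 5 produces a peak at rho = 5, theta = 90 degrees.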

See [1].
For each image, HoG (histogram of oriented gradients) generates a feature vector for each overlapping 16x16 patch of the image.

For each 8x8 patch, compute the gradient at each pixel. Each gradient has a magnitude (norm) and a direction.
Then bin the gradients by direction using bilinear binning (weighted voting), so that each angle bin holds a sum of gradient magnitudes, e.g. \(\displaystyle \{0: x_0, 20: x_1, ..., 160: x_8\}\). For your 8x8 patch, \(\displaystyle (x_0, ..., x_8)\) is your feature vector or histogram. This is called orientation binning.
For each overlapping 16x16 patch, you have four 8x8 patches, each with a 9-dim feature vector. Concatenate all four to form a 36-dim feature vector, which is then normalized by its L2 norm.
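
The steps above can be sketched with NumPy as follows. This is a simplified illustration, not a reference implementation: gradients are computed per cell with `np.gradient` (real implementations usually compute them once over the whole image), directions are treated as unsigned in [0, 180), and the names `cell_histogram`, `block_descriptor`, and `eps` are hypothetical.

```python
import numpy as np

def cell_histogram(cell, n_bins=9):
    """Orientation histogram of one grayscale cell (e.g. 8x8), with bilinear binning."""
    cell = np.asarray(cell, dtype=np.float64)
    gy, gx = np.gradient(cell)                    # derivatives along rows, then columns
    mag = np.hypot(gx, gy)                        # gradient magnitude (norm)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned direction in degrees
    pos = ang / (180.0 / n_bins)                  # fractional bin position (20 degrees per bin)
    lo = np.floor(pos).astype(int) % n_bins       # lower neighboring bin
    hi = (lo + 1) % n_bins                        # upper neighboring bin; 160 wraps to 0
    w_hi = pos - np.floor(pos)                    # share of the vote given to the upper bin
    hist = np.zeros(n_bins)
    np.add.at(hist, lo.ravel(), (mag * (1.0 - w_hi)).ravel())
    np.add.at(hist, hi.ravel(), (mag * w_hi).ravel())
    return hist

def block_descriptor(block, cell=8, eps=1e-6):
    """Concatenate the cell histograms of one block (e.g. 16x16) and L2-normalize."""
    cells = [block[r:r + cell, c:c + cell]
             for r in range(0, block.shape[0], cell)
             for c in range(0, block.shape[1], cell)]
    v = np.concatenate([cell_histogram(p) for p in cells])
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)  # eps (assumed) avoids division by zero
```

A gradient at 30 degrees sits halfway between the 20 and 40 degree bins, so its magnitude is split equally between them. A 16x16 block yields four 9-bin histograms, giving the 36-dim vector described above.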

Scale Invariant Feature Transform

Mean Shift

For every pixel (or a sample of pixels) in the image, calculate a feature vector such as (u, v) color or (x, y, u, v), where x and y are pixel coordinates and u and v are chroma values.
For each sampled pixel, or region of interest, calculate the new center of mass (weighted mean) of the features around it. The weights are typically Gaussian in the distance to the current center. Move the center to this mean and repeat until convergence.
The regions will cluster into modes. All regions which cluster to the same position are in the same attraction basin.

Attraction basin: the region for which all trajectories lead to the same mode.
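
The iteration described above is the mean shift procedure. A minimal sketch follows, assuming a Gaussian kernel with a fixed `bandwidth` and feature vectors stored as rows of an array. The names `mean_shift`, `label_basins`, `bandwidth`, `tol`, and `merge_dist` are hypothetical choices for this illustration.

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, tol=1e-5, max_iter=200):
    """Shift every point to the Gaussian-weighted mean of the data until it stops moving."""
    points = np.asarray(points, dtype=np.float64)
    modes = points.copy()                       # each trajectory starts at its own point
    for _ in range(max_iter):
        # Squared distance from every current center to every original point.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))             # Gaussian weights
        new = (w @ points) / w.sum(axis=1, keepdims=True)    # weighted mean (center of mass)
        shift = np.abs(new - modes).max()
        modes = new
        if shift < tol:                         # converged: no center moved noticeably
            break
    return modes

def label_basins(modes, merge_dist=0.5):
    """Give the same label to points whose trajectories ended at (nearly) the same mode."""
    labels = -np.ones(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_dist:
                labels[i] = k
                break
        else:
            centers.append(m)                   # first point seen for a new mode
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```

Points that share a label lie in the same attraction basin. The bandwidth controls how many modes appear: a larger value merges nearby clusters into one basin.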