# Computer Vision

Notes from the Udacity Computer Vision Course taught by Georgia Tech professors.

## Hough Transform

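The Hough transform is a voting technique for detecting shapes in images, such as lines, circles, and arbitrary shapes. Each edge point votes for every set of shape parameters it is consistent with, and peaks in the vote accumulator indicate likely shapes.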
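
The voting idea can be sketched for the line case. This is a minimal illustration, not a reference implementation: it assumes the normal parameterization \(\displaystyle \rho = x\cos\theta + y\sin\theta\), a 1-pixel \(\rho\) resolution, and a 1-degree \(\theta\) resolution. The function names `hough_lines` and `strongest_line` are made up for this example, and only the Python standard library is used.

```python
import math

def hough_lines(edge_points, width, height, n_theta=180):
    """Accumulate votes in (rho, theta) space for a list of (x, y) edge points."""
    diag = math.ceil(math.hypot(width, height))
    thetas = [math.pi * t / n_theta for t in range(n_theta)]  # angles in [0, pi)
    # rho lies in [-diag, diag]; shift by diag to get a non-negative row index.
    acc = [[0] * n_theta for _ in range(2 * diag + 1)]
    for x, y in edge_points:
        # Each point votes once per angle, for the line through it at that angle.
        for t, theta in enumerate(thetas):
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[rho + diag][t] += 1
    return acc, thetas, diag

def strongest_line(acc, thetas, diag):
    """Return (votes, rho, theta) for the accumulator cell with the most votes."""
    votes, r, t = max(
        (acc[r][t], r, t) for r in range(len(acc)) for t in range(len(thetas))
    )
    return votes, r - diag, thetas[t]

# Assumed toy input: 50 edge points on the horizontal line y = 20.
points = [(x, 20) for x in range(50)]
acc, thetas, diag = hough_lines(points, width=50, height=50)
votes, rho, theta = strongest_line(acc, thetas, diag)
print(votes, rho, round(math.degrees(theta)))
```

Circles and other shapes follow the same pattern with a different parameter space, for example (center x, center y, radius) for circles, at the cost of a higher-dimensional accumulator.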

## Image Features

### Histogram of Oriented Gradients (HOG)

See [1].

For each image, HOG generates a feature vector for each overlapping 16x16 patch of the image.

- For each 8x8 patch, compute the gradients for each pixel. Gradients will have a norm and direction.
- Then bin the gradients by direction using bilinear binning (weighted voting), so that each angle bin holds a sum of norms, e.g. \(\displaystyle \{0: x_0, 20: x_1, ..., 160: x_8\}\). For your 8x8 patch, \(\displaystyle (x_0, ..., x_8)\) is your feature vector or *histogram*. This is called *orientation binning*.
- For each overlapping 16x16 patch, you have four 8x8 patches, each with a 9-dim feature vector. Concatenate them to form a 36-dim feature vector. This feature vector is then normalized by its L2 norm.
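
The steps above can be sketched in plain Python. This is a rough illustration under several assumptions that the notes do not specify: gradients come from central differences clamped at the cell border, directions are unsigned (0 to 180 degrees), bin centers sit at 0, 20, ..., 160 degrees, and a small `eps` guards the normalization. The names `cell_histogram` and `block_descriptor` are hypothetical.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Orientation histogram for one cell given as a 2D list of intensities."""
    h, w = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the cell border (an assumption;
            # practical implementations take gradients over the whole image).
            gx = cell[y][min(x + 1, w - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, h - 1)][x] - cell[max(y - 1, 0)][x]
            norm = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned direction
            # Bilinear binning: split the vote between the two nearest bin centers.
            pos = angle / bin_width
            lo = int(pos) % n_bins
            hi = (lo + 1) % n_bins  # wraps 160 -> 0
            frac = pos - int(pos)
            hist[lo] += norm * (1.0 - frac)
            hist[hi] += norm * frac
    return hist

def block_descriptor(cells, eps=1e-6):
    """Concatenate the histograms of a block's cells and L2-normalize."""
    vec = [v for cell in cells for v in cell_histogram(cell)]
    length = math.sqrt(sum(v * v for v in vec) + eps ** 2)
    return [v / length for v in vec]

# Assumed toy input: an 8x8 horizontal intensity ramp, reused for all four cells.
ramp = [[x for x in range(8)] for _ in range(8)]
hist = cell_histogram(ramp)
desc = block_descriptor([ramp, ramp, ramp, ramp])
print(len(hist), len(desc))
```

Because the ramp only changes along x, every gradient points at 0 degrees and all of the weight lands in the first bin of each cell histogram.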

### SIFT

Scale-Invariant Feature Transform

## Segmentation

### Mean Shift Segmentation

- For every pixel (or a sample of pixels) in the image, calculate some features such as (u, v) color or (x, y, u, v), where x and y are coordinates and u and v are chroma.
- For each sampled pixel, or region of interest, calculate the new *center-of-mass*, or weighted mean. The weights are typically Gaussian based on distance to the center. Repeat until convergence.
- The regions will cluster into modes. All regions which cluster to the same position are in the same *attraction basin*.

Attraction basin: the region for which all trajectories lead to the same mode.
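
The procedure can be sketched on a handful of feature vectors. This is a minimal sketch under stated assumptions: Gaussian weights are computed over all points, with `bandwidth` playing the role of the window size, and two trajectories are treated as reaching the same mode when their end points are within `merge_tol`. The function names and both tolerances are made up for this example. It needs Python 3.8+ for `math.dist`.

```python
import math

def mean_shift_point(start, points, bandwidth, tol=1e-6, max_iter=200):
    """Move `start` toward its mode by repeated Gaussian-weighted means."""
    center = list(start)
    for _ in range(max_iter):
        # Gaussian weight of every point based on its distance to the center.
        weights = [
            math.exp(-sum((p - c) ** 2 for p, c in zip(pt, center))
                     / (2.0 * bandwidth ** 2))
            for pt in points
        ]
        total = sum(weights)
        # New center of mass (weighted mean) in each feature dimension.
        new_center = [
            sum(w * pt[d] for w, pt in zip(weights, points)) / total
            for d in range(len(center))
        ]
        shift = math.dist(new_center, center)
        center = new_center
        if shift < tol:  # converged
            break
    return center

def mean_shift_labels(points, bandwidth, merge_tol=1e-2):
    """Label each point by the mode its trajectory converges to."""
    modes, labels = [], []
    for pt in points:
        mode = mean_shift_point(pt, points, bandwidth)
        for i, m in enumerate(modes):
            if math.dist(mode, m) < merge_tol:
                labels.append(i)  # same attraction basin as an earlier point
                break
        else:
            modes.append(mode)
            labels.append(len(modes) - 1)
    return modes, labels

# Assumed toy features: two well-separated groups of 2D points.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
modes, labels = mean_shift_labels(pts, bandwidth=0.5)
print(labels)
```

Points that share a label lie in the same attraction basin. Every update touches every point, so this sketch is quadratic in the number of points per iteration. That is acceptable for a toy example but not for a full image without sampling or a spatial index.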

- Pros
  - Automatically finds basins of attraction.
  - Only one parameter: window size for the region of interest.
  - Does not assume any shape on clusters.
- Cons
  - Need to pick a window size.
  - Doesn't scale well to high dimensions.