Computer Vision: Difference between revisions
Latest revision as of 16:47, 4 December 2020
Notes from the Udacity Computer Vision Course (https://www.udacity.com/course/introduction-to-computer-vision--ud810) taught by Georgia Tech professors.
Hough Transform
The Hough transform is a voting technique used to find things in images such as lines, circles, and arbitrary shapes.
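A minimal sketch of the voting idea for straight lines, not taken from the course. It assumes the common polar parameterization rho = x cos(theta) + y sin(theta); the function name `hough_lines` and the parameters `n_theta` and `rho_step` are hypothetical. Every edge point votes for all (rho, theta) bins it could lie on, and bins with many votes suggest a line.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in (rho, theta) space for a list of (x, y) edge points.

    Assumption: lines are written as rho = x*cos(theta) + y*sin(theta),
    with theta sampled at n_theta evenly spaced angles in [0, pi).
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            # Quantize rho so that nearby values fall into the same bin.
            votes[(round(rho / rho_step), t)] += 1
    return votes

# Ten points on the vertical line x = 5, plus one outlier.
points = [(5, y) for y in range(10)] + [(0, 3)]
votes = hough_lines(points)
print(votes[(5, 0)])  # votes for the bin rho = 5, theta = 0
```

All ten collinear points land in the bin for theta = 0 and rho = 5, while the outlier spreads its votes over other bins. With bins this fine, a few neighbouring bins can collect the same number of votes, so a practical detector would usually look for local maxima rather than a single peak.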
Image Features
Histogram of Oriented Gradients (HOG)
See https://www.youtube.com/watch?v=28xk5i1_7Zc.
For each image, HOG generates a feature vector for each overlapping 16x16 patch of the image.
- For each 8x8 patch, compute the gradient at each pixel. Each gradient has a norm (magnitude) and a direction.
- Then bin the gradients by direction using bilinear binning (weighted voting): each gradient splits its norm between the two nearest angle bins, so that each bin holds a sum of norms, e.g. \(\displaystyle \{0: x_0, 20: x_1, ..., 160: x_8\}\). For your 8x8 patch, \(\displaystyle (x_0, ..., x_8)\) is the feature vector, or histogram. This is called orientation binning.
- For each overlapping 16x16 patch, you have four 8x8 patches, each with a 9-dim feature vector. Concatenate them to form a 36-dim feature vector (4 x 9). This feature vector is then normalized by its L2 norm.
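The binning and normalization steps above can be sketched as follows. This is an illustration under stated assumptions, not the course's implementation: gradient norms and angles are taken as already computed, angles are treated as unsigned (0 to 180 degrees) with bins centered at 0, 20, ..., 160, and the names `cell_histogram`, `block_descriptor`, and `eps` are hypothetical.

```python
import math

def cell_histogram(magnitudes, angles_deg, n_bins=9, bin_width=20.0):
    """Orientation histogram for one 8x8 cell.

    Each gradient splits its magnitude between the two nearest bins
    (bilinear binning). Assumption: unsigned angles, bins at 0, 20, ..., 160.
    """
    hist = [0.0] * n_bins
    for mag, ang in zip(magnitudes, angles_deg):
        pos = (ang % 180.0) / bin_width   # fractional bin position in [0, 9)
        lo = int(pos) % n_bins            # bin at or just below the angle
        hi = (lo + 1) % n_bins            # next bin; 160 wraps around to 0
        frac = pos - int(pos)             # how far the angle is toward `hi`
        hist[lo] += mag * (1.0 - frac)
        hist[hi] += mag * frac
    return hist

def block_descriptor(cell_hists, eps=1e-6):
    """Concatenate the four cell histograms of a 16x16 block and L2-normalize.

    `eps` is a small hypothetical constant that avoids division by zero.
    """
    vec = [v for h in cell_hists for v in h]
    norm = math.sqrt(sum(v * v for v in vec) + eps ** 2)
    return [v / norm for v in vec]

# A gradient at 30 degrees is split evenly between the 20 and 40 bins.
hist = cell_histogram([2.0, 4.0], [30.0, 170.0])
block = block_descriptor([hist, hist, hist, hist])
print(len(block))
```

A gradient at 170 degrees sits halfway between the 160 bin and the 0 bin, which is why the last bin wraps around to the first.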
SIFT
Scale-Invariant Feature Transform
Segmentation
Mean Shift Segmentation
- For every pixel (or a sample of pixels) in the image, calculate a feature vector such as (u, v) color, or (x, y, u, v) where (x, y) are the pixel coordinates and (u, v) are the chroma values.
- For each sampled pixel, or region of interest, calculate the new center of mass (weighted mean) of the features around it. The weights are typically Gaussian in the distance to the current center. Move to the new center and repeat until convergence.
- The regions will converge to modes. All regions that converge to the same position are in the same attraction basin.
Attraction basin: the region for which all trajectories lead to the same mode.
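The procedure above can be sketched in a few lines. This is a minimal illustration, not the course's code: it assumes Gaussian weights, uses `bandwidth` as the window size, and merges converged points that end up closer than a hypothetical `merge_dist` into one mode. It needs Python 3.8+ for `math.dist`.

```python
import math

def mean_shift_point(start, points, bandwidth, tol=1e-4, max_iter=200):
    """Move one feature vector to its mode by repeated Gaussian-weighted means."""
    center = list(start)
    for _ in range(max_iter):
        # Gaussian weight of every point, based on distance to the center.
        weights = []
        for p in points:
            d2 = sum((a - b) ** 2 for a, b in zip(p, center))
            weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))
        total = sum(weights)
        # New center of mass (weighted mean) in each feature dimension.
        new_center = [
            sum(w * p[i] for w, p in zip(weights, points)) / total
            for i in range(len(center))
        ]
        shift = math.dist(new_center, center)
        center = new_center
        if shift < tol:  # converged
            break
    return center

def mean_shift_labels(points, bandwidth, merge_dist=None):
    """Label each point by the mode it converges to (its attraction basin)."""
    if merge_dist is None:
        merge_dist = bandwidth / 2.0  # assumption: threshold for "same mode"
    modes, labels = [], []
    for p in points:
        m = mean_shift_point(p, points, bandwidth)
        for k, existing in enumerate(modes):
            if math.dist(m, existing) < merge_dist:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels

# Two well-separated groups of 2-D features, e.g. (u, v) chroma values.
features = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
modes, labels = mean_shift_labels(features, bandwidth=1.0)
print(labels)
```

Every point is compared against all others on each iteration, so this sketch is quadratic in the number of points; practical implementations usually restrict the mean to neighbors inside the window.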
Pros:
- Automatically finds basins of attraction.
- Only one parameter: the window size for the region of interest.
- Does not assume any particular cluster shape.
Cons:
- The window size must be chosen, and the result depends on it.
- Does not scale well to high-dimensional feature spaces.