Computer Vision: Difference between revisions

Latest revision as of 16:47, 4 December 2020

Notes from the Udacity Computer Vision Course taught by Georgia Tech professors.

The Hough transform is a voting technique used to find things in images such as lines, circles, and arbitrary shapes.
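
The voting idea can be sketched for the line case as follows. This is a minimal illustration, not code from the course: the function name `hough_lines`, the (theta, rho) line parameterization, and the bin sizes are all assumptions made for the example.

```python
import math

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in (theta, rho) space for a list of (x, y) edge points.

    Hypothetical helper: each point votes for every discretized line
    x*cos(theta) + y*sin(theta) = rho that passes through it.
    """
    votes = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta  # angle of the line normal, in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            key = (t, round(rho / rho_step))  # quantize rho into a bin
            votes[key] = votes.get(key, 0) + 1
    return votes

# Usage: points lying on the horizontal line y = 3.
points = [(x, 3) for x in range(0, 100, 10)]
votes = hough_lines(points)
best_bin = max(votes, key=votes.get)  # (theta index, rho bin) with the most votes
```

Bins that collect many votes correspond to lines supported by many points; circles and other shapes use the same scheme with a different parameter space.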

See [1].
For each image, HOG generates a feature vector for each overlapping 16x16 patch of the image.

For each 8x8 patch, compute the gradient at each pixel. Each gradient has a norm and a direction.
Then bin the gradients by direction using bilinear binning (weighted voting) so that each angle bin holds a sum of gradient norms (e.g. \(\displaystyle \{0: x_0, 20: x_1, ..., 160: x_8\}\)). For your 8x8 patch, \(\displaystyle (x_0, ..., x_8)\) is your feature vector or histogram. This is called orientation binning.
For each overlapping 16x16 patch, you have four 8x8 patches, each with a 9-bin feature vector. Concatenate all four to form a 36-dim (4 x 9) feature vector. This feature vector is then normalized by its L2 norm.
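
The three steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not a reference implementation: the function names `cell_histogram` and `block_descriptor` are hypothetical, gradients are taken with central differences clamped at the cell border, and directions are treated as unsigned (0 to 180 degrees) to match the bins 0, 20, ..., 160.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Orientation histogram of one cell (list of rows of grayscale values)."""
    bin_width = 180.0 / n_bins  # 20 degrees for 9 bins
    hist = [0.0] * n_bins
    h, w = len(cell), len(cell[0])
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the cell border (an assumption).
            gx = cell[y][min(x + 1, w - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, h - 1)][x] - cell[max(y - 1, 0)][x]
            norm = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned direction
            # Weighted voting: split the norm between the two nearest bins.
            pos = angle / bin_width
            lo = int(pos) % n_bins
            hi = (lo + 1) % n_bins  # wraps 160 -> 0
            frac = pos - int(pos)
            hist[lo] += norm * (1.0 - frac)
            hist[hi] += norm * frac
    return hist

def block_descriptor(block, cell_size=8, eps=1e-6):
    """Concatenate the cell histograms of one block and L2-normalize."""
    feats = []
    for cy in range(0, len(block), cell_size):
        for cx in range(0, len(block[0]), cell_size):
            cell = [row[cx:cx + cell_size] for row in block[cy:cy + cell_size]]
            feats.extend(cell_histogram(cell))
    length = math.sqrt(sum(v * v for v in feats)) + eps  # eps avoids division by zero
    return [v / length for v in feats]

# Usage: a 16x16 block with a horizontal intensity ramp.
block = [[float(x) for x in range(16)] for _ in range(16)]
descriptor = block_descriptor(block)  # 4 cells x 9 bins = 36 values
```

A full HOG pipeline would compute gradients once over the whole image and slide the 16x16 block with overlap; the sketch only shows the per-block arithmetic.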

Scale-Invariant Feature Transform (SIFT)

For every pixel (or a sample of pixels) in the image, calculate features such as (u, v) color or (x, y, u, v), where x, y are pixel coordinates and u, v are chroma.
For each sampled pixel, place a window (region of interest) around it in feature space and calculate the center of mass, i.e. the weighted mean, of the features inside. The weights are typically Gaussian in the distance to the window center. Move the window to the new mean and repeat until convergence.
The windows converge to modes. All windows that converge to the same mode are in the same attraction basin.

Attraction basin: the region for which all trajectories lead to the same mode.
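
The procedure above, commonly known as mean shift, can be sketched as follows. This is a minimal illustration under stated assumptions: the names `seek_mode` and `mean_shift_labels` are hypothetical, `bandwidth` plays the role of the window size, every feature gets a Gaussian weight instead of a hard window cutoff, and two trajectories count as the same mode when they end within `merge_tol` of each other.

```python
import math

def _dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def seek_mode(start, features, bandwidth, tol=1e-6, max_iter=500):
    """Follow one trajectory: move to the weighted mean until it stops moving."""
    center = tuple(start)
    for _ in range(max_iter):
        # Gaussian weights based on distance to the current center.
        weights = [math.exp(-_dist(f, center) ** 2 / (2.0 * bandwidth ** 2))
                   for f in features]
        total = sum(weights)
        new_center = tuple(
            sum(w * f[d] for w, f in zip(weights, features)) / total
            for d in range(len(center))
        )
        moved = _dist(new_center, center)
        center = new_center
        if moved < tol:
            break
    return center

def mean_shift_labels(features, bandwidth, merge_tol=1e-2):
    """Label each feature by the mode its trajectory reaches (its attraction basin)."""
    modes, labels = [], []
    for f in features:
        mode = seek_mode(f, features, bandwidth)
        for i, known in enumerate(modes):
            if _dist(mode, known) < merge_tol:
                labels.append(i)
                break
        else:
            modes.append(mode)
            labels.append(len(modes) - 1)
    return modes, labels

# Usage: two well-separated groups of 2-D features.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
modes, labels = mean_shift_labels(features, bandwidth=1.0)
```

Features that share a label lie in the same attraction basin, so the number of clusters falls out of the data rather than being chosen in advance.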

@@ Line 1: / Line 1: @@
 Notes from the [https://www.udacity.com/course/introduction-to-computer-vision--ud810 Udacity Computer Vision Course] taught by Georgia Tech professors.
+==Hough Transform==
+The Hough transform is a voting technique used to find things in images such as lines, circles, and arbitrary shapes.
+==Image Features==
+===Histogram of Oriented Gradients (HOG)===
+See [https://www.youtube.com/watch?v=28xk5i1_7Zc].
+For each image, HOG generates a feature vector for each overlapping 16x16 patch of the image.
+* For each 8x8 patch, compute the gradient at each pixel. Each gradient has a norm and a direction.
+* Then bin the gradients by direction using bilinear binning (weighted voting) so that each angle bin holds a sum of gradient norms (e.g. <math>\{0: x_0, 20: x_1, ..., 160: x_8\}</math>). For your 8x8 patch, <math>(x_0, ..., x_8)</math> is your feature vector or ''histogram''. This is called ''orientation binning''.
+* For each overlapping 16x16 patch, you have four 8x8 patches, each with a 9-bin feature vector. Concatenate all four to form a 36-dim (4 x 9) feature vector. This feature vector is then normalized by its L2 norm.
+===SIFT===
+{{main | SIFT features}}
+Scale-Invariant Feature Transform
 ==Segmentation==
@@ Line 10: / Line 26: @@
 ;Pros
-* Automatically finds basin of attraction.
+* Automatically finds basins of attraction.
 * Only one parameter: Window size for region of interest.
 * Does not assume any shape on cluster.