Computer Vision

Notes from the [https://www.udacity.com/course/introduction-to-computer-vision--ud810 Udacity Computer Vision Course] taught by Georgia Tech professors.
==Hough Transform==
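The Hough transform is a voting technique for detecting shapes in images, such as lines, circles, and (in its generalized form) arbitrary shapes. Each edge point votes for every set of shape parameters consistent with it, and the parameters with the most votes are reported as detections.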
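The voting idea can be sketched for straight lines as follows. This is a minimal illustration, not the course's implementation: it assumes the polar line parameterization <math>\rho = x\cos\theta + y\sin\theta</math>, a binary edge image as input, and NumPy. The function name <code>hough_lines</code> and its parameters are hypothetical.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate votes in (rho, theta) space for every edge pixel of a binary image."""
    h, w = edges.shape
    # Candidate line angles covering [0, 180) degrees.
    thetas = np.deg2rad(np.arange(n_theta) * (180.0 / n_theta))
    # rho lies in [-diag, diag]; accumulator row index is rho + diag.
    diag = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    cols = np.arange(n_theta)
    for x, y in zip(xs, ys):
        # Each edge pixel votes once per angle, for the rho of the line through it.
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, cols] += 1
    return acc, thetas, diag

# Usage: a synthetic edge image containing the horizontal line y = 20.
edges = np.zeros((50, 50), dtype=bool)
edges[20, :] = True
acc, thetas, diag = hough_lines(edges)
rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
best_rho = rho_idx - diag
best_theta_deg = np.rad2deg(thetas[theta_idx])
```

The strongest accumulator cell gives the (rho, theta) of the dominant line. Detecting several lines would additionally need a threshold and some form of non-maximum suppression, which this sketch omits.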
==Image Features==
===Histogram of Oriented Gradients (HOG)===
See [https://www.youtube.com/watch?v=28xk5i1_7Zc]. 
For each image, HOG generates a feature vector for each overlapping 16x16 patch of the image.
* For each 8x8 patch, compute the gradient at each pixel. Each gradient has a norm (magnitude) and a direction.
* Then bin the gradients by direction using bilinear binning (weighted voting), so that each angle bin holds a sum of norms (e.g. <math>\{0: x_0, 20: x_1, ..., 160: x_8\}</math>). For your 8x8 patch, <math>(x_0, ..., x_8)</math> is your feature vector or ''histogram''. This is called ''orientation binning''.
* Each overlapping 16x16 patch contains four 8x8 patches, each with a 9-dim feature vector. Concatenate all four to form a 36-dim feature vector. This feature vector is then normalized by its L2 norm.
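The steps above can be sketched for a single 16x16 patch as follows. This is a simplified illustration under stated assumptions: gradients come from <code>np.gradient</code>, angles are unsigned (0 to 180 degrees), votes are interpolated only between the two nearest orientation bins, and a small <code>eps</code> guards the normalization. The function names <code>cell_histogram</code> and <code>block_descriptor</code> are hypothetical, and full HOG implementations typically add further details not shown here.

```python
import numpy as np

def cell_histogram(mag, ang, n_bins=9):
    """Orientation histogram of one cell; each vote is split between the two nearest bins.

    mag: gradient magnitudes; ang: unsigned gradient angles in degrees, in [0, 180].
    Bins are labelled 0, 20, ..., 160 degrees when n_bins is 9.
    """
    bin_width = 180.0 / n_bins
    pos = ang.ravel() / bin_width            # fractional bin position
    lo = np.floor(pos).astype(int) % n_bins  # lower neighbouring bin
    hi = (lo + 1) % n_bins                   # upper neighbour, wrapping 160 -> 0
    w_hi = pos - np.floor(pos)               # share of the vote given to the upper bin
    hist = np.zeros(n_bins)
    np.add.at(hist, lo, mag.ravel() * (1.0 - w_hi))
    np.add.at(hist, hi, mag.ravel() * w_hi)
    return hist

def block_descriptor(patch, cell=8, n_bins=9, eps=1e-6):
    """36-dim descriptor of a 16x16 grayscale patch: four 8x8 cell histograms, L2-normalized."""
    gy, gx = np.gradient(patch.astype(float))  # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hists = []
    for r in range(0, patch.shape[0], cell):
        for c in range(0, patch.shape[1], cell):
            hists.append(cell_histogram(mag[r:r + cell, c:c + cell],
                                        ang[r:r + cell, c:c + cell], n_bins))
    v = np.concatenate(hists)
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)

# Usage: a horizontal intensity ramp has all gradients pointing at 0 degrees.
patch = np.tile(np.arange(16, dtype=float), (16, 1))
desc = block_descriptor(patch)
# A single gradient at 30 degrees is split evenly between the 20 and 40 degree bins.
single = cell_histogram(np.array([[2.0]]), np.array([[30.0]]))
```

Sliding <code>block_descriptor</code> over the image with a stride of 8 pixels would give the overlapping 16x16 blocks described above.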
===SIFT===
{{main | SIFT features}}
SIFT stands for Scale-Invariant Feature Transform.


==Segmentation==
===Mean Shift Segmentation===
# For every pixel (or a sample of pixels) in the image, calculate some features such as (u, v) color or (x, y, u, v), where x and y are the pixel coordinates and u and v are the chroma values.
# For each sampled pixel, or region of interest, calculate the new ''center-of-mass'', or weighted mean. The weights are typically Gaussian, based on distance to the center. Move the region to the new center and repeat until convergence.
# The regions will converge to modes. All regions that converge to the same position are in the same ''attraction basin''.
Attraction basin: the region for which all trajectories lead to the same mode.
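The procedure above can be sketched on generic feature vectors as follows. This is a minimal illustration, assuming a Gaussian kernel evaluated over all points (rather than a hard window) and a simple distance threshold for deciding that two trajectories reached the same mode. The names <code>mean_shift</code>, <code>label_basins</code>, <code>bandwidth</code>, and <code>merge_radius</code> are hypothetical.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=100, tol=1e-4):
    """Shift every point to the Gaussian-weighted mean of the data until it stops moving."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # Squared distance from each current position to every data point.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        # New center of mass (weighted mean) for each trajectory.
        new_modes = (w @ points) / w.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(new_modes - modes, axis=1).max()
        modes = new_modes
        if shift < tol:
            break
    return modes

def label_basins(modes, merge_radius):
    """Give the same label to points whose trajectories ended within merge_radius of each other."""
    labels = np.full(len(modes), -1)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_radius:
                labels[i] = k
                break
        else:
            # No existing mode is close enough, so this starts a new basin.
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Usage: two well-separated 2-D blobs standing in for pixel features.
rng = np.random.default_rng(0)
a = rng.normal([0.0, 0.0], 0.2, size=(30, 2))
b = rng.normal([5.0, 5.0], 0.2, size=(30, 2))
feats = np.vstack([a, b])
modes = mean_shift(feats, bandwidth=1.0)
labels, centers = label_basins(modes, merge_radius=0.5)
```

For image segmentation, <code>feats</code> would hold one row per sampled pixel, for example its (x, y, u, v) values, and the labels would be mapped back to pixel positions. The pairwise distance matrix makes this sketch quadratic in the number of points, so it is only suitable for small samples.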
;Pros
* Automatically finds basins of attraction.
* Only one parameter: the window size for the region of interest.
* Does not assume any particular cluster shape.
;Cons
* The window size must be chosen by hand, and the result depends on it.
* Does not scale well to high-dimensional feature spaces.