Distinctive Image Features from Scale-Invariant Keypoints: Difference between revisions
Revision as of 14:47, 2 September 2020
Distinctive Image Features from Scale-Invariant Keypoints, also known as the scale-invariant feature transform (SIFT) or SIFT features.
Author: David G. Lowe
Affiliation: University of British Columbia
Algorithm
Scale-space extrema detection
\(L(x,y,\sigma)\) is the scale space of an image. It is generated by convolving a Gaussian with the input image \(I(x,y)\): \[ L(x,y,\sigma) = G(x, y, \sigma) * I(x,y) \] where \(G(x,y,\sigma) = \frac{1}{2 \pi \sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)\).
The scale space is used to compute the difference-of-Gaussian (DoG) function from two nearby scales separated by a constant multiplicative factor \(k\): \[ \begin{aligned} D(x,y,\sigma) &= (G(x,y,k\sigma) - G(x,y,\sigma)) * I(x,y) \\ &= L(x,y,k\sigma) - L(x,y,\sigma) \end{aligned} \]
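The two definitions above can be sketched in plain Python. This is only an illustration under stated assumptions, not Lowe's implementation: the image is a list of lists of floats, the Gaussian is truncated at a radius of \(3\sigma\), edge pixels are replicated at the border, and all function names are hypothetical.

```python
import math


def gaussian_kernel_1d(sigma):
    """Sampled 1-D Gaussian, truncated at 3*sigma (assumption) and normalised to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Convolve each row with a symmetric 1-D kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    result = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for j, w in enumerate(kernel):
                xx = min(max(x + j - radius, 0), width - 1)  # clamp to the image
                acc += w * row[xx]
            new_row.append(acc)
        result.append(new_row)
    return result


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, sigma):
    """L(x, y, sigma): the 2-D Gaussian is separable, so blur rows, then columns."""
    kernel = gaussian_kernel_1d(sigma)
    rows_blurred = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(rows_blurred), kernel))


def difference_of_gaussians(image, sigma, k):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    coarse = gaussian_blur(image, k * sigma)
    fine = gaussian_blur(image, sigma)
    return [[c - f for c, f in zip(coarse_row, fine_row)]
            for coarse_row, fine_row in zip(coarse, fine)]
```

For a single bright pixel on a dark background, the DoG response at that pixel is negative, because the coarser blur spreads the same intensity over a wider area.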
The image is organized into multiple octaves, each obtained by down-sampling the previous one by a factor of 2.
Within each octave, the image is blurred at multiple scales with increasing \(\sigma\).
For example, 5 scales per octave yield 4 DoG images, since each DoG image is the difference of two adjacent scales.
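The octave structure can be illustrated with a short sketch. It assumes that down-sampling means keeping every second pixel in each row and column, which is one simple choice; the helper names are hypothetical.

```python
def downsample_by_2(image):
    """Keep every second pixel in each row and column."""
    return [row[::2] for row in image[::2]]


def octave_sizes(height, width, num_octaves):
    """(height, width) of the image at the start of each octave."""
    sizes = []
    for _ in range(num_octaves):
        sizes.append((height, width))
        # Slicing with a step of 2 keeps ceil(n / 2) samples.
        height, width = (height + 1) // 2, (width + 1) // 2
    return sizes


def num_dog_images(scales_per_octave):
    """Each DoG image is the difference of two adjacent scales."""
    return scales_per_octave - 1
```

With 5 scales per octave, `num_dog_images(5)` gives the 4 DoG images mentioned above.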
To find local minima and maxima, compare each pixel in a DoG image to its 8 neighbors in the same image and to the 9 pixels at the corresponding location in each of the two adjacent scales, for 8+9+9=26 neighbors in total. A pixel is kept as a candidate only if it is the minimum or maximum among them. This test is applied only to the intermediate DoG images (i.e. excluding the first and last), since those two lack an adjacent scale on one side.
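The 26-neighbor comparison can be sketched as follows. The sketch assumes `dog` is a list of DoG images for one octave, indexed as `dog[scale][y][x]`, and that a point must be strictly larger or strictly smaller than all of its neighbors; border pixels are skipped because they do not have a full neighborhood. The function names are hypothetical.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbours."""
    value = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)  # exclude the pixel itself
    ]
    return value > max(neighbours) or value < min(neighbours)


def find_extrema(dog):
    """Scan only the intermediate DoG images, skipping the first and last."""
    extrema = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_extremum(dog, s, y, x):
                    extrema.append((s, y, x))
    return extrema
```

With 4 DoG images per octave, only the 2 intermediate ones are scanned.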
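In his paper, Lowe divides each octave into \(s\) intervals, so \(k=2^{1/s}\), and generates \(s+3\) Gaussian-blurred images per octave. These give \(s+2\) DoG images, of which the \(s\) intermediate ones are searched for extrema. For example, \(s=2\) gives \(k=\sqrt{2}\) and \(s+3=5\) blurred images, matching the 5-scale example above; the experiments in the paper settle on \(s=3\).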
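The relation between \(s\), \(k\) and the number of images per octave can be checked with a few lines of arithmetic. In this sketch the base scale `sigma0` and the function names are illustrative assumptions.

```python
def octave_sigmas(sigma0, s):
    """Blur levels sigma0 * k**i for the s + 3 Gaussian images of one octave."""
    k = 2 ** (1 / s)
    return [sigma0 * k ** i for i in range(s + 3)]


def octave_counts(s):
    """Number of blurred images, DoG images, and DoG images searched for extrema."""
    blurred = s + 3
    dog = blurred - 1      # one DoG image per pair of adjacent scales
    searched = dog - 2     # first and last DoG images are excluded
    return blurred, dog, searched
```

For `s = 2` this gives 5 blurred images, 4 DoG images and 2 searched images, and the blur level doubles after `s` steps, which is what makes each octave span a factor of 2 in scale.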