Notes for CMSC733 Classical and Deep Learning Approaches for Geometric Computer Vision taught by [http://legacydirs.umiacs.umd.edu/~yiannis/ Prof. Yiannis Aloimonos].
* [http://prg.cs.umd.edu/cmsc733 Course webpage]
==Convolution and Correlation==
See [[Convolutional neural network]].
Traditionally, fixed (hand-designed) filters are used instead of learned filters.
==Edge Detection==
Two ways to detect edges:
* Difference operators
* Models
===Image Gradients===
* Angle is given by <math>\theta = \arctan\left(\frac{\partial f / \partial y}{\partial f / \partial x}\right)</math>
* Edge strength is given by <math>\left\Vert \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right) \right\Vert</math>
The Sobel operator is another way to approximate derivatives:<br>
<math>
s_x =
\frac{1}{8}
\begin{bmatrix}
-1 & 0 & 1\\
-2 & 0 & 2\\
-1 & 0 & 1
\end{bmatrix}
</math> and
<math>
s_y =
\frac{1}{8}
\begin{bmatrix}
1 & 2 & 1\\
0 & 0 & 0\\
-1 & -2 & -1
\end{bmatrix}
</math>
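As a quick illustration, here is a minimal NumPy/SciPy sketch (the grayscale array img is an assumed input, not from the notes) that applies the Sobel kernels above and computes the edge strength and angle from the Image Gradients section:
<pre>
import numpy as np
from scipy.ndimage import correlate

# The Sobel kernels from above.
sx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]]) / 8.0
sy = np.array([[ 1,  2,  1],
               [ 0,  0,  0],
               [-1, -2, -1]]) / 8.0

def image_gradients(img):
    """Approximate df/dx and df/dy with the Sobel kernels, then
    return edge strength and gradient angle per pixel."""
    fx = correlate(img.astype(float), sx)
    fy = correlate(img.astype(float), sy)
    strength = np.hypot(fx, fy)   # ||(df/dx, df/dy)||
    theta = np.arctan2(fy, fx)    # gradient direction
    return strength, theta
</pre>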
You can smooth a function by convolving it with a Gaussian kernel.
;Laplacian of Gaussian
* Edges are zero crossings of the Laplacian of Gaussian convolved with the signal.
Effect of the Gaussian kernel size <math>\sigma</math>:
* Large sigma detects large-scale edges.
* Small sigma detects fine features.
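A short sketch of this effect using SciPy's Laplacian-of-Gaussian filter, reusing the grayscale img array from the sketch above (the sigma values are arbitrary examples); edges are the zero crossings of the response:
<pre>
from scipy.ndimage import gaussian_laplace

# Laplacian of Gaussian at two scales: small sigma keeps fine
# features, large sigma keeps only large-scale edges.
fine   = gaussian_laplace(img.astype(float), sigma=1.0)
coarse = gaussian_laplace(img.astype(float), sigma=4.0)

# Edges are sign changes (zero crossings) between neighboring pixels.
edges = (fine[:, :-1] * fine[:, 1:]) < 0
</pre>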
;Scale Space
* With larger sigma, the peaks of the first derivative (i.e., the zero crossings of the second derivative) can move.
* Close-by peaks can also merge as the scale increases.
* An edge will never split as the scale increases.
===Subtraction===
* Create a smoothed image by convolving with a Gaussian.
* Subtract the smoothed image from the original; the difference keeps the high-frequency detail (edges and fine texture).
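A minimal sketch of this, again assuming a grayscale img array and an arbitrary example sigma:
<pre>
from scipy.ndimage import gaussian_filter

smoothed = gaussian_filter(img.astype(float), sigma=2.0)
detail = img - smoothed   # what remains: edges and fine texture
</pre>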
===Finding lines in an image===
Option 1: Search for lines everywhere (exhaustive search).
Option 2: Use Hough transform voting.
===Hough Transform===
There is a duality between lines in image space and points in Hough space.
The equation for a line in <math>(d, \theta)</math> form is <math>d = x \cos \theta + y \sin \theta</math>.
<pre>
H = 0  # accumulator over discretized (d, theta)
for all pixels (x, y) on an edge:
    for all theta:
        d = x*cos(theta) + y*sin(theta)
        H(round(d), theta) += 1
d, theta = argmax(H)
</pre>
* Hough transform handles noise and outliers better than least squares.
* Each edge pixel votes for a ''curve'' in the Hough space (a sinusoid in the <math>(d, \theta)</math> parameterization). The line in the image space corresponds to the point where these curves intersect.
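A runnable NumPy version of the voting procedure above; the bin counts and range of d are arbitrary choices, and edge_pixels is assumed to be a list of (x, y) coordinates:
<pre>
import numpy as np

def hough_lines(edge_pixels, num_theta=180, num_d=400, d_max=500.0):
    """Vote in the discretized (d, theta) accumulator and return the
    strongest line d = x*cos(theta) + y*sin(theta)."""
    thetas = np.linspace(0.0, np.pi, num_theta, endpoint=False)
    H = np.zeros((num_d, num_theta), dtype=int)
    for x, y in edge_pixels:
        # Each pixel votes once per theta: a sinusoid in Hough space.
        d = x * np.cos(thetas) + y * np.sin(thetas)
        bins = np.round((d + d_max) / (2 * d_max) * (num_d - 1)).astype(int)
        ok = (bins >= 0) & (bins < num_d)
        H[bins[ok], np.nonzero(ok)[0]] += 1
    di, ti = np.unravel_index(np.argmax(H), H.shape)
    return -d_max + 2 * d_max * di / (num_d - 1), thetas[ti]
</pre>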
;Extensions
* Use the image gradient direction to restrict the range of theta.
* Give more votes to stronger edges.
* Change the sampling of (d, theta) to give more or less resolution.
* Use the same procedure with circles, squares, or other shapes.
;Hough transform for curves
The Hough transform works with any curve that can be written in a parametric form.
===Finding corners===
Build the second-moment matrix from image gradients summed over a window:
<math>
C = \begin{bmatrix}
\sum I_x^2 & \sum I_x I_y\\
\sum I_x I_y & \sum I_y^2
\end{bmatrix}
</math>
In a coordinate frame aligned with the eigenvectors, consider <math>
C = \begin{bmatrix}
\lambda_1 & 0 \\
0 & \lambda_2
\end{bmatrix}
</math>
* Both eigenvalues large: corner.
* One eigenvalue large, one small: edge.
* Both small: flat region.
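A sketch of a Harris-style corner response built from this matrix; the Gaussian window and the constant k = 0.04 are conventional choices, not from the notes:
<pre>
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.5, k=0.04):
    """Corner response det(C) - k*trace(C)^2, which is large only
    where both eigenvalues of C are large (i.e. at corners)."""
    fy, fx = np.gradient(img.astype(float))
    # Entries of C, Gaussian-weighted sums over a local window.
    Ixx = gaussian_filter(fx * fx, sigma)
    Iyy = gaussian_filter(fy * fy, sigma)
    Ixy = gaussian_filter(fx * fy, sigma)
    det = Ixx * Iyy - Ixy**2    # = lambda_1 * lambda_2
    trace = Ixx + Iyy           # = lambda_1 + lambda_2
    return det - k * trace**2
</pre>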
===Theoretical model of an eye===
* Pick a point in space and consider the light rays passing through it.
* Pinhole cameras
** Abstractly, a box with a small hole in it.
==Homography==
===Cross-ratio===
See [[Wikipedia: Cross-ratio]].
===Solving for homographies===
A homography has 8 degrees of freedom, so given 4 point correspondences (each giving 2 constraints), you can solve for it.
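A sketch of the standard direct linear transform (DLT) solution, assuming src and dst are arrays of four or more corresponding (x, y) points:
<pre>
import numpy as np

def solve_homography(src, dst):
    """Each correspondence (x,y) -> (u,v) gives two rows of A;
    the homography h is the null vector of A (last row of V^T)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1,  0,  0,  0, u*x, u*y, u])
        A.append([ 0,  0,  0, -x, -y, -1, v*x, v*y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # fix the overall scale
</pre>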
===Point and line duality===
Points on the image correspond to lines/rays in 3D space.
The cross product of two such rays corresponds to (the normal of) the plane containing them.
==Calibration==
===Central Projection===
<math>
\begin{bmatrix}
u \\ v \\ w
\end{bmatrix}
=
\begin{bmatrix}
f & 0 & 0 & 0\\
0 & f & 0 & 0\\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
x_s \\ y_s \\ z_s \\ 1
\end{bmatrix}
</math>
The image coordinates are <math>(u/w, v/w) = (f x_s / z_s, f y_s / z_s)</math>.
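A tiny sketch of this projection (the focal length and scene point are arbitrary example values):
<pre>
import numpy as np

f = 1.0                               # focal length (example value)
P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 1, 0]], dtype=float)

X = np.array([2.0, 1.0, 4.0, 1.0])    # scene point (x_s, y_s, z_s, 1)
u, v, w = P @ X
x_img, y_img = u / w, v / w           # = (f*x_s/z_s, f*y_s/z_s)
</pre>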
===Properties of matrix P===
<math>P = K R [I_3 \mid -C]</math>
* <math>K</math> is the upper-triangular calibration matrix, which has 5 degrees of freedom.
* <math>R</math> is the rotation matrix with 3 degrees of freedom.
* <math>C</math> is the camera center with 3 degrees of freedom.
===Calibration===
# Estimate matrix P using known scene points and their images.
# Decompose P into interior (intrinsic) and exterior (extrinsic) parameters.
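Step 2 can be done with an RQ decomposition of the left 3x3 block of P, since that block equals KR with K upper-triangular and R orthogonal. A sketch assuming SciPy:
<pre>
import numpy as np
from scipy.linalg import rq

def decompose_P(P):
    """Split P = K R [I | -C] into intrinsics K, rotation R, center C."""
    M = P[:, :3]
    K, R = rq(M)                       # K upper-triangular, R orthogonal
    S = np.diag(np.sign(np.diag(K)))   # fix signs: K diagonal positive
    K, R = K @ S, S @ R
    if np.linalg.det(R) < 0:           # P has an overall sign ambiguity
        R = -R
    C = -np.linalg.solve(M, P[:, 3])   # last column of P is -M C
    return K / K[2, 2], R, C
</pre>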
===Zhang's Approach===
Zhang's approach calibrates from several images of a planar pattern (e.g., a checkerboard); each view yields a homography, and the homographies constrain the intrinsic parameters.
==Stereo==
===Parallel Cameras===
Consider two cameras, where the right camera is shifted by baseline <math>d</math> along the x-axis relative to the left camera.
Then for a point <math>(x,y,z)</math>, assuming focal length <math>f=1</math>:
<math>x_l = \frac{x}{z}</math>,
<math>y_l = \frac{y}{z}</math>,
<math>x_r = \frac{x-d}{z}</math>,
<math>y_r = \frac{y}{z}</math>.
Thus, the stereo disparity is the ratio of baseline to depth: <math>x_l - x_r = \frac{d}{z}</math>.
With a known baseline and a correspondence, you can solve for the depth <math>z</math>.
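A one-line consequence, sketched with NumPy (coordinates assumed normalized as above):
<pre>
import numpy as np

def depth_from_disparity(x_left, x_right, baseline):
    """z = baseline / disparity, from x_l - x_r = d / z."""
    disparity = x_left - x_right
    return baseline / disparity   # assumes nonzero disparity
</pre>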
===Epipolar Geometry===
A match for a point in one image must lie on the corresponding ''epipolar line'' in the other image.
The ''epipoles'' are where one camera sees the other.
===Rectification===
Rectification warps the two images so that the epipolar lines become horizontal.
# Consider the left camera to be the center of a coordinate system.
# Let <math>e_1</math> be the axis toward the right camera, <math>e_2</math> be the up axis (orthogonal to <math>e_1</math>), and take <math>e_3 = e_1 \times e_2</math>.
===Random dot stereograms===
These show that recognition is not needed for stereo.
===Similarity Constraint===
* Do matching by computing the sum of squared differences (SSD) of a patch along the epipolar lines.
* The ordering of pixels along an epipolar line may not be the same between the left and right images.
===Correspondence + Segmentation===
* Assumption: similar pixels in a segmentation map will probably have the same disparity.
# For each candidate shift (disparity), find the connected components of matching pixels.
# For each point p, pick the shift whose connected component containing p is largest (see the sketch below).
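A hedged sketch of one way to implement this using SciPy's connected-component labeling; the matching tolerance and the exact assignment rule are assumptions, not from the notes:
<pre>
import numpy as np
from scipy import ndimage

def segment_stereo(left, right, max_disp, tol=10.0):
    """For each candidate shift, mark similar pixels, label connected
    components, and give each pixel the shift whose component
    containing it is largest."""
    h, w = left.shape
    best_size = np.zeros((h, w))
    disp = np.zeros((h, w), dtype=int)
    for d in range(max_disp + 1):
        match = np.zeros((h, w), dtype=bool)
        match[:, d:] = np.abs(left[:, d:].astype(float)
                              - right[:, :w - d].astype(float)) < tol
        labels, n = ndimage.label(match)
        if n == 0:
            continue
        sizes = ndimage.sum(match, labels, index=np.arange(1, n + 1))
        size_img = np.zeros((h, w))
        size_img[match] = sizes[labels[match] - 1]
        better = size_img > best_size
        best_size[better] = size_img[better]
        disp[better] = d
    return disp
</pre>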
===Essential Matrix===
The essential matrix satisfies <math>\hat{p}'^T E \hat{p} = 0</math> where <math>\hat{p} = M^{-1}p</math> and <math>\hat{p}'=M'^{-1}p'</math> are normalized image points (<math>M, M'</math> are the intrinsic matrices).
The fundamental matrix is <math>F=M'^{-T} E M^{-1}</math>.
;Properties
* The matrix is 3x3 and has rank 2.
* If <math>F</math> is the fundamental matrix of (P, P'), then <math>F^T</math> is the fundamental matrix of (P', P).
* The fundamental matrix gives the equation of the epipolar line in the other image:
** <math>l'=Fp</math> and <math>l=F^T p'</math>
* For any p, the epipolar line <math>l'=Fp</math> contains the epipole <math>e'</math>, since every epipolar line passes through the image of the other camera's center.
** <math>e'^T F = 0</math> and <math>Fe=0</math>
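Both facts in a short NumPy sketch, assuming a fundamental matrix F and homogeneous image points:
<pre>
import numpy as np

def epipolar_line(F, p):
    """Epipolar line l' = F p in the other image, p = (x, y, 1)."""
    return F @ p

def epipoles(F):
    """e and e' are the right null vectors of F and F^T (F has rank 2)."""
    _, _, Vt = np.linalg.svd(F)
    e = Vt[-1]
    _, _, Vt = np.linalg.svd(F.T)
    e_prime = Vt[-1]
    return e / e[2], e_prime / e_prime[2]
</pre>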
[https://www.youtube.com/watch?v=DgGV3l82NTk Fundamental matrix song]
==Structure from Motion==
This section is based on optical flow.
===Only Translation===
For translation <math>(U, V, W)</math>:
<math>u = \frac{-U + xW}{Z} = \frac{W}{Z}\left(x - \frac{U}{W}\right) = \frac{W}{Z}(x - x_0)</math>
<math>v = \frac{-V + yW}{Z} = \frac{W}{Z}\left(y - \frac{V}{W}\right) = \frac{W}{Z}(y - y_0)</math>
The direction of the flow is:
<math>\frac{v}{u} = \frac{y-y_0}{x-x_0}</math>
All flow vectors emanate from the ''focus of expansion'' <math>(x_0, y_0)</math>.
If you walk towards a point in the image, then all pixels will flow away from that point.
===Only Rotation===
For rotation <math>(\alpha, \beta, \gamma)</math>, the horizontal flow component is:
<math>u = \alpha x y - \beta (1 + x^2) - \gamma y</math>
Rotation around the y or z axis leads to hyperbolic flow curves.
The rotational flow is independent of depth.
===Both translation and rotation===
The flow field will not resemble any of the above patterns.
===The velocity of p===
===Moving plane===
For a point p on a plane with normal vector n, the set of all points on the plane is <math>\{x \mid x \cdot n = d\}</math>, where <math>d = p \cdot n</math> is the distance from the origin to the plane along the normal.
===Scaling ambiguity===
Depth can only be recovered up to a scale factor.
===Non-Linear Least Squares Approach===
Minimize the sum of squared distances from each point to its epipolar line:
<math>
\sum_i \left[ d^2(p_i', F p_i) + d^2(p_i, F^T p_i') \right]
</math>
===Locating the epipoles===
==3D Reconstruction==
===Triangulation===
If the cameras are intrinsically and extrinsically calibrated, the two back-projected rays may not intersect exactly; take the reconstructed point to be the midpoint of the common perpendicular between them.
===Point reconstruction===
Given a point <math>X \in \mathbb{R}^3</math>:
* <math>x=MX</math> is the point in image 1
* <math>x'=M'X</math> is the point in image 2
<math>
M = \begin{bmatrix}
m_1^T \\ m_2^T \\ m_3^T
\end{bmatrix}
</math>
<math>x \times MX = 0</math>
<math>x' \times M'X = 0</math>
imply
<math>AX=0</math> where <math>A = \begin{bmatrix}
x m_3^T - m_1^T\\
y m_3^T - m_2^T\\
x' m_3'^T - m_1'^T\\
y' m_3'^T - m_2'^T
\end{bmatrix}</math>
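A sketch of this linear triangulation: build A from the rows above and take the null vector via the SVD.
<pre>
import numpy as np

def triangulate(M, Mp, p, pp):
    """DLT triangulation from cameras M, M' and image points p, p'."""
    x, y = p
    xp, yp = pp
    A = np.vstack([x * M[2] - M[0],
                   y * M[2] - M[1],
                   xp * Mp[2] - Mp[0],
                   yp * Mp[2] - Mp[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # dehomogenize
</pre>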
===Reconstruction for intrinsically calibrated cameras===
# Compute the essential matrix E using normalized points.
# Select <math>M=[I|0]</math> and <math>M'=[R|T]</math>; then <math>E=[T]_\times R</math>.
# Find T and R using the SVD of E.
===Reconstruction ambiguity: projective===
<math>x_i = MX_i = (MH^{-1})(H X_i)</math> for any homography <math>H</math> of 3D space.
* The same images are consistent with the transformed cameras <math>MH^{-1}</math> and points <math>HX_i</math>, so the reconstruction is only determined up to a homography.
* If you know the true 3D positions of 5 points (in general position), you can rectify the 3D model.
;Projective Reconstruction Theorem
* We can compute a projective reconstruction of a scene from 2 views.
* We don't have to know the calibration or the poses.
===Affine Reconstruction===
==Aperture Problem==
When looking at a large moving object through a small viewport (i.e., locally), you cannot tell which direction it is moving.
See [https://www.opticalillusion.net/optical-illusions/the-barber-pole-illusion/ the barber pole illusion].
===Brightness Constancy Equation===
Let <math>E(x,y,t)</math> be the irradiance and <math>u(x,y), v(x,y)</math> the components of optical flow.
Assume brightness is constant along the motion, i.e. <math>E(x(t), y(t), t) = \text{constant}</math>:
<math>E(x + u \delta t, y + v \delta t, t + \delta t) = E(x,y,t)</math>.
A first-order Taylor expansion gives the constraint <math>E_x u + E_y v + E_t = 0</math>, one linear equation in the two unknowns <math>(u, v)</math>.
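One brightness-constraint equation per pixel cannot determine both u and v (the aperture problem again). A classic fix is Lucas-Kanade: assume the flow is constant over a small window and solve the stacked constraints by least squares. A minimal sketch, assuming precomputed derivative arrays for the window:
<pre>
import numpy as np

def flow_for_window(Ex, Ey, Et):
    """Least-squares (u, v) for one window: stack E_x u + E_y v = -E_t
    over all pixels in the window (arrays of derivatives)."""
    A = np.stack([Ex.ravel(), Ey.ravel()], axis=1)   # N x 2
    b = -Et.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
</pre>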
==Structure from Motion Pipeline==
===Calibration===
# Step 1: Feature Matching
===Fundamental Matrix and Essential Matrix===
# Step 2: Estimate the Fundamental Matrix F (see the sketch after this list)
#* <math>x_i'^T F x_i = 0</math>
#* Use the SVD to solve <math>Af=0</math> for the stacked entries f of F: with <math>A=U \Sigma V^T</math>, the solution is the last column of <math>V</math> (the singular vector with the smallest singular value).
#* Essential Matrix: <math>E = K^T F K</math>
#* '''The fundamental matrix has 7 degrees of freedom; the essential matrix has 5 degrees of freedom.'''
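A sketch of the estimation step (the unnormalized eight-point algorithm; real implementations normalize the points first), assuming eight or more (x, y) correspondences:
<pre>
import numpy as np

def estimate_F(pts1, pts2):
    """Each correspondence gives one row of A from x'^T F x = 0;
    solve A f = 0 by SVD, then enforce rank 2 on F."""
    A = np.array([[xp*x, xp*y, xp, yp*x, yp*y, yp, x, y, 1.0]
                  for (x, y), (xp, yp) in zip(pts1, pts2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, s, Vt = np.linalg.svd(F)   # project to rank 2
    s[2] = 0.0
    return U @ np.diag(s) @ Vt
</pre>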
===Estimating Camera Pose===
Estimating the camera pose from E:
The pose P has 6 DoF. Take the SVD of the essential matrix to get 4 potential (R, T) solutions.
Triangulate points to select among the 4 solutions: only the correct one places the points in front of both cameras.
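A sketch of the SVD decomposition that produces the four candidates:
<pre>
import numpy as np

def decompose_E(E):
    """Four candidate (R, T) pairs from E = [T]_x R."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:  U = -U     # keep proper rotations
    if np.linalg.det(Vt) < 0: Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    T = U[:, 2]                          # translation, up to sign
    return [(R1, T), (R1, -T), (R2, T), (R2, -T)]
</pre>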
==Visual Filters==
Use filters that detect specific categories: humans, cars, etc.
==Model-based Recognition==
You have a model for each object to be recognized.<br>
The recognition system identifies objects in the image by matching against the model database.
===Pose Clustering===
===Indexing===
==Texture==
===Synthesis===
The goal is to generate additional texture samples from an existing texture sample.
===Filters===
* Difference of Gaussians (DoG)
* Gabor filters
==Lecture Schedule==
* 02/23/2021 - Pinhole camera model
* 02/25/2021 - Camera calibration
* 03/09/2021 - Optical flow, motion fields
* 03/11/2021 - Structure from motion: epipolar constraints, essential matrix, triangulation
* 03/25/2021 - Multiple topics (image motion)
* 03/30/2021 - Independent object motion (flow fields)
* 04/01/2021 - Project 3 Discussion
* 04/15/2021 - Shape from shading, reflectance map
* 04/20/2021 - Shape from shading, normal map
* 04/22/2021 - Recognition, classification
* 04/27/2021 - Visual filters, classification
* 04/29/2021 - Midterm Exam clarifications
* 05/04/2021 - Model-based Recognition
* 05/06/2021 - Texture
==Projects== |