Geometric Computer Vision

===Properties of matrix P===
<math>P = K R [I_3 | -C]</math>
* <math>K</math> is the upper-triangular calibration (intrinsic) matrix, which has 5 degrees of freedom.
* <math>R</math> is the rotation matrix with 3 degrees of freedom.
* <math>C</math> is the camera center with 3 degrees of freedom.
* In total, <math>P</math> has <math>5+3+3=11</math> degrees of freedom, matching a 3×4 matrix defined up to scale.
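As a rough illustration (the numeric values for <math>K</math>, <math>R</math>, and <math>C</math> below are made up), the factorization above can be assembled directly in numpy:
<syntaxhighlight lang="python">
import numpy as np

# Hypothetical intrinsics: focal lengths, zero skew, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                      # camera aligned with the world axes
C = np.array([0.0, 0.0, -5.0])     # camera center in world coordinates

# P = K R [I | -C]
P = K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])

# Project a world point (homogeneous) and dehomogenize.
X = np.array([1.0, 2.0, 10.0, 1.0])
x = P @ X
x = x[:2] / x[2]
</syntaxhighlight>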
===Calibration===
# Estimate the matrix <math>P</math> from known 3D scene points and their image projections.
# Decompose <math>P</math> into the intrinsic parameters (<math>K</math>) and extrinsic parameters (<math>R</math>, <math>C</math>).
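A minimal sketch of step 2, assuming SciPy is available: the left 3×3 block of <math>P</math> equals <math>KR</math>, so an RQ decomposition separates the upper-triangular <math>K</math> from the orthogonal <math>R</math>.
<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import rq

def decompose_projection(P):
    """Split P = K R [I | -C] into K, R, C (a sketch; no degenerate-case handling)."""
    M = P[:, :3]                 # left 3x3 block: M = K R
    K, R = rq(M)                 # K upper triangular, R orthogonal
    # Fix signs so that K has a positive diagonal.
    T = np.diag(np.sign(np.diag(K)))
    K, R = K @ T, T @ R          # T is its own inverse (diagonal of +-1)
    C = -np.linalg.solve(M, P[:, 3])   # last column of P is -M C
    return K / K[2, 2], R, C     # normalize so that K[2,2] = 1
</syntaxhighlight>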
===Zhang's Approach===
==Stereo==
===Parallel Cameras===
Consider two parallel cameras with unit focal length, where the right camera is shifted by a baseline <math>d</math> along the x-axis relative to the left camera. 
Then for a point <math>(x,y,z)</math>,
<math>x_l = \frac{x}{z}</math> 
<math>y_l = \frac{y}{z}</math> 
<math>x_r = \frac{x-d}{z}</math> 
<math>y_r = \frac{y}{z}</math>. 
Thus, the stereo disparity is the ratio of baseline over depth: <math>x_l - x_r = \frac{d}{z}</math>. 
With known baseline and correspondence, you can solve for depth <math>z</math>.
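A minimal sketch of recovering depth under these assumptions (unit focal length, known baseline):
<syntaxhighlight lang="python">
import numpy as np

def depth_from_disparity(x_left, x_right, baseline):
    """Depth z = d / (x_l - x_r) for parallel cameras with unit focal length."""
    disparity = x_left - x_right
    return baseline / disparity

# Example: a point at depth 10 with baseline 0.5 has disparity 0.05.
z = depth_from_disparity(0.15, 0.10, 0.5)   # -> 10.0
</syntaxhighlight>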
===Epipolar Geometry===
# Warp the two images so that corresponding epipolar lines become horizontal and aligned.
# This warping is called rectification.
The ''epipoles'' are the image points where each camera sees the other camera's center.
===Rectification===
# Consider the left camera to be the center of the coordinate system.
# Let <math>e_1</math> be the unit vector along the baseline toward the right camera, let <math>e_2</math> be an up axis orthogonal to <math>e_1</math>, and take <math>e_3 = e_1 \times e_2</math>.
===Random dot stereograms===
Random dot stereograms show that object recognition is not required for stereo: depth can be perceived from binocular disparity alone.
===Similarity Constraint===
* Match by computing the sum of squared differences (SSD) between patches along corresponding epipolar lines.
* The ordering of pixels along an epipolar line may differ between the left and right images (e.g., near occlusions), so the ordering constraint can be violated.
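A minimal sketch of SSD matching along a scanline, assuming rectified grayscale images stored as numpy arrays (window size and disparity range are illustrative):
<syntaxhighlight lang="python">
import numpy as np

def ssd_disparity(left, right, row, col, max_disp=64, half=3):
    """Best disparity at (row, col) by SSD over a (2*half+1)^2 patch (a sketch)."""
    patch_l = left[row - half:row + half + 1, col - half:col + half + 1]
    best_d, best_ssd = 0, np.inf
    for d in range(min(max_disp, col - half) + 1):
        patch_r = right[row - half:row + half + 1,
                        col - d - half:col - d + half + 1]
        ssd = np.sum((patch_l.astype(float) - patch_r.astype(float)) ** 2)
        if ssd < best_ssd:
            best_d, best_ssd = d, ssd
    return best_d
</syntaxhighlight>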
===Correspondence + Segmentation===
* Assumption: pixels in the same segment of a segmentation map will probably have the same disparity.
# For each candidate disparity (shift), find the connected components of pixels that match at that shift.
# For each point <math>p</math>, pick the disparity whose connected component containing <math>p</math> is largest (see the sketch below).
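A minimal sketch of this idea, assuming rectified grayscale images and using <code>scipy.ndimage.label</code> for connected components (the intensity threshold is an illustrative choice):
<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import label

def component_disparity(left, right, max_disp=32, thresh=10.0):
    """Pick, per pixel, the shift whose matching connected component is largest."""
    h, w = left.shape
    best_size = np.zeros((h, w))
    best_disp = np.zeros((h, w), dtype=int)
    for d in range(max_disp + 1):
        # Pixels with similar intensity when the right image is shifted by d.
        match = np.zeros((h, w), dtype=bool)
        match[:, d:] = np.abs(left[:, d:].astype(float)
                              - right[:, :w - d].astype(float)) < thresh
        labels, _ = label(match)                 # connected components
        sizes = np.bincount(labels.ravel())      # component sizes (index 0 = background)
        sizes[0] = 0
        comp_size = sizes[labels]                # size of each pixel's component
        better = comp_size > best_size
        best_size[better] = comp_size[better]
        best_disp[better] = d
    return best_disp
</syntaxhighlight>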
===Essential Matrix===
The essential matrix satisfies <math>\hat{p}'^T E \hat{p} = 0</math>, where <math>\hat{p} = M^{-1}p</math> and <math>\hat{p}'=M'^{-1}p'</math> are the points in normalized coordinates.
The fundamental matrix is <math>F=M'^{-T} E M^{-1}</math>.
;Properties
* The matrix is 3×3.
* If <math>F</math> is the fundamental matrix of <math>(P, P')</math>, then <math>F^T</math> is the fundamental matrix of <math>(P', P)</math>.
* The fundamental matrix gives the equation of the epipolar line in the other image:
** <math>l'=Fp</math> and <math>l=F^T p'</math>
* For any <math>p</math>, the epipolar line <math>l'=Fp</math> contains the epipole <math>e'</math>, because every epipolar line is the image of a ray through the first camera's center, and that center projects to <math>e'</math>.
** <math>e'^T F = 0</math> and <math>Fe=0</math>
[https://www.youtube.com/watch?v=DgGV3l82NTk Fundamental matrix song]
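A minimal numpy sketch of these properties, assuming a known fundamental matrix and homogeneous points:
<syntaxhighlight lang="python">
import numpy as np

def epipolar_line(F, p):
    """Epipolar line l' = F p in the second image (homogeneous line coefficients)."""
    return F @ p

def epipoles(F):
    """Epipoles as the right and left null vectors of F (via SVD)."""
    U, S, Vt = np.linalg.svd(F)
    e = Vt[-1]          # satisfies F e = 0
    e_prime = U[:, -1]  # satisfies e'^T F = 0
    return e / e[2], e_prime / e_prime[2]
</syntaxhighlight>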
==Structure from Motion==
The following sections analyze optical flow, the apparent motion of image points induced by camera motion.
===Only Translation===
Let <math>(U,V,W)</math> be the camera's translational velocity and <math>Z</math> the depth of the point. Then
<math>u = \frac{-U + xW}{Z} = \frac{W}{Z}\left(x - \frac{U}{W}\right) = \frac{W}{Z}(x - x_0)</math> 
<math>v = \frac{-V + yW}{Z} = \frac{W}{Z}\left(y - \frac{V}{W}\right) = \frac{W}{Z}(y - y_0)</math>
The direction of the flow at each pixel is: 
<math>\frac{v}{u} = \frac{y-y_0}{x-x_0}</math> 
The flow vectors all emanate from the focus of expansion <math>(x_0, y_0) = (U/W, V/W)</math>. 
If you walk towards a point in the image, then all pixels will flow away from that point.
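A minimal sketch of this translational flow field (the velocity values are made up; depth <math>Z</math> is taken constant for illustration):
<syntaxhighlight lang="python">
import numpy as np

def translational_flow(xs, ys, U, V, W, Z):
    """Flow (u, v) for pure translation (U, V, W) at depth Z; emanates from the FOE."""
    u = (-U + xs * W) / Z
    v = (-V + ys * W) / Z
    return u, v

# The focus of expansion is where the flow vanishes.
U, V, W = 1.0, 0.5, 2.0
x0, y0 = U / W, V / W   # FOE at (0.5, 0.25)
</syntaxhighlight>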
===Only Rotation===
For rotation with angular velocity <math>(\alpha, \beta, \gamma)</math> about the x, y, and z axes, the horizontal flow component is:
<math>u = \alpha x y - \beta (1 + x^2) - \gamma y</math>
Rotation about the x or y axis produces hyperbolic flow paths, while rotation about the optical (z) axis produces circular flow.
The rotational flow is independent of depth.
===Both translation and rotation===
The flow field is the sum of the translational and rotational components and in general resembles neither pattern alone.
===The velocity of p===
===Moving plane===
For a point <math>p</math> on a plane with unit normal <math>n</math>, the set of all points on the plane is <math>\{x \mid x \cdot n = d\}</math>, where <math>d = p \cdot n</math> is the distance from the origin to the plane along the normal.
===Scaling ambiguity===
From image motion alone, depth and translation can only be recovered up to a common scale factor: scaling both by the same constant yields the same flow field.
===Non-Linear Least Squares Approach===
Minimize the symmetric epipolar distance:
<math>
\sum_i \left[ d^2(p_i', F p_i) + d^2(p_i, F^T p_i') \right]
</math>
where <math>d(p, l)</math> is the distance from point <math>p</math> to line <math>l</math>.
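A minimal sketch of evaluating this cost, assuming homogeneous points with unit last coordinate:
<syntaxhighlight lang="python">
import numpy as np

def point_line_dist_sq(p, l):
    """Squared distance from homogeneous point p (p[2]=1) to line l = (a, b, c)."""
    return (l @ p) ** 2 / (l[0] ** 2 + l[1] ** 2)

def symmetric_epipolar_cost(F, pts, pts_prime):
    """Sum of squared distances of each point to its epipolar line, in both images."""
    cost = 0.0
    for p, p2 in zip(pts, pts_prime):
        cost += point_line_dist_sq(p2, F @ p)       # d^2(p', F p)
        cost += point_line_dist_sq(p, F.T @ p2)     # d^2(p, F^T p')
    return cost
</syntaxhighlight>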
===Locating the epipoles===
==3D Reconstruction==
===Triangulation===
If the cameras are intrinsically and extrinsically calibrated, the two back-projected rays generally do not intersect exactly; estimate the 3D point <math>P</math> as the midpoint of the common perpendicular between the rays.
===Point reconstruction===
Given a point <math>X \in \mathbb{R}^3</math>:
* <math>x=MX</math> is the point in image 1
* <math>x'=M'X</math> is the point in image 2
<math>
M = \begin{bmatrix}
m_1^T \\ m_2^T \\ m_3^T
\end{bmatrix}
</math>
<math>x \times MX = 0</math> 
<math>x \times M'X = 0</math> 
implies 
<math>AX=0</math> where <math>A = \begin{bmatrix}
x m_3^T - m_1^T\\
y m_3^T - m_2^T\\
x' m_3'^T - m_1'^T\\
y' m_3'^T - m_2'^T\\
\end{bmatrix}</math>
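A minimal numpy sketch of this linear triangulation (names such as <code>M2</code> for the second camera are illustrative):
<syntaxhighlight lang="python">
import numpy as np

def triangulate(M, M2, x, y, x2, y2):
    """Linear (DLT) triangulation: solve A X = 0 via SVD."""
    A = np.vstack([
        x  * M[2]  - M[0],
        y  * M[2]  - M[1],
        x2 * M2[2] - M2[0],
        y2 * M2[2] - M2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]            # right singular vector with smallest singular value
    return X[:3] / X[3]   # dehomogenize
</syntaxhighlight>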
===Reconstruction for intrinsically calibrated cameras===
# Compute the essential matrix <math>E</math> using normalized points.
# Select <math>M=[I|0]</math> and <math>M'=[R|T]</math>; then <math>E=[T]_\times R</math>.
# Find <math>T</math> and <math>R</math> using the SVD of <math>E</math> (see the sketch under Estimating Camera Pose below).
===Reconstruction ambiguity: projective===
<math>x_i = MX_i = (MH_P^{-1})(H_P X_i)</math>
* Applying any 3D homography <math>H_P</math> to the scene while compensating the camera by <math>H_P^{-1}</math> produces exactly the same images, so the reconstruction is determined only up to a projective transformation.
* If you know the true 3D positions of 5 points in general position, you can rectify the 3D model.
;Projective Reconstruction Theorem
* We can compute a projective reconstruction of a scene from two views.
* We do not need to know the calibration or the camera poses.
===Affine Reconstruction===
==Aperture Problem==
When a moving object is viewed through a small aperture (i.e., locally), only the component of motion perpendicular to the local edge can be determined; the true direction of motion is ambiguous. 
See [https://www.opticalillusion.net/optical-illusions/the-barber-pole-illusion/ the barber pole illusion]
===Brightness Constancy Equation===
Let <math>E(x,y,t)</math> be the irradiance and <math>u(x,y),v(x,y)</math> the components of optical flow. 
Then <math>E(x + u \delta t, y + v \delta t, t + \delta t) = E(x,y,t)</math>.
Assume <math>E(x(t), y(t), t) = \text{constant}</math>. Differentiating with respect to <math>t</math> gives the brightness constancy constraint <math>E_x u + E_y v + E_t = 0</math>.
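A minimal sketch of solving this constraint by least squares over a window, in the spirit of Lucas–Kanade (assuming the gradients <code>Ex</code>, <code>Ey</code>, <code>Et</code> have been precomputed over the window):
<syntaxhighlight lang="python">
import numpy as np

def flow_in_window(Ex, Ey, Et):
    """Least-squares flow (u, v) from Ex*u + Ey*v + Et = 0 over a window."""
    A = np.stack([Ex.ravel(), Ey.ravel()], axis=1)   # one constraint per pixel
    b = -Et.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
</syntaxhighlight>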
==Structure from Motion Pipeline==
===Calibration===
# Step 1: Feature Matching
===Fundamental Matrix and Essential Matrix===
# Step 2: Estimate Fundamental Matrix F
#* <math>x_i'^T F x_i = 0</math>
#* Use SVD to solve for <math>x</math> in <math>Ax=0</math>: with <math>A=U \Sigma V^T</math>, the solution is the last column of <math>V</math> (the right singular vector with the smallest singular value).
#* Essential Matrix: <math>E = K^T F K</math>
#* '''Fundamental matrix has 7 degrees of freedom, essential matrix has 5 degrees of freedom'''
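A minimal sketch of the linear (eight-point) estimate of <math>F</math>, assuming matched homogeneous points; Hartley normalization and RANSAC are omitted:
<syntaxhighlight lang="python">
import numpy as np

def estimate_F(pts, pts_prime):
    """Eight-point algorithm: each match contributes one row of A in A f = 0."""
    A = np.array([np.kron(p2, p) for p, p2 in zip(pts, pts_prime)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
</syntaxhighlight>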
===Estimating Camera Pose===
Estimating camera pose from <math>E</math>: 
Pose <math>P</math> has 6 DoF. The SVD of the essential matrix gives 4 potential solutions for <math>(R, T)</math>. 
Triangulate a point and keep the solution for which it lies in front of both cameras.
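A minimal sketch of the standard SVD construction of the four candidates (selecting among them via the triangulation check is left to the caller):
<syntaxhighlight lang="python">
import numpy as np

def poses_from_E(E):
    """Four candidate (R, t) pairs from the essential matrix via SVD."""
    U, _, Vt = np.linalg.svd(E)
    # Ensure proper rotations (determinant +1).
    if np.linalg.det(U) < 0: U = -U
    if np.linalg.det(Vt) < 0: Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]           # translation up to sign and scale
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
</syntaxhighlight>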
==Visual Filters==
Build filters that detect specific object categories, such as humans and cars.
==Model-based Recognition==
You have a model for each object to recognize.<br>
The recognition system identifies objects from the model database.
===Pose Clustering===
===Indexing===
==Texture==
===Synthesis===
The goal is to generate additional texture samples from an existing texture sample.
===Filters===
* Difference of Gaussians (DoG)
* Gabor Filters
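A minimal sketch of a Gabor filter kernel (all parameter values are illustrative):
<syntaxhighlight lang="python">
import numpy as np

def gabor_kernel(size=21, sigma=4.0, theta=0.0, wavelength=8.0, phase=0.0):
    """Gabor filter: a sinusoid at orientation theta, windowed by a Gaussian."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates to the filter orientation.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    gaussian = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    sinusoid = np.cos(2 * np.pi * xr / wavelength + phase)
    return gaussian * sinusoid
</syntaxhighlight>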
==Lecture Schedule==
* 02/23/2021 - Pinhole camera model
* 02/25/2021 - Camera calibration
* 03/09/2021 - Optical flow, motion fields
* 03/11/2021 - Structure from motion: epipolar constraints, essential matrix, triangulation
* 03/25/2021 - Multiple topics (image motion)
* 03/30/2021 - Independent object motion (flow fields)
* 04/01/2021 - Project 3 Discussion
* 04/15/2021 - Shape from shading, reflectance map
* 04/20/2021 - Shape from shading, normal map
* 04/22/2021 - Recognition, classification
* 04/27/2021 - Visual filters, classification
* 04/29/2021 - Midterm Exam clarifications
* 05/04/2021 - Model-based Recognition
* 05/06/2021 - Texture


==Projects==