=Geometric Computer Vision=
\end{bmatrix}
</math>
===Properties of matrix P===
<math>P = K R [I_3 | -C]</math>
* <math>K</math> is the upper-triangular calibration (intrinsic) matrix with 5 degrees of freedom.
* <math>R</math> is the rotation matrix with 3 degrees of freedom.
* <math>C</math> is the camera center with 3 degrees of freedom.
* In total, <math>P</math> has 11 degrees of freedom (a 3×4 matrix up to scale).
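As a sketch of how this factorization can be recovered from a given P: the left 3×3 block is <math>KR</math>, so an RQ decomposition separates the upper-triangular <math>K</math> from the rotation <math>R</math>, and the camera center is the null space of P. All numeric values below are illustrative, and the RQ routine comes from SciPy.

```python
import numpy as np
from scipy.linalg import rq

# Illustrative intrinsics, rotation, and camera center (not from the notes).
K = np.array([[800.0,   0.5, 320.0],
              [  0.0, 820.0, 240.0],
              [  0.0,   0.0,   1.0]])
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
C = np.array([1.0, 2.0, 3.0])

# Build P = K R [I | -C].
P = K @ R @ np.hstack([np.eye(3), -C[:, None]])

# Recover K and R from the left 3x3 block via RQ decomposition.
M = P[:, :3]
K_hat, R_hat = rq(M)
S = np.diag(np.sign(np.diag(K_hat)))  # fix signs so K has a positive diagonal
K_hat, R_hat = K_hat @ S, S @ R_hat
K_hat = K_hat / K_hat[2, 2]           # normalize the overall scale

# The camera center is the null space of P: P [C; 1] = 0, so C = -M^{-1} p4.
C_hat = -np.linalg.solve(M, P[:, 3])
```

The sign fix is needed because RQ is only unique up to per-row signs; forcing a positive diagonal on <math>K</math> makes the factorization match the usual calibration-matrix convention.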
===Calibration===
# Estimate the matrix P from known scene points and their images.
# Decompose P into the intrinsic (interior) and extrinsic (exterior) parameters.
===Zhang's Approach===
Zhang's method calibrates from several images of a planar pattern (e.g. a checkerboard): each view yields a homography between the plane and the image, and each homography provides two constraints on the intrinsic parameters.
==Stereo==
===Parallel Cameras===
Consider two cameras, where the right camera is shifted by a baseline <math>d</math> along the x-axis relative to the left camera.
Then for a point <math>(x,y,z)</math> (with unit focal length),
<math>x_l = \frac{x}{z}</math>,
<math>y_l = \frac{y}{z}</math>,
<math>x_r = \frac{x-d}{z}</math>,
<math>y_r = \frac{y}{z}</math>.
Thus the stereo disparity is the ratio of baseline to depth: <math>x_l - x_r = \frac{d}{z}</math>.
With a known baseline and a correspondence, you can solve for the depth <math>z</math>.
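A minimal numeric sketch of recovering depth from disparity under this parallel-camera model (the function name and values are illustrative):

```python
def depth_from_disparity(x_l, x_r, baseline):
    """Parallel-camera model with unit focal length: z = baseline / (x_l - x_r)."""
    disparity = x_l - x_r
    if disparity <= 0:
        raise ValueError("expected positive disparity for a point in front of both cameras")
    return baseline / disparity

# The point (3, 0, 4) with baseline d = 2 projects to x_l = 0.75, x_r = 0.25.
z = depth_from_disparity(0.75, 0.25, 2.0)  # → 4.0
```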
===Epipolar Geometry===
The ''epipoles'' are the points where each camera sees the other camera's center.
Warping the two images so that the epipolar lines become horizontal is called ''rectification''.
===Rectification===
# Take the left camera center as the origin of the coordinate system.
# Let <math>e_1</math> be the unit vector toward the right camera center, choose <math>e_2</math> orthogonal to <math>e_1</math> as the up axis, and take <math>e_3 = e_1 \times e_2</math>.
===Random dot stereograms===
Random dot stereograms show that object recognition is not needed for stereo: depth is perceived from correspondence alone.
===Similarity Construct===
* Match by computing the sum of squared differences (SSD) between patches along the epipolar lines.
* The ordering of pixels along an epipolar line may not be the same in the left and right images (e.g. due to occlusion).
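A sketch of SSD patch matching along a (rectified) epipolar line; the function and parameters are illustrative, not from the notes:

```python
import numpy as np

def ssd_match(left, right, row, col, patch=3, max_disp=16):
    """Find the disparity minimizing SSD between a patch in the left image
    and patches along the same row (epipolar line) in the right image.
    Assumes rectified images; `patch` is the half-width of the window."""
    p = patch
    ref = left[row - p:row + p + 1, col - p:col + p + 1].astype(float)
    best_d, best_ssd = 0, np.inf
    for d in range(0, min(max_disp, col - p) + 1):
        cand = right[row - p:row + p + 1, col - d - p:col - d + p + 1].astype(float)
        ssd = np.sum((ref - cand) ** 2)
        if ssd < best_ssd:
            best_d, best_ssd = d, ssd
    return best_d

# Synthetic check: the right image is the left image shifted by 4 pixels.
rng = np.random.default_rng(0)
left = rng.random((32, 64))
right = np.roll(left, -4, axis=1)
d = ssd_match(left, right, 16, 30)
```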
===Correspondence + Segmentation===
* Assumption: pixels in the same segment of a segmentation map will probably have the same disparity.
# For each candidate shift (disparity), find the connected components of matching pixels.
# For each point p, assign the disparity of the largest connected component containing p.
===Essential Matrix===
The essential matrix satisfies <math>\hat{p}'^T E \hat{p} = 0</math> where <math>\hat{p} = M^{-1}p</math> and <math>\hat{p}'=M'^{-1}p'</math>.
The fundamental matrix is <math>F=M'^{-T} E M^{-1}</math>.
;Properties
* <math>F</math> is a 3×3 matrix of rank 2.
* If <math>F</math> is the fundamental matrix of (P, P'), then <math>F^T</math> is the fundamental matrix of (P', P).
* The fundamental matrix gives the equation of the epipolar line in the other image:
** <math>l'=Fp</math> and <math>l=F^T p'</math>
* For any p, the epipolar line <math>l'=Fp</math> contains the epipole <math>e'</math>, since every epipolar line passes through the image of the other camera's center.
** <math>e'^T F = 0</math> and <math>Fe=0</math>
[https://www.youtube.com/watch?v=DgGV3l82NTk Fundamental matrix song]
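Numerically, the epipoles are the right and left null vectors of F, which the SVD exposes directly. A sketch using an illustrative rank-2 matrix (a skew-symmetric example, not from the notes):

```python
import numpy as np

# F e = 0 gives the epipole in image 1; e'^T F = 0 gives it in image 2.
F = np.array([[ 0.0, -1.0,  2.0],
              [ 1.0,  0.0, -3.0],
              [-2.0,  3.0,  0.0]])   # skew-symmetric, rank 2, null vector (3, 2, 1)

U, S, Vt = np.linalg.svd(F)
e = Vt[-1]          # right null vector: F e ≈ 0
e_prime = U[:, -1]  # left null vector: e'^T F ≈ 0
```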
==Structure from Motion==
The following subsections analyze the optical flow field induced by camera motion.
===Only Translation===
<math>u = \frac{-U + xW}{Z} = \frac{W}{Z}\left(x - \frac{U}{W}\right) = \frac{W}{Z}(x - x_0)</math>
<math>v = \frac{-V + yW}{Z} = \frac{W}{Z}\left(y - \frac{V}{W}\right) = \frac{W}{Z}(y - y_0)</math>
The direction of the flow is:
<math>\frac{v}{u} = \frac{y-y_0}{x-x_0}</math>
All flow vectors emanate from the ''focus of expansion'' <math>(x_0, y_0) = (U/W, V/W)</math>.
If you walk towards a point in the image, all pixels will flow away from that point.
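A small sketch verifying that a purely translational flow field is radial about the focus of expansion (all numeric values are illustrative):

```python
import numpy as np

# Translational flow: u = (W/Z)(x - x0), v = (W/Z)(y - y0),
# with focus of expansion (x0, y0) = (U/W, V/W).
U, V, W, Z = 1.0, 0.5, 2.0, 10.0
x0, y0 = U / W, V / W

xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
u = (W / Z) * (xs - x0)
v = (W / Z) * (ys - y0)

# Each flow vector is parallel to the ray from the FOE to the pixel,
# so the 2D cross product with that ray vanishes everywhere.
cross = u * (ys - y0) - v * (xs - x0)
```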
===Only Rotation===
For rotation <math>(\alpha, \beta, \gamma)</math> about the x, y, z axes, the horizontal flow component is:
<math>u = \alpha x y - \beta (1 + x^2) - \gamma y</math>
(with a corresponding expression for <math>v</math>). Rotation around the y or z axis leads to hyperbolic flow lines.
Rotational flow is independent of depth.
===Both translation and rotation===
The flow field is a superposition of the two components and will not resemble either of the above patterns on its own.
===The velocity of p===
===Moving plane===
For a point p on a plane with normal vector n, the set of all points on the plane is <math>\{x \mid (x \cdot n) = d\}</math>, where <math>d=(p \cdot n)</math> is the distance from the origin to the plane along the normal.
===Scaling ambiguity===
Depth can only be recovered up to a scale factor: scaling the scene and the translation by the same factor produces an identical flow field.
===Non-Linear Least Squares Approach===
Minimize the symmetric epipolar distance:
<math>
\sum_i \left[ d^2(p'_i, F p_i) + d^2(p_i, F^T p'_i) \right]
</math>
where <math>d(p, l)</math> is the distance from point <math>p</math> to line <math>l</math>.
===Locating the epipoles===
==3D Reconstruction==
===Triangulation===
If the cameras are intrinsically and extrinsically calibrated, the reconstructed point P can be taken as the midpoint of the common perpendicular between the two back-projected rays.
===Point reconstruction===
Given a point <math>X \in \mathbb{R}^3</math>:
* <math>x=MX</math> is the point in image 1
* <math>x'=M'X</math> is the point in image 2
<math>
M = \begin{bmatrix}
m_1^T \\ m_2^T \\ m_3^T
\end{bmatrix}
</math>
<math>x \times MX = 0</math>
<math>x' \times M'X = 0</math>
implies
<math>AX=0</math> where <math>A = \begin{bmatrix}
x m_3^T - m_1^T\\
y m_3^T - m_2^T\\
x' m_3'^T - m_1'^T\\
y' m_3'^T - m_2'^T\\
\end{bmatrix}</math>
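The linear system above can be solved with an SVD; a self-contained sketch with an illustrative camera pair (the names and values are assumptions, not from the notes):

```python
import numpy as np

def triangulate(M1, M2, x1, x2):
    """Linear (DLT) triangulation: stack the rows of A from x × MX = 0 and
    take the right singular vector for the smallest singular value.
    x1, x2 are inhomogeneous image points (x, y)."""
    A = np.vstack([
        x1[0] * M1[2] - M1[0],
        x1[1] * M1[2] - M1[1],
        x2[0] * M2[2] - M2[0],
        x2[1] * M2[2] - M2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # back to inhomogeneous coordinates

def project(M, X):
    x = M @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical pair: identity camera and a camera translated along x.
M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
X_hat = triangulate(M1, M2, project(M1, X_true), project(M2, X_true))
```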
===Reconstruction for intrinsically calibrated cameras===
# Compute the essential matrix E using normalized points.
# Select <math>M=[I|0]</math> and <math>M'=[R|T]</math>; then <math>E=[T]_{\times}R</math>.
# Find T and R using the SVD of E.
===Reconstruction ambiguity: projective===
<math>x_i = MX_i = (MH_P^{-1})(H_P X_i)</math>
* Camera matrices consistent with the same images yield reconstructions that differ by some 3D homography <math>H_P</math>.
* If you know the 3D positions of 5 points, you can rectify the 3D model.
;Projective Reconstruction Theorem
* We can compute a projective reconstruction of a scene from 2 views.
* We don't have to know the calibration or the poses.
===Affine Reconstruction===
==Aperture Problem==
When a moving object is viewed through a small aperture (i.e. locally), you cannot tell which direction it is moving: only the motion component normal to the local edge can be measured.
See [https://www.opticalillusion.net/optical-illusions/the-barber-pole-illusion/ the barber pole illusion].
===Brightness Constancy Equation===
Let <math>E(x,y,t)</math> be the irradiance and <math>u(x,y),v(x,y)</math> the components of the optical flow.
Assuming brightness constancy, <math>E(x(t), y(t), t) = \text{constant}</math>, so
<math>E(x + u \delta t, y + v \delta t, t + \delta t) = E(x,y,t)</math>.
Expanding to first order gives the brightness constraint equation <math>E_x u + E_y v + E_t = 0</math>.
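One constraint in two unknowns (u, v) is underdetermined (this is the aperture problem again), so flow is typically estimated by stacking the constraint over a window and solving least squares, as in the Lucas–Kanade method. A synthetic sketch with illustrative data:

```python
import numpy as np

# Stack E_x u + E_y v + E_t = 0 over a 25-pixel window and solve
# least squares for (u, v). Gradients are synthetic, with known flow (1, -2).
rng = np.random.default_rng(1)
Ex = rng.random(25)
Ey = rng.random(25)
u_true, v_true = 1.0, -2.0
Et = -(Ex * u_true + Ey * v_true)   # exact data: constraint holds per pixel

A = np.column_stack([Ex, Ey])
flow, *_ = np.linalg.lstsq(A, -Et, rcond=None)
```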
==Structure from Motion Pipeline==
===Calibration===
# Step 1: Feature Matching
===Fundamental Matrix and Essential Matrix===
# Step 2: Estimate the fundamental matrix F
#* <math>x_i'^T F x_i = 0</math>
#* Use the SVD to solve <math>Ax=0</math>: with <math>A=U \Sigma V^T</math>, the solution is the last column of <math>V</math> (the right singular vector for the smallest singular value).
#* Essential matrix: <math>E = K^T F K</math>
#* '''The fundamental matrix has 7 degrees of freedom; the essential matrix has 5.'''
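Step 2's linear solve can be sketched as a minimal, unnormalized eight-point algorithm; in practice Hartley normalization and RANSAC are added, and the names here are illustrative:

```python
import numpy as np

def eight_point(x1, x2):
    """Minimal (unnormalized) eight-point algorithm: each correspondence
    gives one row of A in the linear system A f = 0 from x2^T F x1 = 0.
    F is the reshaped last right singular vector, then forced to rank 2."""
    A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
                  for (u1, v1), (u2, v2) in zip(x1, x2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```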
===Estimating Camera Pose===
To estimate the camera pose from E: the pose P has 6 DoF. The SVD of the essential matrix yields 4 potential solutions (two rotations × two translation signs).
Triangulate a point and keep the solution that places it in front of both cameras.
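The four-candidate decomposition can be sketched as follows, using the standard SVD recipe (helper names and the test pose are illustrative):

```python
import numpy as np

def pose_candidates(E):
    """The four (R, T) candidates from the SVD of E (standard recipe)."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U @ Vt) < 0:   # E is only defined up to sign
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    t = U[:, 2]                     # left null vector of E, unit length
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]

def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Build E from a known pose and check that one candidate recovers it.
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
T_true = np.array([1.0, 0.0, 0.0])   # unit baseline
E = skew(T_true) @ R_true
found = any(np.allclose(R, R_true) and np.allclose(t, T_true)
            for R, t in pose_candidates(E))
```

The triangulation-based cheirality check then discards the three candidates that place the test point behind a camera.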
==Visual Filters==
Build filters that detect particular object categories (humans, cars, ...).
==Model-based Recognition==
You have a model for each object to recognize.<br>
The recognition system identifies objects by matching against the model database.
===Pose Clustering===
===Indexing===
==Texture==
===Synthesis===
The goal is to generate additional texture samples from an existing texture sample.
===Filters===
* Difference of Gaussians (DoG)
* Gabor filters
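Both filter families can be generated directly; a sketch with illustrative parameter values (the function names are assumptions, not from the notes):

```python
import numpy as np

def gabor(size=21, sigma=3.0, theta=0.0, wavelength=6.0):
    """A real-valued Gabor filter: a sinusoid oriented at `theta`,
    windowed by an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def dog(size=21, sigma1=1.0, sigma2=2.0):
    """Difference of Gaussians: a band-pass filter used for blob/edge detection."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    g = lambda s: np.exp(-(x**2 + y**2) / (2 * s**2)) / (2 * np.pi * s**2)
    return g(sigma1) - g(sigma2)
```

Convolving an image with a bank of such filters at several orientations and scales gives the texture descriptors used in the synthesis step.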
==Lecture Schedule==
* 02/23/2021 - Pinhole camera model
* 02/25/2021 - Camera calibration
* 03/09/2021 - Optical flow, motion fields
* 03/11/2021 - Structure from motion: epipolar constraints, essential matrix, triangulation
* 03/25/2021 - Multiple topics (image motion)
* 03/30/2021 - Independent object motion (flow fields)
* 04/01/2021 - Project 3 Discussion
* 04/15/2021 - Shape from shading, reflectance map
* 04/20/2021 - Shape from shading, normal map
* 04/22/2021 - Recognition, classification
* 04/27/2021 - Visual filters, classification
* 04/29/2021 - Midterm Exam clarifications
* 05/04/2021 - Model-based Recognition
* 05/06/2021 - Texture
==Projects==