Geometric Computer Vision
[https://www.youtube.com/watch?v=DgGV3l82NTk Fundamental matrix song]
==Structure from Motion==
===Only Translation===
<math>u = \frac{-U + xW}{Z} = \frac{W}{Z}\left(x - \frac{U}{W}\right) = \frac{W}{Z}(x - x_0)</math>

Similarly, <math>v = \frac{W}{Z}(y - y_0)</math>, where <math>(x_0, y_0) = (U/W, V/W)</math> is the focus of expansion.
All flow vectors emanate from the focus of expansion: if you walk towards a point in the image, then all pixels will flow away from that point.
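The focus-of-expansion behaviour can be checked numerically. A minimal sketch (the translation and depth values are made up; numpy assumed), using the flow equations <math>u = \frac{W}{Z}(x - x_0)</math> and <math>v = \frac{W}{Z}(y - y_0)</math>:

```python
import numpy as np

# Hypothetical camera translation (U, V, W) and a scene at constant depth Z.
U, V, W, Z = 0.2, 0.1, 1.0, 5.0
x0, y0 = U / W, V / W          # focus of expansion (FOE)

# Evaluate the translational flow field on a grid of image points.
xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
u = (W / Z) * (xs - x0)
v = (W / Z) * (ys - y0)

# Every flow vector is parallel to the ray from the FOE to the pixel,
# so the whole field radiates out of (x0, y0).
cross = u * (ys - y0) - v * (xs - x0)
assert np.allclose(cross, 0.0)
```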
===Only Rotation=== | |||
For pure rotation with angular velocity <math>(\alpha, \beta, \gamma)</math>, the horizontal flow component is:
<math>u = \alpha x y - \beta (1 + x^2) - \gamma y</math>
Rotation around the y or z axis leads to flow lines that are hyperbolas.
The rotational flow is independent of depth.
===Both translation and rotation=== | |||
The flow field will not resemble any of the above patterns. | |||
===The velocity of p=== | |||
===Moving plane=== | |||
For a point <math>p</math> on a plane with normal vector <math>n</math>, the set of all points on the plane is <math>\{x \mid (x \cdot n) = d\}</math>, where <math>d = (p \cdot n)</math> is the distance from the origin to the plane along the normal.
===Scaling ambiguity=== | |||
Depth can be recovered up to a scale factor. | |||
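The ambiguity follows directly from the projection equation: scaling the scene and the translation by the same factor leaves every image point unchanged. A small numerical check (illustrative camera and point values; numpy assumed):

```python
import numpy as np

# A calibrated camera M = [R | T] and a homogeneous 3D point X.
R = np.eye(3)
T = np.array([0.5, 0.0, 0.0])
M = np.hstack([R, T[:, None]])
X = np.array([1.0, 2.0, 4.0, 1.0])

def project(M, X):
    x = M @ X
    return x[:2] / x[2]          # perspective divide

# Scale the scene AND the translation by s: the image is identical,
# so depth is only recoverable up to this global scale factor.
s = 3.0
M_s = np.hstack([R, (s * T)[:, None]])
X_s = np.array([s * X[0], s * X[1], s * X[2], 1.0])
assert np.allclose(project(M, X), project(M_s, X_s))
```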
===Non-Linear Least Squares Approach=== | |||
Minimize the symmetric epipolar distance:
<math>
\sum_i \left[ d^2(p_i', F p_i) + d^2(p_i, F^T p_i') \right]
</math>
where <math>d(p, l)</math> is the distance from point <math>p</math> to epipolar line <math>l</math>.
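This cost sums, over all correspondences, the squared distance from each point to the epipolar line induced by its match in the other image. A sketch of evaluating it (numpy assumed; points are homogeneous 3-vectors):

```python
import numpy as np

def point_line_dist2(p, l):
    """Squared distance from homogeneous point p to 2D line l = (a, b, c)."""
    p = p / p[2]
    return (l @ p) ** 2 / (l[0] ** 2 + l[1] ** 2)

def symmetric_epipolar_cost(F, pts1, pts2):
    """Sum over correspondences of d^2(p', F p) + d^2(p, F^T p')."""
    cost = 0.0
    for p, p2 in zip(pts1, pts2):
        cost += point_line_dist2(p2, F @ p)    # p' to the epipolar line of p
        cost += point_line_dist2(p, F.T @ p2)  # p to the epipolar line of p'
    return cost
```

In a full pipeline this objective would be minimized over the entries of F with a non-linear solver (e.g. Levenberg–Marquardt), starting from a linear estimate.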
===Locating the epipoles=== | |||
==3D Reconstruction== | |||
===Triangulation=== | |||
If the cameras are intrinsically and extrinsically calibrated, back-project a ray through each image point; due to noise the two rays generally do not intersect, so take <math>P</math> as the midpoint of their common perpendicular.
===Point reconstruction=== | |||
Given a point <math>X \in \mathbb{R}^3</math>:
* <math>x=MX</math> is the point in image 1 | |||
* <math>x'=M'X</math> is the point in image 2 | |||
<math> | |||
M = \begin{bmatrix} | |||
m_1^T \\ m_2^T \\ m_3^T | |||
\end{bmatrix} | |||
</math> | |||
<math>x \times MX = 0</math> | |||
<math>x \times M'X = 0</math> | |||
implies | |||
<math>AX=0</math> where <math>A = \begin{bmatrix} | |||
x m_3^T - m_1^T\\ | |||
y m_3^T - m_2^T\\ | |||
x' m_3'^T - m_1'^T\\ | |||
y' m_3'^T - m_2'^T\\ | |||
\end{bmatrix}</math> | |||
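The homogeneous system above is solved with SVD, as in the other linear steps. A minimal DLT triangulation sketch (numpy assumed; the camera matrices and pixel values in the test are made up):

```python
import numpy as np

def triangulate(M, M2, x, x2):
    """Linear (DLT) triangulation: x, x2 are (x, y) points in each view."""
    A = np.array([
        x[0]  * M[2]  - M[0],     # x  m3^T - m1^T
        x[1]  * M[2]  - M[1],     # y  m3^T - m2^T
        x2[0] * M2[2] - M2[0],    # x' m3'^T - m1'^T
        x2[1] * M2[2] - M2[1],    # y' m3'^T - m2'^T
    ])
    # The homogeneous solution of AX = 0 is the right singular vector
    # for the smallest singular value (last row of Vt).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]           # dehomogenize
```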
===Reconstruction for intrinsically calibrated cameras=== | |||
# Compute the essential matrix E using normalized points | |||
# Select <math>M=[I|0]</math>, <math>M'=[R|T]</math>; then <math>E=[T]_{\times}R</math>
# Find T and R using SVD of E. | |||
===Reconstruction ambiguity: projective=== | |||
<math>x_i = M X_i = (M H_P^{-1})(H_P X_i)</math>
* Without calibration, the reconstruction is only determined up to a projective transformation: replacing <math>M</math> by <math>M H_P^{-1}</math> and <math>X_i</math> by <math>H_P X_i</math> produces exactly the same images, so the 3D model can be changed by an arbitrary homography.
* If the 3D positions of 5 points are known, the projective ambiguity can be removed and the 3D model rectified.
;Projective Reconstruction Theorem | |||
* We can compute a projective reconstruction of a scene from 2 views. | |||
* We don't have to know the calibration or poses. | |||
===Affine Reconstruction=== | |||
==Aperture Problem== | |||
When a large moving object is viewed through a small aperture (i.e., only locally), you cannot tell which direction it is moving: only the flow component perpendicular to the local edge can be recovered.
See [https://www.opticalillusion.net/optical-illusions/the-barber-pole-illusion/ the barber pole illusion] | |||
===Brightness Constancy Equation===
Let <math>E(x,y,t)</math> be the irradiance and <math>u(x,y),v(x,y)</math> the components of optical flow.
Then <math>E(x + u \delta t, y + v \delta t, t + \delta t) = E(x,y,t)</math>.
Equivalently, assume brightness is constant along the motion: <math>E(x(t), y(t), t) = \text{constant}</math>.
A first-order Taylor expansion gives the brightness change constraint equation <math>E_x u + E_y v + E_t = 0</math>.
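Brightness constancy gives one equation per pixel in two unknowns (the aperture problem again), so a standard remedy, not named in these notes, is Lucas–Kanade: assume the flow is constant over a small window and solve the stacked constraints in least squares. A sketch assuming the image gradients are already computed (numpy assumed):

```python
import numpy as np

def lucas_kanade(Ex, Ey, Et):
    """Least-squares flow (u, v) from per-pixel gradients in one window.

    Each pixel contributes one brightness-constraint equation
    Ex*u + Ey*v + Et = 0; stacking them gives A [u, v]^T = b.
    """
    A = np.stack([Ex.ravel(), Ey.ravel()], axis=1)
    b = -Et.ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow
```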
==Structure from Motion Pipeline== | |||
===Calibration=== | |||
# Step 1: Feature Matching | |||
===Fundamental Matrix and Essential Matrix=== | |||
# Step 2: Estimate Fundamental Matrix F | |||
#* <math>x_i'^T F x_i = 0</math> | |||
#* Use SVD to solve <math>Af=0</math> for the stacked entries <math>f</math> of <math>F</math>: with <math>A=U \Sigma V^T</math>, the solution is the last column of <math>V</math> (the right singular vector for the smallest singular value).
#* Essential Matrix: <math>E = K^T F K</math> | |||
#* '''Fundamental matrix has 7 degrees of freedom, essential matrix has 5 degrees of freedom''' | |||
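The estimation step above can be sketched as an (unnormalized) eight-point algorithm; the scene and camera motion in the test are invented for illustration (numpy assumed):

```python
import numpy as np

def estimate_F(pts1, pts2):
    """Linear (eight-point) estimate of F from >= 8 correspondences.

    Each pair ((x, y), (x', y')) gives one row of A in A f = 0,
    obtained by expanding x'^T F x = 0 in the 9 entries of F.
    (A practical pipeline would normalize the points first.)
    """
    A = np.array([
        [x2 * x, x2 * y, x2, y2 * x, y2 * y, y2, x, y, 1.0]
        for (x, y), (x2, y2) in zip(pts1, pts2)
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # last right singular vector of A
    # Enforce rank 2: a valid fundamental matrix is singular.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```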
===Estimating Camera Pose=== | |||
Pose <math>(R, T)</math> has 6 DoF. SVD of the essential matrix yields 4 candidate <math>(R, T)</math> solutions.
Triangulate a point and keep the solution that places it in front of both cameras (positive depth).
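The four-candidate decomposition can be sketched with the standard SVD recipe (numpy assumed; the depth-based selection among the candidates is omitted):

```python
import numpy as np

def poses_from_essential(E):
    """The 4 candidate (R, T) pairs from an essential matrix.

    E = U diag(s, s, 0) V^T; R is U W V^T or U W^T V^T and T is
    +/- the last column of U (up to scale). The physically correct
    pair triangulates points to positive depth in both cameras.
    """
    U, _, Vt = np.linalg.svd(E)
    # Make U and V proper rotations (det = +1); E is defined up to sign.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    T = U[:, 2]
    return [(R1, T), (R1, -T), (R2, T), (R2, -T)]
```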
==Visual Filters== | |||
Design filters that detect specific object categories: humans, cars, etc.
==Model-based Recognition== | |||
You have a model for each object to recognize.<br> | |||
The recognition system identifies objects from the model database. | |||
===Pose Clustering=== | |||
===Indexing=== | |||
==Texture== | |||
===Synthesis=== | |||
The goal is to generate additional texture samples from an existing texture sample. | |||
===Filters=== | |||
* Difference of Gaussians (DoG)
* Gabor Filters | |||
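A Gabor filter is a sinusoidal grating under a Gaussian window, tuned to an orientation and spatial frequency. A minimal kernel-generation sketch (the size and frequency parameters are illustrative; numpy assumed):

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, wavelength=6.0):
    """Real (even) Gabor kernel: cosine grating times a Gaussian window."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the grating is oriented at angle theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier
```

Convolving an image with a bank of such kernels at several orientations and scales yields the texture feature responses used for synthesis and classification.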
==Lecture Schedule== | |||
* 02/23/2021 - Pinhole camera model | |||
* 02/25/2021 - Camera calibration | |||
* 03/09/2021 - Optical flow, motion fields | |||
* 03/11/2021 - Structure from motion: epipolar constraints, essential matrix, triangulation | |||
* 03/25/2021 - Multiple topics (image motion) | |||
* 03/30/2021 - Independent object motion (flow fields) | |||
* 04/01/2021 - Project 3 Discussion | |||
* 04/15/2021 - Shape from shading, reflectance map | |||
* 04/20/2021 - Shape from shading, normal map | |||
* 04/22/2021 - Recognition, classification | |||
* 04/27/2021 - Visual filters, classification | |||
* 04/29/2021 - Midterm Exam clarifications | |||
* 05/04/2021 - Model-based Recognition | |||
* 05/06/2021 - Texture | |||
==Projects==