Notes for CMSC733 Classical and Deep Learning Approaches for Geometric Computer Vision taught by [http://legacydirs.umiacs.umd.edu/~yiannis/ Prof. Yiannis Aloimonos].


* [http://prg.cs.umd.edu/cmsc733 Course webpage]
==Convolution and Correlation==
See [[Convolutional neural network]]. 
Traditionally, fixed filters are used instead of learned filters.
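The sketch below (an illustration with NumPy/SciPy, not course code; the image is a random placeholder) contrasts correlation and convolution with a fixed Sobel filter: convolution flips the kernel, so for the anti-symmetric Sobel kernel the two results differ by a sign.
<pre>
import numpy as np
from scipy.signal import convolve2d, correlate2d

# Fixed (hand-designed) Sobel filter for horizontal derivatives.
sx = np.array([[-1., 0., 1.],
               [-2., 0., 2.],
               [-1., 0., 1.]]) / 8.0

img = np.random.rand(64, 64)   # placeholder image (assumed input)

# Correlation slides the kernel as-is; convolution flips it in both axes.
corr = correlate2d(img, sx, mode='same', boundary='symm')
conv = convolve2d(img, sx, mode='same', boundary='symm')

# Flipping the Sobel kernel negates it, so here conv == -corr.
print(np.allclose(conv, -corr))
</pre>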
==Edge Detection==
Two ways to detect edges:
* Difference operators
* Models
===Image Gradients===
* Angle is given by <math>\theta = \operatorname{atan2}\left(\frac{\partial f}{\partial y}, \frac{\partial f}{\partial x}\right)</math>
* Edge strength is given by <math>\left\Vert (\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}) \right\Vert</math>
The Sobel operator is another way to approximate the image derivatives:<br>
<math>
s_x =
\frac{1}{8}
\begin{bmatrix}
-1 & 0 & 1\\
-2 & 0 & 2\\
-1 & 0 & 1
\end{bmatrix}
</math> and
<math>
s_y =
\frac{1}{8}
\begin{bmatrix}
1 & 2 & 1\\
0 & 0 & 0\\
-1 & -2 & -1
\end{bmatrix}
</math>
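A minimal sketch (assumed, not from the notes; the image and threshold are placeholders) of computing edge strength and orientation with the Sobel kernels above:
<pre>
import numpy as np
from scipy.signal import correlate2d

sx = np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8.0
sy = np.array([[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]]) / 8.0

img = np.random.rand(128, 128)                            # placeholder image

gx = correlate2d(img, sx, mode='same', boundary='symm')   # approximates df/dx
gy = correlate2d(img, sy, mode='same', boundary='symm')   # approximates df/dy

strength = np.hypot(gx, gy)    # edge strength ||(df/dx, df/dy)||
theta = np.arctan2(gy, gx)     # edge angle

edges = strength > 0.1         # arbitrary threshold (assumption)
</pre>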
You can smooth a function by convolving with a Gaussian kernel.
;Laplacian of Gaussian
* Edges are zero crossings of the Laplacian of Gaussian convolved with the signal.
Effect of the Gaussian kernel size <math>\sigma</math>:
* Large sigma detects large scale edges.
* Small sigma detects fine features.
;Scale Space
* With larger sigma, the peaks of the first derivative (equivalently, the zero crossings of the second derivative) can move.
* Close-by peaks can also merge as the scale increases.
* An edge will never split.
===Subtraction===
* Create a smoothed image by convolving with a Gaussian
* Subtract the smoothed image from the original image (see the sketch below).
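A possible sketch of this subtraction scheme, assuming SciPy's Gaussian filter (the image, sigma, and threshold are placeholders):
<pre>
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.random.rand(128, 128)              # placeholder image

smoothed = gaussian_filter(img, sigma=2.0)  # convolve with a Gaussian
detail = img - smoothed                     # original minus smoothed

# Large |detail| responses indicate edges and fine structure.
edges = np.abs(detail) > 0.05               # arbitrary threshold (assumption)
</pre>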
===Finding lines in an image===
Option 1: Search for lines everywhere. 
Option 2: Use Hough transform voting.
===Hough Transform===
Duality between lines in image space and points in Hough space. 
Equation for a line in the <math>(d, \theta)</math> parameterization: <math>d = x \cos \theta + y \sin \theta</math>. 
<pre>
H = zeros over all (d, theta) bins
for all pixels (x, y) on an edge:
  for all theta bins:
    d = x*cos(theta) + y*sin(theta)
    H(round(d), theta) += 1
d, theta = argmax(H)
</pre>
* Hough transform handles noise better than least squares.
* Each edge pixel votes for a ''curve'' in Hough space (a sinusoid in the <math>(d, \theta)</math> parameterization). The line in image space corresponds to the point where these curves intersect in Hough space (see the sketch below).
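A minimal NumPy sketch of the voting procedure above; the bin sizes and the single-peak readout are assumptions, not from the notes:
<pre>
import numpy as np

def hough_lines(edge_mask, n_theta=180):
    """Accumulate votes in (d, theta) space for a binary edge image."""
    h, w = edge_mask.shape
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    d_max = int(np.ceil(np.hypot(h, w)))
    H = np.zeros((2 * d_max + 1, n_theta), dtype=np.int64)

    ys, xs = np.nonzero(edge_mask)
    for x, y in zip(xs, ys):
        # Each edge pixel votes along the curve d = x*cos(theta) + y*sin(theta).
        ds = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        H[ds + d_max, np.arange(n_theta)] += 1

    # Strongest line = accumulator peak.
    d_idx, t_idx = np.unravel_index(np.argmax(H), H.shape)
    return d_idx - d_max, thetas[t_idx]
</pre>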
;Extensions
* Use image gradient.
* Give more votes for stronger edges
* Change sampling to give more/less resolution
* Same procedure with circles, squares, or other shapes.
;Hough transform for curves
Works with any curve that can be written in a parametric form.
===Finding corners===
<math>
C = \begin{bmatrix}
\sum I_x^2 & \sum I_x I_y\\
\sum I_x I_y & \sum I_y^2
\end{bmatrix}
</math>
where the sums are taken over a local window around the pixel.
Consider the diagonalized form <math>
C = \begin{bmatrix}
\lambda_1 & 0 \\
0 & \lambda_2
\end{bmatrix}
</math>
* Both <math>\lambda_1</math> and <math>\lambda_2</math> large: corner.
* One eigenvalue large, one small: edge.
* Both small: flat region (see the corner-score sketch below).
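Below is a hedged sketch of a Harris-style corner score built from this matrix; the window size, the constant k, and the threshold are assumptions, not necessarily what the course uses:
<pre>
import numpy as np
from scipy.ndimage import sobel, uniform_filter

img = np.random.rand(128, 128)           # placeholder image

Ix = sobel(img, axis=1)                   # derivative along x (columns)
Iy = sobel(img, axis=0)                   # derivative along y (rows)

# Entries of C, summed over a local window around each pixel.
Sxx = uniform_filter(Ix * Ix, size=5)
Syy = uniform_filter(Iy * Iy, size=5)
Sxy = uniform_filter(Ix * Iy, size=5)

# Corner response: large when both eigenvalues of C are large
# (det(C) - k * trace(C)^2, the Harris formulation).
k = 0.04                                  # assumed constant
R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
corners = R > 0.01 * R.max()              # arbitrary threshold (assumption)
</pre>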
===Theoretical model of an eye===
* Pick a point in space and consider the light rays passing through it.
* Pinhole cameras
** Abstractly, a box with a small hole in it.
==Homography==
===Cross-ratio===
See [[Wikipedia: Cross-ratio]].
===Solving for homographies===
Given 4 point correspondences, you can solve for a homography: each correspondence gives 2 equations for the 8 degrees of freedom of <math>H</math> (see the sketch below). 
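A minimal DLT sketch for estimating <math>H</math> from 4 or more correspondences via the SVD; the function name and the (x, y) point format are assumptions:
<pre>
import numpy as np

def fit_homography(src, dst):
    """Estimate H with dst ~ H @ src (homogeneous), from >= 4 correspondences.

    src, dst: (N, 2) arrays of matching points (assumed inputs).
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(A)
    # The null vector of A (last right-singular vector) gives the entries of H.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
</pre>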
===Point and line duality===
Points in the image correspond to lines/rays in 3D space. 
The cross product of two image points (in homogeneous coordinates) gives the image line through them, which corresponds to the plane containing the two rays and the camera center.
==Calibration==
===Central Projection===
<math>
\begin{bmatrix}
u \\ v \\ w
\end{bmatrix}
=
\begin{bmatrix}
f & 0 & 0 & 0\\
0 & f & 0 & 0\\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
x_s \\ y _s \\ z_s \\ 1
\end{bmatrix}
</math>
where the image coordinates are recovered as <math>(u/w, v/w)</math>.
===Properties of matrix P===
<math>P = K R [I_3 | -C]</math>
* <math>K</math> is the upper-triangular calibration matrix which has 5 degrees of freedom.
* <math>R</math> is the rotation matrix with 3 degrees of freedom.
* <math>C</math> is the camera center with 3 degrees of freedom.
In total, <math>P</math> has 11 degrees of freedom (see the composition sketch below).
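A quick sketch of composing <math>P</math> from <math>K</math>, <math>R</math>, and <math>C</math>; all numeric values below are made up for illustration:
<pre>
import numpy as np

# Assumed example values, for illustration only.
f = 800.0
K = np.array([[f, 0.0, 320.0],
              [0.0, f, 240.0],
              [0.0, 0.0, 1.0]])           # upper-triangular calibration matrix
R = np.eye(3)                              # rotation (world to camera)
C = np.array([0.1, 0.0, -2.0])             # camera center in world coordinates

# P = K R [I | -C]
P = K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])

X = np.array([0.0, 0.0, 5.0, 1.0])         # homogeneous world point
u, v, w = P @ X
print(u / w, v / w)                        # image coordinates
</pre>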
===Calibration===
# Estimate the matrix P using known scene points and their images.
# Decompose P into interior (intrinsic) and exterior (extrinsic) parameters.
===Zhang's Approach===
==Stereo==
===Parallel Cameras===
Consider two cameras, where the right camera is shifted by a baseline <math>d</math> along the x-axis relative to the left camera. 
Then, assuming unit focal length, for a point <math>(x,y,z)</math>,
<math>x_l = \frac{x}{z}</math> 
<math>y_l = \frac{y}{z}</math> 
<math>x_r = \frac{x-d}{z}</math> 
<math>y_r = \frac{y}{z}</math>. 
Thus, the stereo disparity is the ratio of baseline over depth: <math>x_l - x_r = \frac{d}{z}</math>. 
With known baseline and correspondence, you can solve for depth <math>z</math>.
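A tiny sketch of recovering depth from disparity under this parallel-camera model (normalized image coordinates, i.e. unit focal length; the baseline and point values are made up):
<pre>
import numpy as np

baseline = 0.1                      # baseline d between the cameras (assumed)
x_left = np.array([0.25, 0.10])     # x-coordinates in the left image (assumed)
x_right = np.array([0.20, 0.08])    # corresponding x-coordinates in the right image

disparity = x_left - x_right        # x_l - x_r = d / z
depth = baseline / disparity        # solve for z
print(depth)
</pre>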
===Epipolar Geometry===
# Warp the two images such that the epipolar lines become horizontal.
# This is called rectification.
The ''epipoles'' are the image points where each camera sees the other camera's center.
===Rectification===
# Consider the left camera to be the center of a coordinate system.
# Let <math>e_1</math> be the axis pointing toward the right camera, <math>e_2</math> be an up axis orthogonal to <math>e_1</math>, and take <math>e_3 = e_1 \times e_2</math>.
===Random dot stereograms===
Random dot stereograms show that object recognition is not needed for stereo.
===Similarity Construct===
* Do matching by computing the sum of squared differences (SSD) of a patch along the epipolar lines (see the sketch below).
* The ordering of pixels along an epipolar line may not be the same between left and right images.
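A rough sketch of SSD matching along a horizontal epipolar line in rectified images; the patch size, search range, and the assumption that (row, col) is away from the image border are mine, not the course's:
<pre>
import numpy as np

def ssd_match(left, right, row, col, patch=5, max_disp=64):
    """Find the disparity at (row, col) of the left image by SSD along the epipolar line."""
    r = patch // 2                        # assumes (row, col) is away from the border
    ref = left[row - r:row + r + 1, col - r:col + r + 1]
    best_disp, best_ssd = 0, np.inf
    for d in range(max_disp):
        c = col - d                       # candidate column in the right image
        if c - r < 0:
            break
        cand = right[row - r:row + r + 1, c - r:c + r + 1]
        ssd = np.sum((ref - cand) ** 2)   # sum of squared differences
        if ssd < best_ssd:
            best_ssd, best_disp = ssd, d
    return best_disp
</pre>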
===Correspondence + Segmentation===
* Assumption: Similar pixels in a segmentation map will probably have the same disparity.
# For each shift, find the connected components.
# For each point <math>p</math>, pick the shift whose connected component containing <math>p</math> is largest.
===Essential Matrix===
The essential matrix satisfies <math>\hat{p}'^T E \hat{p} = 0</math> where <math>\hat{p} = M^{-1}p</math> and <math>\hat{p}'=M'^{-1}p'</math>.
The fundamental matrix is <math>F=M'^{-T} E M^{-1}</math>.
;Properties
* The matrix is 3x3.
* If <math>F</math> is the fundamental matrix of the pair (P, P'), then <math>F^T</math> is the fundamental matrix of (P', P).
* The fundamental matrix gives the equation of the epipolar line in the other image:
** <math>l'=Fp</math> and <math>l=F^T p'</math>
* For any <math>p</math>, the epipolar line <math>l'=Fp</math> contains the epipole <math>e'</math>, because every epipolar line passes through the image of the other camera's center.
** <math>e'^T F = 0</math> and <math>Fe=0</math> (verified in the sketch below)
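A short sketch checking these properties numerically; the fundamental matrix here is a random rank-2 placeholder, not real data:
<pre>
import numpy as np

F = np.random.rand(3, 3)                 # placeholder, then forced to rank 2
U, S, Vt = np.linalg.svd(F)
F = U @ np.diag([S[0], S[1], 0.0]) @ Vt

p = np.array([100.0, 50.0, 1.0])         # a point in image 1 (homogeneous, assumed)
l_prime = F @ p                          # epipolar line in image 2: l' = F p

e = Vt[-1]                               # right null vector: F e = 0  (epipole in image 1)
e_prime = U[:, -1]                       # left null vector: e'^T F = 0 (epipole in image 2)
print(np.allclose(F @ e, 0), np.allclose(e_prime @ F, 0))
</pre>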
[https://www.youtube.com/watch?v=DgGV3l82NTk Fundamental matrix song]
==Structure from Motion==
The motion field (optical flow) induced by camera motion can be analyzed by separating its translational and rotational components.
===Only Translation===
<math>u = \frac{-U + xW}{Z} = \frac{W}{Z}\left(-\frac{U}{W} + x\right) = \frac{W}{Z}(x - x_0)</math> 
<math>v = \frac{-V + yW}{Z} = \frac{W}{Z}\left(-\frac{V}{W} + y\right) = \frac{W}{Z}(y - y_0)</math>
The direction of the flow at each pixel is: 
<math>\frac{v}{u} = \frac{y-y_0}{x-x_0}</math> 
The flow vectors all emanate from the focus of expansion <math>(x_0, y_0) = (U/W, V/W)</math>. 
If you walk towards a point in the image, then all pixels will flow away from that point.
===Only Rotation===
For a rotation <math>(\alpha, \beta, \gamma)</math>, the horizontal component of the rotational flow is:
<math>u = \alpha x y - \beta (1 + x^2) - \gamma y</math>
Rotation around y or z axis leads to hyperbolas.
The rotational flow is independent of depth.
===Both translation and rotation===
The flow field will not resemble any of the above patterns.
===The velocity of p===
===Moving plane===
For a point <math>p</math> on a plane with normal vector <math>n</math>, the set of all points on the plane is <math>\{x \mid x \cdot n = d\}</math>, where <math>d = p \cdot n</math> is the distance from the origin to the plane along the normal.
===Scaling ambiguity===
Depth can be recovered up to a scale factor.
===Non-Linear Least Squares Approach===
Minimize the sum of squared distances between each point and its corresponding epipolar line:
<math>
\sum_i \left[ d^2(p_i', F p_i) + d^2(p_i, F^T p_i') \right]
</math>
where <math>d(p, l)</math> denotes the distance from point <math>p</math> to line <math>l</math>.
===Locating the epipoles===
==3D Reconstruction==
===Triangulation===
If the cameras are intrinsically and extrinsically calibrated, the two back-projected rays generally do not intersect exactly, so the reconstructed point P is taken as the midpoint of the common perpendicular between the rays.
===Point reconstruction===
Given a point <math>X \in \mathbb{R}^3</math>:
* <math>x=MX</math> is the point in image 1
* <math>x'=M'X</math> is the point in image 2
<math>
M = \begin{bmatrix}
m_1^T \\ m_2^T \\ m_3^T
\end{bmatrix}
</math>
<math>x \times MX = 0</math> 
<math>x \times M'X = 0</math> 
implies 
<math>AX=0</math> where <math>A = \begin{bmatrix}
x m_3^T - m_1^T\\
y m_3^T - m_2^T\\
x' m_3'^T - m_1'^T\\
y' m_3'^T - m_2'^T\\
\end{bmatrix}</math>
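A sketch of linear triangulation built directly from the matrix <math>A</math> above; the camera matrices and image points are assumed inputs:
<pre>
import numpy as np

def triangulate(M1, M2, p1, p2):
    """Linear triangulation: recover X from x = M1 X and x' = M2 X.

    M1, M2: 3x4 camera matrices; p1, p2: (x, y) image points (assumed inputs).
    """
    x, y = p1
    xp, yp = p2
    A = np.vstack([
        x * M1[2] - M1[0],
        y * M1[2] - M1[1],
        xp * M2[2] - M2[0],
        yp * M2[2] - M2[1],
    ])
    # X is the null vector of A (last right-singular vector).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
</pre>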
===Reconstruction for intrinsically calibrated cameras===
# Compute the essential matrix <math>E</math> using normalized points.
# Select <math>M = [I \mid 0]</math> and <math>M' = [R \mid T]</math>; then <math>E = [T]_\times R</math>.
# Find <math>T</math> and <math>R</math> using the SVD of <math>E</math> (see the sketch below).
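A sketch of step 3 using the standard SVD decomposition of <math>E</math> into four candidate (R, T) pairs; the exact formulation (W matrix, sign fixes) is the textbook recipe, which may differ in detail from the course's derivation:
<pre>
import numpy as np

def pose_candidates(E):
    """Return the four candidate (R, T) pairs from an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    # Ensure proper rotations (determinant +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    T = U[:, 2]
    return [(R1, T), (R1, -T), (R2, T), (R2, -T)]
</pre>
Triangulating a point with each candidate and keeping the one that places it in front of both cameras resolves the ambiguity.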
===Reconstruction ambiguity: projective===
<math>x_i = MX_i = (MH_P^{-1})(H_P X_i)</math>
* Even with the same images, a different choice of camera matrices gives a different reconstruction; the 3D model is changed by some 3D homography <math>H_P</math>.
* If you know the 3D positions of 5 points (in general position), you can rectify the projective reconstruction.
;Projective Reconstruction Theorem
* We can compute a projective reconstruction of a scene from 2 views.
* We don't have to know the calibration or poses.
===Affine Reconstruction===
==Aperture Problem==
When looking through a small viewport (i.e. locally) at a large moving object, you cannot tell which direction it is moving; only the motion component perpendicular to the local edge can be recovered. 
See [https://www.opticalillusion.net/optical-illusions/the-barber-pole-illusion/ the barber pole illusion]
===Brightness Constancy Equation===
Let <math>E(x,y,t)</math> be the irradiance and <math>u(x,y),v(x,y)</math> the components of optical flow. 
Then <math>E(x + u \delta t, y + v \delta t, t + \delta t) = E(x,y,t)</math>.
Assume <math>E(x(t), y(t), t) = \text{constant}</math> along the image motion. 
A first-order Taylor expansion gives the optical flow constraint equation <math>E_x u + E_y v + E_t = 0</math>, used in the sketch below.
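Using the constraint, one flow vector can be estimated per small window by least squares; the window size, derivative filters, and least-squares formulation below are assumptions for illustration, not necessarily the course's method:
<pre>
import numpy as np

def flow_in_window(E0, E1, r0, c0, win=7):
    """Estimate (u, v) in a win x win window from two consecutive frames E0, E1."""
    Ex = np.gradient(E0, axis=1)     # spatial derivative E_x
    Ey = np.gradient(E0, axis=0)     # spatial derivative E_y
    Et = E1 - E0                     # temporal derivative E_t

    sl = (slice(r0, r0 + win), slice(c0, c0 + win))
    A = np.stack([Ex[sl].ravel(), Ey[sl].ravel()], axis=1)  # one constraint per pixel
    b = -Et[sl].ravel()

    # Least squares for (u, v); an ill-conditioned A reflects the aperture problem.
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
</pre>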
==Structure from Motion Pipeline==
===Calibration===
# Step 1: Feature Matching
===Fundamental Matrix and Essential Matrix===
# Step 2: Estimate Fundamental Matrix F
#* <math>x_i'^T F x_i = 0</math>
#* Use SVD to solve for <math>x</math> from <math>Ax=0</math>: <math>A=U \Sigma V^T</math>. The solution is the last column of <math>V</math> (the right-singular vector for the smallest singular value), as in the sketch after this list.
#* Essential Matrix: <math>E = K^T F K</math>
#* '''Fundamental matrix has 7 degrees of freedom, essential matrix has 5 degrees of freedom'''
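A condensed sketch of step 2 (an un-normalized eight-point style estimate; in practice the points are usually normalized first, which is omitted here):
<pre>
import numpy as np

def estimate_F(pts1, pts2):
    """Estimate F from >= 8 correspondences so that x2^T F x1 = 0.

    pts1, pts2: (N, 2) arrays of matching pixel coordinates (assumed inputs).
    """
    x1, y1 = pts1[:, 0], pts1[:, 1]
    x2, y2 = pts2[:, 0], pts2[:, 1]
    # Each correspondence gives one row of A in A f = 0.
    A = np.stack([x2 * x1, x2 * y1, x2,
                  y2 * x1, y2 * y1, y2,
                  x1, y1, np.ones_like(x1)], axis=1)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)            # last right-singular vector
    # Enforce rank 2 (a fundamental matrix is singular).
    U, S, Vt2 = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt2
    return F / F[2, 2]
</pre>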
===Estimating Camera Pose===
To estimate the camera pose from <math>E</math>: the pose has 6 DoF (3 for rotation, 3 for translation). 
The SVD of the essential matrix gives 4 candidate solutions. 
Triangulate points and keep the solution that places them in front of both cameras.
==Visual Filters==
Build filters that detect specific object categories: humans, cars, and so on.
==Model-based Recognition==
You have a model for each object to recognize.<br>
The recognition system identifies objects from the model database.
===Pose Clustering===
===Indexing===
==Texture==
===Synthesis===
The goal is to generate additional texture samples from an existing texture sample.
===Filters===
* Difference of Gaussians (DoG)
* Gabor Filters
==Lecture Schedule==
* 02/23/2021 - Pinhole camera model
* 02/25/2021 - Camera calibration
* 03/09/2021 - Optical flow, motion fields
* 03/11/2021 - Structure from motion: epipolar constraints, essential matrix, triangulation
* 03/25/2021 - Multiple topics (image motion)
* 03/30/2021 - Independent object motion (flow fields)
* 04/01/2021 - Project 3 Discussion
* 04/15/2021 - Shape from shading, reflectance map
* 04/20/2021 - Shape from shading, normal map
* 04/22/2021 - Recognition, classification
* 04/27/2021 - Visual filters, classification
* 04/29/2021 - Midterm Exam clarifications
* 05/04/2021 - Model-based Recognition
* 05/06/2021 - Texture
==Projects==