# Geometric Computer Vision

Notes for CMSC733 Classical and Deep Learning Approaches for Geometric Computer Vision taught by Prof. Yiannis Aloimonos.

## Convolution and Correlation

See Convolutional neural network.

Traditionally, fixed filters are used instead of learned filters.

## Edge Detection

Two ways to detect edges:

- Difference operators
- Models

### Image Gradients

- Angle is given by \(\displaystyle \theta = \operatorname{atan2}\left(\frac{\partial f}{\partial y}, \frac{\partial f}{\partial x}\right)\)
- Edge strength is given by \(\displaystyle \left\Vert (\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}) \right\Vert\)

The Sobel operator is another way to approximate the derivatives:

\(\displaystyle
s_x =
\frac{1}{8}
\begin{bmatrix}
-1 & 0 & 1\\
-2 & 0 & 2\\
-1 & 0 & 1
\end{bmatrix}
\) and
\(\displaystyle
s_y =
\frac{1}{8}
\begin{bmatrix}
1 & 2 & 1\\
0 & 0 & 0\\
-1 & -2 & -1
\end{bmatrix}
\)
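A minimal NumPy sketch of the gradient computation with the two Sobel kernels above, assuming a grayscale image stored as a 2D float array. The helper names `correlate_valid` and `sobel_gradient` are illustrative, and the kernels are applied as a cross-correlation without padding, so the output is two pixels smaller in each dimension.

```python
import numpy as np

# Sobel kernels as written in the notes (with the 1/8 normalization).
SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float) / 8.0
SY = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float) / 8.0


def correlate_valid(image, kernel):
    """Cross-correlate a 2D image with a small kernel, no padding."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def sobel_gradient(image):
    """Return edge strength and gradient angle (radians) per pixel."""
    gx = correlate_valid(image, SX)
    gy = correlate_valid(image, SY)  # positive when intensity grows toward row 0
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)
    return magnitude, angle


# Vertical step edge: left half dark, right half bright.
step = np.zeros((5, 6))
step[:, 3:] = 1.0
mag, ang = sobel_gradient(step)
```

On this step image the edge strength is nonzero only in the two output columns whose 3x3 window straddles the step, and the gradient points along the positive x direction there.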

You can smooth a function by convolving with a Gaussian kernel.

- Laplacian of Gaussian

- Edges are zero crossings of the Laplacian of Gaussian convolved with the signal.

Effect of the Gaussian kernel width \(\displaystyle \sigma\):

- Large sigma detects large scale edges.
- Small sigma detects fine features.
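A 1D sketch of edge detection by zero crossings, assuming the signal is a NumPy array and using the second derivative of a Gaussian as the 1D analogue of the Laplacian of Gaussian. The names `log_kernel_1d`, `log_zero_crossings`, and the `rel_tol` threshold (used to ignore numerically tiny sign flips) are assumptions made for illustration.

```python
import numpy as np


def log_kernel_1d(sigma):
    """Second derivative of a Gaussian, sampled out to 4 sigma."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = (x**2 / sigma**4 - 1.0 / sigma**2) * np.exp(-x**2 / (2 * sigma**2))
    return kernel - kernel.mean()  # zero sum, so flat regions respond with ~0


def log_zero_crossings(signal, sigma, rel_tol=0.05):
    """Indices i where the filtered signal changes sign between i and i + 1."""
    kernel = log_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(signal, radius, mode="edge")  # avoid fake edges at the ends
    response = np.convolve(padded, kernel, mode="valid")
    sign_change = response[:-1] * response[1:] < 0
    jump = np.abs(np.diff(response))
    strong = jump > rel_tol * np.abs(response).max()
    return np.flatnonzero(sign_change & strong)


# A step edge between samples 49 and 50.
step = np.zeros(100)
step[50:] = 1.0
fine = log_zero_crossings(step, sigma=1.0)
coarse = log_zero_crossings(step, sigma=2.0)
```

For an isolated step both scales report the same crossing; the scale only starts to matter when nearby edges interact.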

- Scale Space

- With larger sigma, the first-derivative peaks (i.e. zero crossings of the second derivative) can move.
- Close-by peaks can also merge as the scale increases.
- An edge will never split.

### Subtraction

- Create a smoothed image by convolving with a Gaussian
- Subtract the smoothed image from the original image.

### Finding lines in an image

Option 1: Search for line everywhere.

Option 2: Use Hough transform voting.

### Hough Transform

Duality between lines in image space and points in Hough space.

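Equation for a line in normal (polar) form: \(\displaystyle d = x \cos \theta + y \sin \theta\), where \(\displaystyle d\) is the distance from the origin to the line and \(\displaystyle \theta\) is the angle of the line's normal.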

    for all pixels (x, y) on an edge:
        for all (d, theta):
            if d = x*cos(theta) + y*sin(theta):
                H(d, theta) += 1
    d, theta = argmax(H)
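A minimal Python sketch of the voting loop, assuming the edge pixels are given as a list of `(x, y)` tuples. Rather than testing every `(d, theta)` pair for equality, it computes `d` directly for each sampled `theta` and rounds to the nearest bin, which casts the same votes. The function name `hough_lines` and the parameters `d_max`, `n_theta`, and `d_step` are illustrative choices.

```python
import math


def hough_lines(edge_points, d_max, n_theta=180, d_step=1.0):
    """Return (d, theta, votes) of the strongest line d = x cos(theta) + y sin(theta)."""
    n_d = int(2 * d_max / d_step) + 1
    acc = [[0] * n_theta for _ in range(n_d)]  # accumulator H(d, theta)
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta  # theta sampled in [0, pi)
            d = x * math.cos(theta) + y * math.sin(theta)
            d_idx = int(round((d + d_max) / d_step))
            if 0 <= d_idx < n_d:
                acc[d_idx][t] += 1
    votes, i, t = max(
        (acc[i][t], i, t) for i in range(n_d) for t in range(n_theta)
    )
    return i * d_step - d_max, math.pi * t / n_theta, votes


# 100 edge pixels on the vertical line x = 5.
points = [(5, y) for y in range(100)]
d, theta, votes = hough_lines(points, d_max=150)
```

The bin sizes `d_step` and `n_theta` control the resolution of the recovered line, which is the sampling trade-off mentioned in the extensions.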

- Hough transform handles noise better than least squares.
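- Each pixel votes for a *curve* in the Hough space: a line under the slope-intercept parameterization, a sinusoid under the \(\displaystyle (d, \theta)\) parameterization. The line in the image space is the point where these curves intersect.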

- Extensions

- Use image gradient.
- Give more votes for stronger edges
- Change sampling to give more/less resolution
- Same procedure with circles, squares, or other shapes.

- Hough transform for curves

Works with any curve that can be written in a parametric form.

### Finding corners

\(\displaystyle C = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y\\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} \)

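Consider \(\displaystyle C = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \), the diagonal form that any symmetric \(\displaystyle C\) takes after a rotation. If both eigenvalues are large the patch is a corner, if only one is large it is an edge, and if both are small the patch is flat.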
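A small sketch of how the matrix \(C\) can be used, assuming the gradients `ix` and `iy` over a patch are already available as NumPy arrays. It follows the standard eigenvalue reading (two large eigenvalues: corner, one large: edge, none: flat); the function names and the single `threshold` parameter are illustrative simplifications.

```python
import numpy as np


def structure_tensor(ix, iy):
    """Build C from the image gradients sampled over one patch."""
    return np.array([
        [np.sum(ix * ix), np.sum(ix * iy)],
        [np.sum(ix * iy), np.sum(iy * iy)],
    ])


def classify_patch(ix, iy, threshold):
    """Label a patch from the eigenvalues of C (returned in ascending order)."""
    lam_small, lam_large = np.linalg.eigvalsh(structure_tensor(ix, iy))
    if lam_small > threshold:
        return "corner"
    if lam_large > threshold:
        return "edge"
    return "flat"


flat = classify_patch(np.zeros(9), np.zeros(9), threshold=1.0)
edge = classify_patch(np.ones(9), np.zeros(9), threshold=1.0)
corner = classify_patch(
    np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]),
    np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0]),
    threshold=1.0,
)
```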

### Theoretical model of an eye

- Pick a point in space and the light rays passing through it.
- Pinhole cameras
- Abstractly, a box with a small hole in it.

## Homography

### Cross-ratio

### Solving for homographies

Given 4 correspondences, you can solve for a homography.
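One common way to do this is the direct linear transform: each correspondence contributes two linear equations in the nine entries of the homography, and the solution is the null vector of the stacked system. The sketch below assumes exact (noise-free) correspondences with no three points collinear; `solve_homography` and `apply_homography` are illustrative names.

```python
import numpy as np


def solve_homography(src, dst):
    """Estimate H (3x3, scaled so H[2, 2] = 1) with dst ~ H @ src."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)  # null vector of the 8x9 system
    return h / h[2, 2]


def apply_homography(h, point):
    """Map a 2D point through H and dehomogenize."""
    u, v, w = h @ np.array([point[0], point[1], 1.0])
    return np.array([u / w, v / w])


# Unit square mapped by a scale of 2 and a shift of (1, 3).
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(1, 3), (3, 3), (3, 5), (1, 5)]
h = solve_homography(src, dst)
center = apply_homography(h, (0.5, 0.5))
```

Dividing by `h[2, 2]` assumes that entry is nonzero, which holds for this example but not for every homography.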

### Point and line duality

Points on the image correspond to lines/rays in 3D space.

The cross product of two such rays gives the normal of the plane that contains both of them.

## Calibration

### Central Projection

\(\displaystyle \begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_s \\ y _s \\ z_s \\ 1 \end{bmatrix} \)

### Properties of matrix P

\(\displaystyle P = K R [I_3 | -C]\)

- \(\displaystyle K\) is the upper-triangular calibration matrix which has 5 degrees of freedom.
- \(\displaystyle R\) is the rotation matrix with 3 degrees of freedom.
- \(\displaystyle C\) is the camera center with 3 degrees of freedom.
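A short sketch of assembling \(P = K R [I_3 | -C]\) and projecting a point with it. The intrinsic values in `k` (focal length 800, principal point (320, 240)) are made-up numbers for illustration, and `projection_matrix` and `project` are hypothetical helper names.

```python
import numpy as np


def projection_matrix(k, r, c):
    """Compose P = K R [I | -C] from intrinsics, rotation, and camera center."""
    return k @ r @ np.hstack([np.eye(3), -c.reshape(3, 1)])


def project(p, point):
    """Project a 3D point with P and dehomogenize to pixel coordinates."""
    u, v, w = p @ np.append(point, 1.0)
    return np.array([u / w, v / w])


k = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
r = np.eye(3)
c = np.array([0.0, 0.0, -2.0])  # camera sits 2 units behind the origin

p = projection_matrix(k, r, c)
origin_px = project(p, np.array([0.0, 0.0, 0.0]))
other_px = project(p, np.array([1.0, 0.0, 2.0]))
center_image = p @ np.append(c, 1.0)  # the camera center maps to the zero vector
```

The last line illustrates that the camera center is the null space of P, which is one way to recover C from an estimated projection matrix.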

### Calibration

- Estimate matrix P using scene points and images.
- Estimate interior parameters and exterior parameters.

### Zhang's Approach

## Stereo

### Parallel Cameras

Consider two cameras, where the right camera is shifted by baseline \(\displaystyle d\) along the x-axis compared to the left camera.

Then for a point \(\displaystyle (x,y,z)\),
\(\displaystyle x_l = \frac{x}{z}\)

\(\displaystyle y_l = \frac{y}{z}\)

\(\displaystyle x_r = \frac{x-d}{z}\)

\(\displaystyle y_r = \frac{y}{z}\).

Thus, the stereo disparity is the ratio of baseline over depth: \(\displaystyle x_l - x_r = \frac{d}{z}\).

With known baseline and correspondence, you can solve for depth \(\displaystyle z\).
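The relation above can be checked numerically. This sketch assumes unit focal length, as the formulas in this section do; the function names are illustrative.

```python
def project_parallel_stereo(point, baseline):
    """Image coordinates of a 3D point in the left and right cameras."""
    x, y, z = point
    left = (x / z, y / z)
    right = ((x - baseline) / z, y / z)
    return left, right


def depth_from_disparity(x_left, x_right, baseline):
    """Invert disparity = baseline / depth."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return baseline / disparity


left, right = project_parallel_stereo((1.0, 2.0, 4.0), baseline=0.5)
depth = depth_from_disparity(left[0], right[0], baseline=0.5)
```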

### Epipolar Geometry

- Warp the two images such that the epipolar lines become horizontal.
- This is called rectification.

The *epipoles* are where one camera sees the other.

### Rectification

- Consider the left camera to be the center of a coordinate system.
- Let \(\displaystyle e_1\) be the axis to the right camera, \(\displaystyle e_2\) to be the up axis, and take \(\displaystyle e_3 = e_1 \times e_2\).

### Random dot stereograms

Shows that recognition is not needed for stereo.

### Similarity Construct

- Do matching by computing the sum of square differences (SSD) of a patch along the epipolar lines.
- The ordering of pixels along an epipolar line may not be the same between left and right images.
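A minimal sketch of SSD matching for rectified images, where the epipolar line of a left-image pixel is the same row in the right image. It assumes non-negative disparities (the match lies at `col - d` in the right image) and a search window that stays inside the image vertically; `match_along_scanline` and its parameters are illustrative names.

```python
import numpy as np


def match_along_scanline(left, right, row, col, half, max_disp):
    """Return the disparity whose right-image patch has the lowest SSD."""
    patch = left[row - half:row + half + 1, col - half:col + half + 1]
    best_d, best_ssd = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:  # candidate patch would leave the image
            break
        candidate = right[row - half:row + half + 1, c - half:c + half + 1]
        ssd = np.sum((patch - candidate) ** 2)
        if ssd < best_ssd:
            best_d, best_ssd = d, ssd
    return best_d


# Synthetic pair: the right image is the left image shifted by 3 pixels.
rng = np.random.default_rng(0)
left_img = rng.random((9, 20))
right_img = np.roll(left_img, -3, axis=1)
disparity = match_along_scanline(left_img, right_img, row=4, col=10, half=2, max_disp=6)
```

This picks each pixel's best match independently, so it does not enforce any ordering of matches along the line.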

### Correspondence + Segmentation

- Assumption: Similar pixels in a segmentation map will probably have the same disparity.

- For each shift, find the connected components.
- For each point p, pick the largest connected component.

### Essential Matrix

The essential matrix satisfies \(\displaystyle \hat{p}'^T E \hat{p} = 0\) where \(\displaystyle \hat{p} = M^{-1}p\) and \(\displaystyle \hat{p}'=M'^{-1}p'\). The fundamental matrix is \(\displaystyle F=M'^{-T} E M^{-1}\).

- Properties

- The matrix is 3x3.
- If \(\displaystyle F\) is the fundamental matrix of (P, P') then \(\displaystyle F^T\) is the fundamental matrix of (P', P).
- The fundamental matrix gives the equation of the epipolar line in the other image.
- \(\displaystyle l'=Fp\) and \(\displaystyle l=F^T p'\)

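- For any p, the epipolar line \(\displaystyle l'=Fp\) contains the epipole \(\displaystyle e'\). This is because every epipolar plane contains the baseline between the two camera centers, and \(\displaystyle e'\) is the image of the first camera center in the second image.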
- \(\displaystyle e'^T F = 0\) and \(\displaystyle Fe=0\)
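These properties can be checked on a simple case. The sketch below assumes two identical cameras related by a pure translation along the x-axis with identity intrinsics, for which the fundamental matrix is the cross-product matrix of the translation; `epipolar_line` and `epipoles` are illustrative helper names.

```python
import numpy as np


def epipolar_line(f, p):
    """Line l' = F p in the second image for a point p = (x, y) in the first."""
    return f @ np.append(p, 1.0)


def epipoles(f):
    """Right and left null vectors of F: F e = 0 and e'^T F = 0."""
    u, _, vt = np.linalg.svd(f)
    return vt[-1], u[:, -1]


# F = [t]_x for t = (1, 0, 0): pure sideways translation.
f = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

line = epipolar_line(f, (0.3, 0.7))  # (0, -1, 0.7), i.e. the row y' = 0.7
e, e_prime = epipoles(f)
```

Here both epipoles are points at infinity along the x-axis, which is why every epipolar line is horizontal, as in the parallel camera setup.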

## Structure from Motion

Optical Flow

### Only Translation

\(\displaystyle u = \frac{-U + xW}{Z} = \frac{W}{Z}(-\frac{U}{W} + x) = \frac{W}{Z}(x - x_0)\)

\(\displaystyle v = \frac{-V + yW}{Z} = \frac{W}{Z}(-\frac{V}{W} + y) = \frac{W}{Z}(y - y_0)\)

where \(\displaystyle (x_0, y_0) = (\frac{U}{W}, \frac{V}{W})\).

The direction of the flow vector is:

\(\displaystyle \frac{v}{u} = \frac{y-y_0}{x-x_0}\)

The flow vectors all emanate from the focus of expansion \(\displaystyle (x_0, y_0)\).

If you walk towards a point in the image, then all pixels will flow away from that point.
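A small numeric sketch of the translational flow field, assuming the form \(u = (-U + xW)/Z\), \(v = (-V + yW)/Z\) implied by the factored expressions above, with `(U, V, W)` the translation and `depth` the value of Z at the pixel. The function names are illustrative.

```python
def translational_flow(x, y, depth, translation):
    """Flow (u, v) at image point (x, y) for a purely translating camera."""
    big_u, big_v, big_w = translation
    u = (-big_u + x * big_w) / depth
    v = (-big_v + y * big_w) / depth
    return u, v


def focus_of_expansion(translation):
    """Image point (x0, y0) = (U / W, V / W) where the flow vanishes."""
    big_u, big_v, big_w = translation
    return big_u / big_w, big_v / big_w


translation = (1.0, 2.0, 4.0)
x0, y0 = focus_of_expansion(translation)
at_foe = translational_flow(x0, y0, depth=2.0, translation=translation)
right_of_foe = translational_flow(x0 + 1.0, y0, depth=2.0, translation=translation)
```

The flow is zero at the focus of expansion and points directly away from it elsewhere; depth changes only the length of each vector, not its direction.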

### Only Rotation

Horizontal flow component for a rotation with angular velocity \(\displaystyle (\alpha, \beta, \gamma)\): \(\displaystyle u = \alpha x y - \beta (1 + x^2) - \gamma y\)

Rotation around y or z axis leads to hyperbolas. The rotational flow is independent of depth.

### Both translation and rotation

The flow field will not resemble any of the above patterns.

### The velocity of p

### Moving plane

For a point p on a plane with unit normal vector n, the set of all points on the plane is \(\displaystyle \{x | (x \cdot n) = d\}\) where \(\displaystyle d=(p \cdot n)\) is the distance to the plane from the origin along the normal vector.

### Scaling ambiguity

Depth can be recovered up to a scale factor.

### Non-Linear Least Squares Approach

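Minimize the function: \(\displaystyle \sum_i [d^2 (p_i', Fp_i) + d^2 (p_i, F^T p_i')] \), where \(\displaystyle d(p, l)\) is the distance from point \(\displaystyle p\) to line \(\displaystyle l\).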

### Locating the epipoles

## 3D Reconstruction

### Triangulation

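If the cameras are intrinsically and extrinsically calibrated, each image point back-projects to a known ray. With noise the two rays generally do not intersect, so P is taken as the midpoint of their common perpendicular.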

### Point reconstruction

Given a point \(\displaystyle X \in \mathbb{R}^3\):

- \(\displaystyle x=MX\) is the point in image 1
- \(\displaystyle x'=M'X\) is the point in image 2

\(\displaystyle M = \begin{bmatrix} m_1^T \\ m_2^T \\ m_3^T \end{bmatrix} \)

\(\displaystyle x \times MX = 0\)

\(\displaystyle x' \times M'X = 0\)

implies

\(\displaystyle AX=0\) where \(\displaystyle A = \begin{bmatrix}
x m_3^T - m_1^T\\
y m_3^T - m_2^T\\
x' m_3'^T - m_1'^T\\
y' m_3'^T - m_2'^T\\
\end{bmatrix}\)
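A minimal sketch of solving \(AX=0\) with SVD, assuming noise-free correspondences and 3x4 projection matrices given as NumPy arrays. The function name `triangulate` and the two example cameras (identity intrinsics, second camera shifted by one unit along x) are illustrative.

```python
import numpy as np


def triangulate(m1, m2, x1, x2):
    """Recover the 3D point from image points x1 = (x, y) and x2 = (x', y')."""
    a = np.vstack([
        x1[0] * m1[2] - m1[0],
        x1[1] * m1[2] - m1[1],
        x2[0] * m2[2] - m2[0],
        x2[1] * m2[2] - m2[1],
    ])
    _, _, vt = np.linalg.svd(a)
    xh = vt[-1]  # homogeneous solution: right singular vector of smallest singular value
    return xh[:3] / xh[3]


m1 = np.hstack([np.eye(3), np.zeros((3, 1))])
m2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

point = np.array([0.5, 0.2, 4.0])
x1 = point[:2] / point[2]                          # projection in image 1
x2 = (point[:2] - np.array([1.0, 0.0])) / point[2]  # projection in image 2
recovered = triangulate(m1, m2, x1, x2)
```

With noisy measurements the same code returns the least-squares solution of the homogeneous system rather than an exact intersection.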

### Reconstruction for intrinsically calibrated cameras

- Compute the essential matrix E using normalized points
- Select \(\displaystyle M=[I|0]\) and \(\displaystyle M'=[R|T]\); then \(\displaystyle E=[T]_\times R\).
- Find T and R using SVD of E.

### Reconstruction ambiguity: projective

\(\displaystyle x_i = MX_i = (MH_P^{-1})(H_P X_i)\)

- Moving the camera will get a different reconstruction even with the same image. The 3D model will be changed by some homography.
- If you know 5 points in 3D, you can rectify the 3D model.

- Projective Reconstruction Theorem

- We can compute a projective reconstruction of a scene from 2 views.
- We don't have to know the calibration or poses.

### Affine Reconstruction

## Aperture Problem

When looking through a small viewport (locally) at a large object, you cannot tell which direction it is moving.

See the barber pole illusion.

### Brightness Constancy Equation

### Brightness Constraint Equation

Let \(\displaystyle E(x,y,t)\) be the irradiance and \(\displaystyle u(x,y),v(x,y)\) the components of optical flow.

Then \(\displaystyle E(x + u \delta t, y + v \delta t, t + \delta t) = E(x,y,t)\).

Assume \(\displaystyle E(x(t), y(t), t) = constant\)
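A worked first-order step that follows from the constancy condition, as a sketch (the subscripted partial derivatives \(\displaystyle E_x, E_y, E_t\) are notation introduced here). Expanding the left side to first order in \(\displaystyle \delta t\) gives

\(\displaystyle E(x + u \delta t, y + v \delta t, t + \delta t) \approx E(x,y,t) + E_x u \delta t + E_y v \delta t + E_t \delta t\)

Setting this equal to \(\displaystyle E(x,y,t)\) and dividing by \(\displaystyle \delta t\) yields

\(\displaystyle E_x u + E_y v + E_t = 0\)

This is a single equation in the two unknowns \(\displaystyle u, v\), so only the flow component along the image gradient is determined locally, which is the aperture problem.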

## Structure from Motion Pipeline

### Calibration

- Step 1: Feature Matching

### Fundamental Matrix and Essential Matrix

- Step 2: Estimate Fundamental Matrix F
- \(\displaystyle x_i'^T F x_i = 0\)
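- Use SVD to solve the homogeneous system \(\displaystyle Ax=0\), where \(\displaystyle x\) here stacks the nine entries of \(\displaystyle F\): \(\displaystyle A=U \Sigma V^T\). The solution is the last column of \(\displaystyle V\), i.e. the right singular vector with the smallest singular value.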
- Essential Matrix: \(\displaystyle E = K^T F K\)
**Fundamental matrix has 7 degrees of freedom, essential matrix has 5 degrees of freedom**
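A sketch of the linear estimate of F from point matches, assuming at least 8 noise-free correspondences given as `(N, 2)` arrays. It omits the coordinate normalization that is usually applied with real pixel data, and it enforces rank 2 by zeroing the smallest singular value. The function name and the synthetic two-camera setup are illustrative.

```python
import numpy as np


def estimate_fundamental(x1, x2):
    """Linear estimate of F with x2_i^T F x1_i = 0, followed by a rank-2 projection."""
    a = np.array([
        [u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1.0]
        for (u1, v1), (u2, v2) in zip(x1, x2)
    ])
    _, _, vt = np.linalg.svd(a)
    f = vt[-1].reshape(3, 3)      # last column of V, reshaped row-major
    u, s, vt_f = np.linalg.svd(f)
    s[2] = 0.0                    # F must be singular
    return u @ np.diag(s) @ vt_f


# Synthetic scene: 20 points seen by two cameras with identity intrinsics.
rng = np.random.default_rng(0)
pts = np.column_stack([
    rng.uniform(-3, 3, 20), rng.uniform(-3, 3, 20), rng.uniform(4, 8, 20)
])
angle = 0.1
rot = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                [0.0, 1.0, 0.0],
                [-np.sin(angle), 0.0, np.cos(angle)]])
trans = np.array([1.0, 0.2, 0.1])
pts2 = pts @ rot.T + trans        # same points in the second camera frame

x1 = pts[:, :2] / pts[:, 2:]
x2 = pts2[:, :2] / pts2[:, 2:]
f_est = estimate_fundamental(x1, x2)
residuals = np.array([
    np.append(b, 1.0) @ f_est @ np.append(a, 1.0) for a, b in zip(x1, x2)
])
```

The residuals of the epipolar constraint are close to zero here because the data is exact; with noisy matches a robust wrapper such as RANSAC is typically placed around this step.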

### Estimating Camera Pose

Estimating Camera Pose from E

Pose P has 6 DoF. Do SVD of the essential matrix to get 4 potential solutions.

You need to do triangulation to select from the 4 solutions.
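A sketch of the SVD step that produces the four candidate poses, assuming the convention \(E = [T]_\times R\) with a unit-length translation. The helper names `skew` and `decompose_essential` are illustrative, and the final selection (triangulating a point and keeping the pose that puts it in front of both cameras) is not shown.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def decompose_essential(e):
    """Return the four (R, T) candidates; T is recovered only up to sign and scale."""
    u, _, vt = np.linalg.svd(e)
    if np.linalg.det(u) < 0:   # flipping signs only rescales E by -1
        u = -u
    if np.linalg.det(vt) < 0:
        vt = -vt
    w = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    r1 = u @ w @ vt
    r2 = u @ w.T @ vt
    t = u[:, 2]
    return [(r1, t), (r1, -t), (r2, t), (r2, -t)]


angle = 0.1
r_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([1.0, 0.0, 0.0])
candidates = decompose_essential(skew(t_true) @ r_true)
```

Exactly one of the four candidates matches the true pose in this example; the other three correspond to a reversed baseline, a rotation twisted by 180 degrees about the baseline, or both.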

## Visual Filters

Have filters which detect humans, cars,...

## Model-based Recognition

You have a model for each object to recognize.

The recognition system identifies objects from the model database.

### Pose Clustering

### Indexing

## Texture

### Synthesis

The goal is to generate additional texture samples from an existing texture sample.

### Filters

- Difference of Gaussians (DoG)
- Gabor Filters

## Lecture Schedule

- 02/23/2021 - Pinhole camera model
- 02/25/2021 - Camera calibration
- 03/09/2021 - Optical flow, motion fields
- 03/11/2021 - Structure from motion: epipolar constraints, essential matrix, triangulation
- 03/25/2021 - Multiple topics (image motion)
- 03/30/2021 - Independent object motion (flow fields)
- 04/01/2021 - Project 3 Discussion
- 04/15/2021 - Shape from shading, reflectance map
- 04/20/2021 - Shape from shading, normal map
- 04/22/2021 - Recognition, classification
- 04/27/2021 - Visual filters, classification
- 04/29/2021 - Midterm Exam clarifications
- 05/04/2021 - Model-based Recognition
- 05/06/2021 - Texture