# Geometric Computer Vision


Notes for CMSC733 Classical and Deep Learning Approaches for Geometric Computer Vision taught by Prof. Yiannis Aloimonos.

## Convolution and Correlation

See Convolutional neural network.

## Edge Detection

Two ways to detect edges:

• Difference operators
• Models

• Angle is given by $$\displaystyle \theta = \operatorname{atan2}\left(\frac{\partial f}{\partial y}, \frac{\partial f}{\partial x}\right)$$
• Edge strength is given by $$\displaystyle \left\Vert (\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}) \right\Vert$$

Sobel operator is another way to approximate derivatives:
$$\displaystyle s_x = \frac{1}{8} \begin{bmatrix} -1 & 0 & 1\\ -2 & 0 & 2\\ -1 & 0 & 1 \end{bmatrix}$$ and $$\displaystyle s_y = \frac{1}{8} \begin{bmatrix} 1 & 2 & 1\\ 0 & 0 & 0\\ -1 & -2 & -1 \end{bmatrix}$$
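
As a minimal sketch of the angle and edge-strength formulas above, the following NumPy code applies the two Sobel kernels and combines the responses. The helper names `correlate2d_valid` and `sobel_edges` are illustrative assumptions, and the kernels are applied by cross-correlation without padding, so the output is two pixels smaller in each dimension.

```python
import numpy as np

# Sobel kernels as written above, including the 1/8 normalization.
SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float) / 8.0
SY = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float) / 8.0


def correlate2d_valid(image, kernel):
    """Cross-correlate image with kernel over fully covered pixels only."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    # windows has shape (H - kh + 1, W - kw + 1, kh, kw).
    return np.einsum("ijkl,kl->ij", windows, kernel)


def sobel_edges(image):
    """Return (edge strength, edge angle in radians) for a 2D image."""
    image = np.asarray(image, dtype=float)
    gx = correlate2d_valid(image, SX)  # approximates df/dx
    gy = correlate2d_valid(image, SY)  # approximates df/dy (positive toward the top row)
    strength = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)
    return strength, angle
```

For a vertical step edge, `gy` is zero and the angle is 0, i.e. the gradient points along the x-axis.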

You can smooth a function by convolving with a Gaussian kernel.

Laplacian of Gaussian
• Edges are zero crossings of the Laplacian of Gaussian convolved with the signal.

Effect of $$\displaystyle \sigma$$ Gaussian kernel size:

• Large sigma detects large scale edges.
• Small sigma detects fine features.
Scale Space
• With larger sigma, the first-derivative peaks (i.e. zero crossings of the second derivative) can move.
• Close-by peaks can also merge as the scale increases.
• An edge will never split.

### Subtraction

• Create a smoothed image by convolving with a Gaussian
• Subtract the smoothed image from the original image.

### Finding lines in an image

Option 1: Search for line everywhere.
Option 2: Use Hough transform voting.

### Hough Transform

Duality between lines in image space and points in Hough space.
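A line in image space is parameterized as $$\displaystyle d = x \cos \theta + y \sin \theta$$, where $$\displaystyle d$$ is its distance from the origin and $$\displaystyle \theta$$ is the angle of its normal.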

    for all pixels (x, y) on an edge:
        for all (d, theta):
            if d = x*cos(theta) + y*sin(theta):
                H(d, theta) += 1
    d, theta = argmax(H)
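
The voting loop above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: `d` is rounded to integer bins, `theta` is sampled at `n_theta` evenly spaced angles in [0, π), and the function name `hough_lines` is illustrative. Instead of testing every `(d, theta)` pair, each pixel computes the `d` it votes for at every sampled `theta`.

```python
import numpy as np


def hough_lines(edge_mask, n_theta=180):
    """Vote in (d, theta) space and return the strongest line and the accumulator."""
    ys, xs = np.nonzero(edge_mask)
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)
    # |d| can never exceed the image diagonal.
    d_max = int(np.ceil(np.hypot(*edge_mask.shape)))
    H = np.zeros((2 * d_max + 1, n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in zip(xs, ys):
        # One vote per theta: the d that satisfies d = x cos(theta) + y sin(theta).
        d = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        H[d + d_max, cols] += 1
    d_idx, t_idx = np.unravel_index(np.argmax(H), H.shape)
    return d_idx - d_max, thetas[t_idx], H
```

For a mask containing the horizontal line y = 5, the peak should sit at d = 5 with theta near π/2. Neighboring angle bins can tie with the true one, which is one reason the sampling resolution matters.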

• Hough transform handles noise better than least squares.
• Each edge pixel votes for a curve in Hough space (a sinusoid under the $$\displaystyle (d, \theta)$$ parameterization). A line in image space corresponds to the point where these curves intersect.
Extensions
• Give more votes for stronger edges
• Change sampling to give more/less resolution
• Same procedure with circles, squares, or other shapes.
Hough transform for curves

Works with any curve that can be written in a parametric form.

### Finding corners

$$\displaystyle C = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y\\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}$$

Consider $$\displaystyle C = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$$
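
As a rough illustration of how the eigenvalues of $$\displaystyle C$$ are used, the sketch below builds $$\displaystyle C$$ from the gradients of a small patch and returns its eigenvalues. The interpretation is the usual one and is stated here as an assumption rather than something from the notes: both eigenvalues near zero indicates a flat region, one large eigenvalue indicates an edge, and two large eigenvalues indicate a corner. The function names are hypothetical.

```python
import numpy as np


def structure_matrix(patch):
    """Build C from the image gradients summed over the patch."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    Iy, Ix = np.gradient(np.asarray(patch, dtype=float))
    return np.array([
        [np.sum(Ix * Ix), np.sum(Ix * Iy)],
        [np.sum(Ix * Iy), np.sum(Iy * Iy)],
    ])


def corner_eigenvalues(patch):
    """Return the eigenvalues (lambda_1 <= lambda_2) of the symmetric matrix C."""
    return np.linalg.eigvalsh(structure_matrix(patch))
```

Thresholding the smaller eigenvalue is one simple way to turn this into a corner test.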

### Theoretical model of an eye

• Pick a point in space and the light rays passing through it.
• Pinhole cameras
• Abstractly, a box with a small hole in it.

## Homography

### Solving for homographies

Given 4 correspondences, you can solve for a homography.
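
One common way to do this is the direct linear transform (DLT): each correspondence contributes two linear equations in the nine entries of the homography, and the solution is the null vector of the stacked system. The sketch below assumes non-degenerate correspondences (no three points collinear) and a homography whose bottom-right entry is nonzero. The names `homography_dlt` and `apply_homography` are illustrative.

```python
import numpy as np


def homography_dlt(src, dst):
    """Estimate H such that dst ~ H @ src from four or more point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two equations per correspondence, from u = (h1 . p) / (h3 . p) and v = (h2 . p) / (h3 . p).
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    return H / H[2, 2]        # fix the free scale (assumes H[2, 2] != 0)


def apply_homography(H, pts):
    """Map an (N, 2) array of points through H with homogeneous normalization."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With noisy correspondences, more than four pairs and coordinate normalization are usually preferable. Both are omitted here for brevity.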

### Point and line duality

Points on the image correspond to lines/rays in 3D space.
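The cross product of two such rays gives the normal of the plane they span, which passes through the camera center and corresponds to a line in the image.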

## Calibration

### Central Projection

$$\displaystyle \begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_s \\ y_s \\ z_s \\ 1 \end{bmatrix}$$

### Properties of matrix P

$$\displaystyle P = K R [I_3 | -C]$$

• $$\displaystyle K$$ is the upper-triangular calibration matrix which has 5 degrees of freedom.
• $$\displaystyle R$$ is the rotation matrix with 3 degrees of freedom.
• $$\displaystyle C$$ is the camera center with 3 degrees of freedom.
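
A small sketch of how the three factors combine, assuming NumPy and illustrative function names. It builds $$\displaystyle P$$ from $$\displaystyle K$$, $$\displaystyle R$$, and $$\displaystyle C$$ and projects a 3D point. The values of `K` in the usage note are made up for the example.

```python
import numpy as np


def projection_matrix(K, R, C):
    """Compose P = K R [I_3 | -C] as a 3x4 matrix."""
    return K @ R @ np.hstack([np.eye(3), -np.reshape(C, (3, 1))])


def project(P, X):
    """Project a 3D point X to inhomogeneous image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]
```

For example, with focal length 800, principal point (320, 240), identity rotation, and camera center (0, 0, -5), the point (1, 2, 5) lies 10 units in front of the camera and projects to (400, 400). The camera center itself maps to the zero vector, since $$\displaystyle P$$ has it as its null space.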

### Calibration

1. Estimate matrix P using scene points and images.
2. Estimate interior parameters and exterior parameters.

## Stereo

### Parallel Cameras

Consider two cameras, where the right camera is shifted by baseline $$\displaystyle d$$ along the x-axis compared to the left camera.
Then for a point $$\displaystyle (x,y,z)$$ and unit focal length, the image coordinates are $$\displaystyle x_l = \frac{x}{z}$$, $$\displaystyle y_l = \frac{y}{z}$$, $$\displaystyle x_r = \frac{x-d}{z}$$, and $$\displaystyle y_r = \frac{y}{z}$$.
Thus, the stereo disparity is the ratio of baseline over depth: $$\displaystyle x_l - x_r = \frac{d}{z}$$.
With known baseline and correspondence, you can solve for depth $$\displaystyle z$$.

### Epipolar Geometry

Warping the two images so that the epipolar lines become horizontal is called rectification.

The epipoles are the points where each camera sees the other, i.e. each epipole is the image of the other camera's center.

### Rectification

1. Consider the left camera to be the center of a coordinate system.
2. Let $$\displaystyle e_1$$ be the axis pointing to the right camera, $$\displaystyle e_2$$ be the up axis, and take $$\displaystyle e_3 = e_1 \times e_2$$.

### Random dot stereograms

Shows that recognition is not needed for stereo.

### Similarity Construct

• Do matching by computing the sum of square differences (SSD) of a patch along the epipolar lines.
• The ordering of pixels along an epipolar line may not be the same between left and right images.
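
A minimal sketch of SSD matching for a rectified pair, where the epipolar line of a left pixel is the same row in the right image. The function name, the window half-size, and the disparity search range are assumptions for illustration. The search runs toward smaller column indices because a point appears further left in the right image.

```python
import numpy as np


def ssd_disparity(left, right, row, col, half=2, max_disp=16):
    """Return the disparity in [0, max_disp] that minimizes the patch SSD."""
    patch = left[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d  # candidate column in the right image
        if c - half < 0:
            break    # window would leave the image
        cand = right[row - half:row + half + 1, c - half:c + half + 1].astype(float)
        cost = np.sum((patch - cand) ** 2)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

With known baseline, the depth then follows from the disparity as described under Parallel Cameras. This sketch ignores occlusions and textureless regions, where the SSD minimum is ambiguous.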

### Correspondence + Segmentation

• Assumption: Similar pixels in a segmentation map will probably have the same disparity.
1. For each shift, find the connected components.
2. For each point p, pick the largest connected component.

### Essential Matrix

The essential matrix satisfies $$\displaystyle \hat{p}'^T E \hat{p} = 0$$ where $$\displaystyle \hat{p} = M^{-1}p$$ and $$\displaystyle \hat{p}'=M'^{-1}p'$$. The fundamental matrix is $$\displaystyle F=M'^{-T} E M^{-1}$$.

Properties
• The matrix is 3x3.
• If $$\displaystyle F$$ is the essential matrix of (P, P') then $$\displaystyle F^T$$ is the essential matrix of (P', P).
• The essential matrix can give you the equation of the epipolar line in the second image.
• $$\displaystyle l'=Fp$$ and $$\displaystyle l=F^T p'$$
• For any p, the epipolar line $$\displaystyle l'=Fp$$ contains the epipole $$\displaystyle e'$$. This is because every epipolar line passes through the image of the other camera's center.
• $$\displaystyle e'^T F = 0$$ and $$\displaystyle Fe=0$$
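
These properties can be checked numerically on a synthetic example. The sketch below works in normalized coordinates, so it uses the essential matrix directly, and assumes the second camera maps a point $$\displaystyle X$$ to $$\displaystyle RX + T$$ so that $$\displaystyle E = [T]_\times R$$. The helper names are illustrative.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([
        [0.0, -t[2], t[1]],
        [t[2], 0.0, -t[0]],
        [-t[1], t[0], 0.0],
    ])


def essential_from_pose(R, T):
    """Essential matrix for cameras [I | 0] and [R | T]."""
    return skew(T) @ R


def epipolar_line(E, p):
    """Line l' = E p in the second image, as coefficients (a, b, c) of ax + by + c = 0."""
    return E @ p
```

Under this convention the epipole in the second image is $$\displaystyle T$$ (the image of the first camera center) and the epipole in the first image is $$\displaystyle R^T T$$, both up to scale.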

## Structure from Motion

Optical Flow

### Only Translation

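$$\displaystyle u = \frac{-U + xW}{Z} = \frac{W}{Z}(-\frac{U}{W} + x) = \frac{W}{Z}(x - x_0)$$
$$\displaystyle v = \frac{-V + yW}{Z} = \frac{W}{Z}(-\frac{V}{W} + y) = \frac{W}{Z}(y - y_0)$$
where $$\displaystyle (U, V, W)$$ is the translational velocity, $$\displaystyle Z$$ is the depth, and $$\displaystyle (x_0, y_0) = (U/W, V/W)$$.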

The direction of the flow vector at $$\displaystyle (x, y)$$ is:
$$\displaystyle \frac{v}{u} = \frac{y-y_0}{x-x_0}$$
The flow vectors all emanate from the focus of expansion $$\displaystyle (x_0, y_0)$$.
If you walk towards a point in the image, then all pixels will flow away from that point.
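
A small numerical sketch of the translational flow field, taking $$\displaystyle (x_0, y_0) = (U/W, V/W)$$ as in the factored form of the equations above. The function names are illustrative, and $$\displaystyle W \neq 0$$ is assumed.

```python
import numpy as np


def translational_flow(x, y, Z, U, V, W):
    """Image flow (u, v) at (x, y) for translation (U, V, W) and depth Z."""
    u = (-U + x * W) / Z
    v = (-V + y * W) / Z
    return u, v


def focus_of_expansion(U, V, W):
    """Point (x0, y0) where the translational flow vanishes (requires W != 0)."""
    return U / W, V / W
```

The flow is zero at the focus of expansion, and at any other pixel it is parallel to the vector from the focus of expansion to that pixel. Depth only changes the length of the flow vector, not its direction.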

### Only Rotation

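Rotation with angular velocity $$\displaystyle (\alpha, \beta, \gamma)$$ about the x, y, and z axes gives the horizontal flow component: $$\displaystyle u = \alpha x y - \beta (1 + x^2) - \gamma y$$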

Rotation around the y or z axis leads to hyperbolas. The rotational flow is independent of depth.

### Both translation and rotation

The flow field will not resemble any of the above patterns.

### Moving plane

For a point p on a plane with unit normal vector n, the set of all points on the plane is $$\displaystyle \{x | (x \cdot n) = d\}$$ where $$\displaystyle d=(p \cdot n)$$ is the signed distance from the origin to the plane along the normal vector.

### Scaling ambiguity

Depth can be recovered up to a scale factor.

### Non-Linear Least Squares Approach

Minimize the sum of squared distances between each point and the epipolar line of its match: $$\displaystyle \sum [d^2 (p', Fp) + d^2 (p, F^T p')]$$

## 3D Reconstruction

### Triangulation

If the cameras are intrinsically and extrinsically calibrated, then the reconstructed point P is the midpoint of the common perpendicular of the two back-projected rays.

### Point reconstruction

Given a point X in $$\displaystyle \mathbb{R}^3$$

• $$\displaystyle x=MX$$ is the point in image 1
• $$\displaystyle x'=M'X$$ is the point in image 2

$$\displaystyle M = \begin{bmatrix} m_1^T \\ m_2^T \\ m_3^T \end{bmatrix}$$

$$\displaystyle x \times MX = 0$$
$$\displaystyle x' \times M'X = 0$$
implies
$$\displaystyle AX=0$$ where $$\displaystyle A = \begin{bmatrix} x m_3^T - m_1^T\\ y m_3^T - m_2^T\\ x' m_3'^T - m_1'^T\\ y' m_3'^T - m_2'^T\\ \end{bmatrix}$$
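
The linear system above can be solved with an SVD. The sketch below is a minimal version that assumes noise-free or mildly noisy inhomogeneous image points and 3x4 camera matrices given as NumPy arrays. The function name `triangulate` is illustrative.

```python
import numpy as np


def triangulate(M1, M2, x1, x2):
    """Solve A X = 0 for the 3D point seen at x1 = (x, y) and x2 = (x', y')."""
    A = np.array([
        x1[0] * M1[2] - M1[0],  # x  m_3^T  - m_1^T
        x1[1] * M1[2] - M1[1],  # y  m_3^T  - m_2^T
        x2[0] * M2[2] - M2[0],  # x' m_3'^T - m_1'^T
        x2[1] * M2[2] - M2[1],  # y' m_3'^T - m_2'^T
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]           # homogeneous solution with the smallest singular value
    return X[:3] / X[3]  # assumes the point is not at infinity
```

With noisy measurements this minimizes an algebraic error rather than the reprojection error, so it is often used as an initial estimate.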

### Reconstruction for intrinsically calibrated cameras

1. Compute the essential matrix E using normalized points
2. Select $$\displaystyle M=[I|0]$$ and $$\displaystyle M'=[R|T]$$; then $$\displaystyle E=[T]_\times R$$.
3. Find T and R using SVD of E.

### Reconstruction ambiguity: projective

$$\displaystyle x_i = MX_i = (MH_P^{-1})(H_P X_i)$$

• Moving the camera will get a different reconstruction even with the same image. The 3D model will be changed by some homography.
• If you know 5 points in 3D, you can rectify the 3D model.
Projective Reconstruction Theorem
• We can compute a projective reconstruction of a scene from 2 views.
• We don't have to know the calibration or poses.

## Aperture Problem

When looking through a small viewport (locally) at large objects, you cannot tell which direction it is moving.
See the barber pole illusion.

### Brightness Constraint Equation

Let $$\displaystyle E(x,y,t)$$ be the irradiance and $$\displaystyle u(x,y),v(x,y)$$ the components of optical flow.
Then $$\displaystyle E(x + u \delta t, y + v \delta t, t + \delta t) = E(x,y,t)$$.

Assume $$\displaystyle E(x(t), y(t), t) = constant$$
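
Expanding the constancy assumption to first order gives $$\displaystyle E_x u + E_y v + E_t = 0$$, which is one equation in two unknowns. Only the flow component along the brightness gradient (the normal flow) can be recovered locally, which is the aperture problem. The sketch below estimates that component from two frames. The function name and the finite-difference derivatives are assumptions made for illustration.

```python
import numpy as np


def normal_flow(frame0, frame1, eps=1e-12):
    """Return the normal flow (n_x, n_y) = -E_t * grad(E) / |grad(E)|^2 per pixel."""
    frame0 = np.asarray(frame0, dtype=float)
    frame1 = np.asarray(frame1, dtype=float)
    Ey, Ex = np.gradient(frame0)  # spatial derivatives (rows = y, columns = x)
    Et = frame1 - frame0          # temporal derivative for a unit time step
    mag2 = Ex ** 2 + Ey ** 2
    # Where the gradient vanishes, no flow component is recoverable; return 0 there.
    scale = np.where(mag2 > eps, -Et / np.maximum(mag2, eps), 0.0)
    return scale * Ex, scale * Ey
```

For a brightness ramp that varies only in x, any vertical motion leaves the image unchanged, so only the horizontal component of the motion shows up in the result.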

## Structure from Motion Pipeline

### Calibration

1. Step 1: Feature Matching

### Fundamental Matrix and Essential Matrix

1. Step 2: Estimate Fundamental Matrix F
• $$\displaystyle x_i'^T F x_i = 0$$
• Use SVD to solve for x from $$\displaystyle Ax=0$$: $$\displaystyle A=U \Sigma V^T$$. The solution is the last column of $$\displaystyle V$$, i.e. the right singular vector with the smallest singular value.
• Essential Matrix: $$\displaystyle E = K^T F K$$
• Fundamental matrix has 7 degrees of freedom, essential matrix has 5 degrees of freedom
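
The estimation step can be sketched as a linear (eight-point style) solve: each correspondence gives one row of $$\displaystyle A$$ from $$\displaystyle x_i'^T F x_i = 0$$, and the rank-2 property of $$\displaystyle F$$ is enforced afterwards by zeroing its smallest singular value. This is a minimal sketch that assumes at least 8 correspondences in general position and omits coordinate normalization and outlier rejection. The function name is illustrative.

```python
import numpy as np


def estimate_fundamental(pts1, pts2):
    """Estimate F from (N, 2) arrays of matched points, N >= 8."""
    x1, y1 = pts1[:, 0], pts1[:, 1]
    x2, y2 = pts2[:, 0], pts2[:, 1]
    # Row i encodes x2_i^T F x1_i = 0 for F flattened in row-major order.
    A = np.column_stack([
        x2 * x1, x2 * y1, x2,
        y2 * x1, y2 * y1, y2,
        x1, y1, np.ones(len(pts1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt_f = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt_f
```

The result is defined only up to scale. With pixel coordinates, normalizing the points before building $$\displaystyle A$$ generally improves conditioning.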

### Estimating Camera Pose

Estimating Camera Pose from E
Pose P has 6 DoF. Do SVD of the essential matrix to get 4 potential solutions.
You need to do triangulation to select from the 4 solutions.
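
The four candidate poses can be enumerated from the SVD of $$\displaystyle E$$ as sketched below. The convention assumed here is that the second camera is $$\displaystyle [R | T]$$ with $$\displaystyle E = [T]_\times R$$, and the translation is recovered only as a unit vector because of the scale ambiguity. The function names are illustrative.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([
        [0.0, -t[2], t[1]],
        [t[2], 0.0, -t[0]],
        [-t[1], t[0], 0.0],
    ])


def decompose_essential(E):
    """Return the four candidate (R, t) pairs for an essential matrix E."""
    U, _, Vt = np.linalg.svd(E)
    # Flip signs if needed so that both factors are proper rotations (det = +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = U[:, 2]  # translation direction, up to sign
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

To pick the physically valid candidate, triangulate a few points with each pose and keep the one that places them in front of both cameras.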

## Visual Filters

Have filters which detect humans, cars,...

## Model-based Recognition

You have a model for each object to recognize.
The recognition system identifies objects from the model database.

## Texture

### Synthesis

The goal is to generate additional texture samples from an existing texture sample.

### Filters

• Gabor Filters

## Lecture Schedule

• 02/23/2021 - Pinhole camera model
• 02/25/2021 - Camera calibration
• 03/09/2021 - Optical flow, motion fields
• 03/11/2021 - Structure from motion: epipolar constraints, essential matrix, triangulation
• 03/25/2021 - Multiple topics (image motion)
• 03/30/2021 - Independent object motion (flow fields)
• 04/01/2021 - Project 3 Discussion
• 04/15/2021 - Shape from shading, reflectance map
• 04/20/2021 - Shape from shading, normal map
• 04/22/2021 - Recognition, classification
• 04/27/2021 - Visual filters, classification
• 04/29/2021 - Midterm Exam clarifications
• 05/04/2021 - Model-based Recognition
• 05/06/2021 - Texture