Geometric Computer Vision

Notes for CMSC733 Classical and Deep Learning Approaches for Geometric Computer Vision taught by Prof. Yiannis Aloimonos.

Course webpage

Convolution and Correlation

See Convolutional neural network.
Traditionally, fixed filters are used instead of learned filters.
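
As a small illustration of the difference between the two operations, the sketch below applies a fixed, hand-designed difference filter to a toy step edge. The helper names `correlate2d` and `convolve2d` are hypothetical, and "valid" output (no padding) is an assumption made to keep the example short.

```python
def correlate2d(image, kernel):
    """Valid-mode cross-correlation: slide the kernel without flipping it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def convolve2d(image, kernel):
    """Convolution is correlation with the kernel flipped along both axes."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return correlate2d(image, flipped)


# A fixed horizontal difference filter applied to a vertical step edge.
image = [[0, 0, 1, 1]] * 3
kernel = [[-1, 0, 1]]
corr = correlate2d(image, kernel)
conv = convolve2d(image, kernel)
```

For a symmetric kernel the two results coincide; for this antisymmetric kernel they differ only in sign.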

Edge Detection

Two ways to detect edges:

Difference operators
Models

Image Gradients

Angle is given by \(\displaystyle \theta = \operatorname{atan2}\left(\frac{\partial f}{\partial y}, \frac{\partial f}{\partial x}\right)\)
Edge strength is given by \(\displaystyle \left\Vert \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right) \right\Vert\)

The Sobel operator is another way to approximate the derivatives:
\(\displaystyle s_x = \frac{1}{8} \begin{bmatrix} -1 & 0 & 1\\ -2 & 0 & 2\\ -1 & 0 & 1 \end{bmatrix} \) and \(\displaystyle s_y = \frac{1}{8} \begin{bmatrix} 1 & 2 & 1\\ 0 & 0 & 0\\ -1 & -2 & -1 \end{bmatrix} \)
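
The following sketch evaluates the two Sobel masks at a single interior pixel and derives the edge strength and angle from the result. The function name `sobel_at` and the toy image are assumptions for illustration; the masks are applied as written (no flipping), and border handling is omitted.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def sobel_at(image, r, c):
    """Return (gx, gy) at interior pixel (r, c) using the 1/8-scaled masks."""
    gx = gy = 0.0
    for a in range(3):
        for b in range(3):
            value = image[r - 1 + a][c - 1 + b]
            gx += SOBEL_X[a][b] * value
            gy += SOBEL_Y[a][b] * value
    return gx / 8.0, gy / 8.0


# Vertical step edge: intensity jumps from 0 to 10 between columns 1 and 2.
image = [[0, 0, 10, 10]] * 4
gx, gy = sobel_at(image, 1, 1)
strength = math.hypot(gx, gy)   # norm of the gradient vector
theta = math.atan2(gy, gx)      # gradient direction in radians
```

Here the gradient points along the x-axis, perpendicular to the vertical edge.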

You can smooth a function by convolving with a Gaussian kernel.

Laplacian of Gaussian

Edges are zero crossings of the Laplacian of Gaussian convolved with the signal.

Effect of the Gaussian kernel size \(\displaystyle \sigma\):

Large sigma detects large scale edges.
Small sigma detects fine features.
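
A one-dimensional sketch of the zero-crossing idea: filter a step signal with the second derivative of a Gaussian (the 1D analogue of the Laplacian of Gaussian) and look for sign changes. The helper names, the truncation radius of \(4\sigma\), and the edge-replication padding are all assumptions made for this example.

```python
import math


def log_kernel_1d(sigma):
    """Sampled second derivative of a 1D Gaussian (unnormalized)."""
    radius = int(math.ceil(4 * sigma))
    return [
        (x * x / sigma**4 - 1 / sigma**2) * math.exp(-x * x / (2 * sigma**2))
        for x in range(-radius, radius + 1)
    ]


def filter_same(signal, kernel):
    """Filter with edge replication so the output keeps the input length."""
    r = len(kernel) // 2
    n = len(signal)
    return [
        sum(kernel[k] * signal[min(max(i + k - r, 0), n - 1)]
            for k in range(len(kernel)))
        for i in range(n)
    ]


def zero_crossings(values, eps=1e-9):
    """Indices i where the response changes sign between i and i + 1."""
    return [
        i for i in range(len(values) - 1)
        if (values[i] > eps and values[i + 1] < -eps)
        or (values[i] < -eps and values[i + 1] > eps)
    ]


signal = [0.0] * 20 + [1.0] * 20   # step edge between indices 19 and 20
response = filter_same(signal, log_kernel_1d(sigma=2.0))
edges = zero_crossings(response)
```

The only sign change in the response falls between indices 19 and 20, which is where the step sits.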

Scale Space

With larger sigma, the first-derivative peaks (i.e. zero crossings of the second derivative) can move.
Close-by peaks can also merge as the scale increases.
An edge never splits as the scale increases.

Subtraction

Create a smoothed image by convolving with a Gaussian.
Subtract the smoothed image from the original image; what remains is the fine detail, which is concentrated around edges.
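
The two steps can be sketched on a 1D signal as follows. The helper names, the \(3\sigma\) truncation, and the edge-replication padding are assumptions chosen for brevity.

```python
import math


def gaussian_kernel(sigma):
    """Sampled Gaussian, truncated at 3 sigma and normalized to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-x * x / (2 * sigma**2))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, kernel):
    """Filter with edge replication so the output keeps the input length."""
    r = len(kernel) // 2
    n = len(signal)
    return [
        sum(kernel[k] * signal[min(max(i + k - r, 0), n - 1)]
            for k in range(len(kernel)))
        for i in range(n)
    ]


signal = [0.0] * 10 + [1.0] * 10          # step edge between indices 9 and 10
blurred = smooth(signal, gaussian_kernel(1.0))
detail = [s - b for s, b in zip(signal, blurred)]
```

The difference is close to zero in the flat regions and changes sign across the step, so the edge stands out.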

Finding lines in an image

Option 1: Search for lines everywhere.
Option 2: Use Hough transform voting.

Hough Transform

Duality between lines in image space and points in Hough space.
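A line is written in normal form as \(\displaystyle d = x \cos \theta + y \sin \theta\), where \(\displaystyle d\) is the signed distance from the origin to the line and \(\displaystyle \theta\) is the angle of its normal.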

for all pixels (x, y) on an edge:
  for all sampled theta:
    d = round(x*cos(theta) + y*sin(theta))   # quantize d to a bin
    H(d, theta) += 1
d, theta = argmax(H)

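The Hough transform handles noise and outliers better than least squares, since stray pixels vote elsewhere instead of skewing the fit.
Each edge pixel votes for a curve in Hough space (a sinusoid in the \(\displaystyle (d, \theta)\) parameterization, a line in the slope-intercept parameterization). A line in image space corresponds to the point where these curves intersect.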
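
The voting loop above can be sketched in Python as follows. This is a minimal version under stated assumptions: the function name `hough_lines` is hypothetical, \(\theta\) is sampled in 1-degree steps over \([0, \pi)\), and \(d\) is rounded to the nearest integer bin.

```python
import math
from collections import Counter


def hough_lines(edge_pixels, n_theta=180):
    """Return (d, theta, votes) for the strongest line among the edge pixels."""
    votes = Counter()
    for x, y in edge_pixels:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            # Each pixel votes once per sampled angle, tracing a sinusoid in H.
            d = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(d, t)] += 1
    (d, t), count = votes.most_common(1)[0]
    return d, math.pi * t / n_theta, count


# Twenty pixels on the vertical line x = 5, plus two outliers.
pixels = [(5, y) for y in range(0, 100, 5)] + [(0, 3), (9, 7)]
d, theta, count = hough_lines(pixels)
```

The two outliers cast their votes in other bins, so they do not disturb the peak at \(d = 5, \theta = 0\).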

Extensions

Use the image gradient direction to restrict which \(\displaystyle \theta\) each pixel votes for.
Give more votes to stronger edges.
Change the sampling to give more or less resolution.
Apply the same procedure to circles, squares, or other shapes.

Hough transform for curves

Works with any curve that can be written in a parametric form.

Finding corners

\(\displaystyle C = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y\\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} \)

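Consider the diagonal case \(\displaystyle C = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \), which can always be reached by rotating the coordinate axes because \(\displaystyle C\) is symmetric.
If both eigenvalues are large, the window contains a corner; if only one is large, it contains an edge; if both are small, the region is flat.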
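
A minimal sketch of the eigenvalue test commonly applied to \(C\): sum the gradient products over a window, compute the two eigenvalues in closed form, and compare them against a threshold. The function names and the threshold value are hypothetical choices for illustration, not values from the course.

```python
import math


def structure_tensor(gradients):
    """Sum the products of (Ix, Iy) over a window to form the entries of C."""
    sxx = sum(ix * ix for ix, _ in gradients)
    sxy = sum(ix * iy for ix, iy in gradients)
    syy = sum(iy * iy for _, iy in gradients)
    return sxx, sxy, syy


def eigenvalues(sxx, sxy, syy):
    """Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]."""
    mean = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    return mean + spread, mean - spread   # lambda_1 >= lambda_2


def classify(gradients, threshold=1.0):
    """Label a window by how many eigenvalues exceed the (assumed) threshold."""
    lam1, lam2 = eigenvalues(*structure_tensor(gradients))
    if lam2 > threshold:
        return "corner"
    if lam1 > threshold:
        return "edge"
    return "flat"


flat = classify([(0, 0)] * 9)                    # no gradient anywhere
edge = classify([(1, 0)] * 9)                    # gradient in one direction only
corner = classify([(1, 0)] * 4 + [(0, 1)] * 5)   # gradients in two directions
```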

Theoretical model of an eye

Pick a point in space and the light rays passing through it.
Pinhole cameras
- Abstractly, a box with a small hole in it.

Homography

Cross-ratio

See Wikipedia: Cross-ratio.

Solving for homographies

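A homography has 8 degrees of freedom (a \(\displaystyle 3 \times 3\) matrix defined up to scale) and each point correspondence gives 2 equations, so 4 correspondences with no three points collinear are enough to solve for it.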
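
One way to set up the solve is sketched below. It assumes \(h_{33} = 1\) (valid whenever the origin does not map to a point at infinity), which turns the problem into an \(8 \times 8\) linear system; the function names are hypothetical and the small Gauss-Jordan solver stands in for a library routine.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_4(src, dst):
    """Estimate H (with h33 fixed to 1) from four point correspondences."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), same for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_h(h, point):
    """Map a 2D point through H and dehomogenize."""
    x, y = point
    u, v, w = (row[0] * x + row[1] * y + row[2] for row in h)
    return u / w, v / w


src = [(0, 0), (1, 0), (1, 1), (0, 1)]   # unit square
dst = [(0, 0), (2, 0), (2, 1), (0, 2)]   # arbitrary quadrilateral
H = homography_from_4(src, dst)
mapped = [apply_h(H, p) for p in src]
```

With more than 4 correspondences or noisy data, a least-squares formulation would normally be used instead of an exact solve.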

Point and line duality

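Points in the image correspond to lines (rays) through the camera center in 3D space.
The cross product of two such rays is the normal of the plane containing them; this plane through the camera center corresponds to a line in the image.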
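
In homogeneous coordinates the same cross product serves both directions of the duality: the line through two points and the point where two lines meet. A short sketch, with hypothetical helper names and toy coordinates:

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product; zero means the point lies on the line."""
    return sum(x * y for x, y in zip(a, b))


# Line through the image points (0, 0) and (1, 1), i.e. y = x.
p, q = (0, 0, 1), (1, 1, 1)
line = cross(p, q)

# Line through (0, 2) and (2, 0), i.e. x + y = 2.
other = cross((0, 2, 1), (2, 0, 1))

# Dually, the intersection of two lines is their cross product.
meet = cross(line, other)
meet_xy = (meet[0] / meet[2], meet[1] / meet[2])
```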

Calibration

Central Projection

\(\displaystyle \begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_s \\ y_s \\ z_s \\ 1 \end{bmatrix} \)

Properties of matrix P

\(\displaystyle P = K R [I_3 | -C]\)

\(\displaystyle K\) is the upper-triangular calibration matrix which has 5 degrees of freedom.
\(\displaystyle R\) is the rotation matrix with 3 degrees of freedom.
\(\displaystyle C\) is the camera center with 3 degrees of freedom.
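
To make the factorization concrete, the sketch below assembles \(P\) from example values of \(K\), \(R\), and \(C\) and projects a 3D point. All numeric values and function names are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def projection_matrix(k, r, c):
    """Build P = K R [I_3 | -C] for a camera centered at c in world coordinates."""
    i_minus_c = [[1, 0, 0, -c[0]], [0, 1, 0, -c[1]], [0, 0, 1, -c[2]]]
    return matmul(matmul(k, r), i_minus_c)


def project(p, point):
    """Project a 3D point with P and dehomogenize to pixel coordinates."""
    xh = [[point[0]], [point[1]], [point[2]], [1]]
    u, v, w = (row[0] for row in matmul(p, xh))
    return u / w, v / w


K = [[500, 0, 320], [0, 500, 240], [0, 0, 1]]   # focal length 500, principal point (320, 240)
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]           # camera axes aligned with the world
C = (0, 0, -2)                                  # camera center two units behind the origin
P = projection_matrix(K, R, C)
pixel = project(P, (1, 0.5, 2))
center_image = matmul(P, [[C[0]], [C[1]], [C[2]], [1]])
```

The camera center is the one point that \(P\) maps to the zero vector, which is why `center_image` comes out as all zeros.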

Calibration

Estimate matrix P using scene points and images.
Estimate interior parameters and exterior parameters.

Zhang's Approach

Stereo

Parallel Cameras

Consider two cameras, where the right camera is shifted by baseline \(\displaystyle d\) along the x-axis compared to the left camera.
Then for a point \(\displaystyle (x,y,z)\), assuming unit focal length, the image coordinates are:
\(\displaystyle x_l = \frac{x}{z}\), \(\displaystyle y_l = \frac{y}{z}\), \(\displaystyle x_r = \frac{x-d}{z}\), \(\displaystyle y_r = \frac{y}{z}\).
Thus, the stereo disparity is the ratio of baseline over depth: \(\displaystyle x_l - x_r = \frac{d}{z}\).
With known baseline and correspondence, you can solve for depth \(\displaystyle z\).
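
A short numeric check of the disparity relation, assuming unit focal length as in the equations above. The function names and sample values are hypothetical.

```python
def project_parallel(point, baseline):
    """Image coordinates of a 3D point in the left and right parallel cameras."""
    x, y, z = point
    left = (x / z, y / z)
    right = ((x - baseline) / z, y / z)
    return left, right


def depth_from_disparity(x_left, x_right, baseline):
    """Invert x_l - x_r = d / z to recover the depth z."""
    return baseline / (x_left - x_right)


point = (1.0, 0.5, 4.0)
baseline = 0.5
(xl, yl), (xr, yr) = project_parallel(point, baseline)
disparity = xl - xr
depth = depth_from_disparity(xl, xr, baseline)
```

Because disparity is inversely proportional to depth, distant points produce small disparities and their depth estimates are more sensitive to matching errors.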

Epipolar Geometry

Warp the two images such that the epipolar lines become horizontal.
This is called rectification.

Rectification

Consider the left camera to be the center of a coordinate system.
Let \(\displaystyle e_1\) be the axis pointing toward the right camera, let \(\displaystyle e_2\) be the up axis, and take \(\displaystyle e_3 = e_1 \times e_2\).

Projects