Essential Matrix
Revision as of 14:49, 30 April 2020
An essential matrix, denoted \(\displaystyle \mathbf{E}\), is a \(\displaystyle 3 \times 3\) matrix relating corresponding points in two images taken by calibrated cameras.
You can compute the essential matrix from feature matches between two images.
Using the essential matrix, you can extract the relative rotation and translation between two cameras.
Given matching feature points \(\displaystyle \mathbf{x}\) and \(\displaystyle \mathbf{x'}\) from two images, the essential matrix satisfies the equation \(\displaystyle \mathbf{x}'^T \mathbf{E} \mathbf{x} = 0\).
Much of this is from An Investigation of the Essential Matrix by Richard Hartley[1].
Background and Derivation
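A pinhole camera with \(\displaystyle 3 \times 4\) projection matrix \(\displaystyle P = K(R | -RT)\) takes points \(\displaystyle \mathbf{x} = (x, y, z)^T\) and projects them to \(\displaystyle \mathbf{u} = (u, v, w)^T = KR(\mathbf{x} - T)\). Here \(\displaystyle K\) is the calibration matrix, \(\displaystyle R\) is the camera rotation and \(\displaystyle T\) is the camera position. For the essential matrix we work in calibrated coordinates, i.e. \(\displaystyle K = I\), so \(\displaystyle \mathbf{u} = R(\mathbf{x} - T)\).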
We now consider two cameras:
Camera 1 is at the origin of world space (or its object space), \(\displaystyle P = (I | 0)\).
Camera 2 is displaced with some rotation \(\displaystyle R\) and translation \(\displaystyle T\), \(\displaystyle P' = (R | -RT)\).
Any point \(\displaystyle \mathbf{u} = (u,v,w)^T\) in camera 1 corresponds to an epipolar line in camera 2.
Under camera 2, the center of camera 1 projects to \(\displaystyle P' (0,0,0,1)^T = -RT\), and \(\displaystyle P' (u,v,w,0)^T = R\mathbf{u}\), the image of the point at infinity along the ray through \(\displaystyle \mathbf{u}\), is also somewhere on this epipolar line.
Thus the line can be calculated by taking the cross product of these two points. The overall sign is dropped because a line is only defined up to scale.
\[
(p,q,r)^T = RT \times R\mathbf{u} = R(T \times \mathbf{u}) = R[T]_{\times} \mathbf{u}
\]
Now the line is represented by \(\displaystyle \{(u',v',w') \mid pu' + qv' + rw' = 0\}\), i.e. all points orthogonal to \(\displaystyle (p,q,r)^T\).
Given a vector \(\displaystyle \mathbf{t}\), the matrix form of its cross product is:
\(\displaystyle
[\mathbf{t}]_{\times} =
\begin{bmatrix}
0 & -t_z & t_y\\
t_z & 0 & -t_x\\
-t_y & t_x & 0
\end{bmatrix}
\)
- \(\displaystyle [\mathbf{t}]_{\times} \mathbf{u} = \mathbf{t} \times \mathbf{u}\)
- This matrix is skew-symmetric. I.e. \(\displaystyle [\mathbf{t}]^T_{\times} = -[\mathbf{t}]_{\times}\)
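The two properties above can be checked numerically. The following is a minimal sketch using NumPy; the helper name `skew` and the test vectors are assumptions made for illustration.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x of a 3-vector t (hypothetical helper)."""
    tx, ty, tz = t
    return np.array([
        [0.0, -tz,  ty],
        [ tz, 0.0, -tx],
        [-ty,  tx, 0.0],
    ])

# Arbitrary example vectors, chosen only for illustration.
t = np.array([1.0, 2.0, 3.0])
u = np.array([-0.5, 4.0, 2.0])

S = skew(t)
matches_cross = np.allclose(S @ u, np.cross(t, u))  # [t]_x u == t x u
is_skew = np.allclose(S.T, -S)                      # [t]_x^T == -[t]_x
```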
Now if \(\displaystyle \mathbf{u}'\) is a feature point from camera 2 matching point \(\displaystyle \mathbf{u}\) from camera 1, then it must lie on this epipolar line.
Thus \(\displaystyle \mathbf{u}' \in \{(u',v',w') \mid pu' + qv' + rw' = 0\} \implies \mathbf{u}'^T R[T]_{\times} \mathbf{u} = 0\).
Now \(\displaystyle Q = R[T]_{\times}\) is the essential matrix.
Given 8 or more point correspondences between camera 1 and camera 2, you can solve for \(\displaystyle Q\) using the eight-point algorithm (see Wikipedia: Eight-point algorithm).
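As a rough illustration of the linear solve, the sketch below generates noise-free synthetic correspondences for the two-camera setup above and recovers \(\displaystyle Q\) up to scale. It assumes calibrated coordinates and skips the coordinate normalization usually recommended for noisy pixel data; the function names `skew` and `eight_point`, the pose and the random points are all assumptions made for the example.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x (hypothetical helper)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]])

def eight_point(u1, u2):
    """Estimate Q (up to scale) from N >= 8 correspondences given as (N, 3) arrays."""
    # Each pair gives one linear equation u2^T Q u1 = 0 in the 9 entries of Q
    # (row-major), so the coefficient row is kron(u2, u1).
    A = np.stack([np.kron(b, a) for a, b in zip(u1, u2)])
    _, _, Vt = np.linalg.svd(A)
    Q = Vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    # Project onto matrices with singular values (k, k, 0).
    U, s, Vt = np.linalg.svd(Q)
    k = (s[0] + s[1]) / 2.0
    return U @ np.diag([k, k, 0.0]) @ Vt

# Synthetic setup (assumed values): rotation about the y-axis plus a translation.
angle = 0.3
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([1.0, 0.2, -0.3])

rng = np.random.default_rng(0)
X = rng.uniform([-2, -2, 4], [2, 2, 8], size=(12, 3))  # 3D points in front of both cameras
u1 = X                # camera 1: P  = (I | 0)
u2 = (X - T) @ R.T    # camera 2: u' = R (x - T)

Q_true = R @ skew(T)
Q_est = eight_point(u1, u2)

# Epipolar residuals u'^T Q u for every correspondence.
residuals = np.einsum("ni,ij,nj->n", u2, Q_est, u1)

# Q is only defined up to scale and sign, so compare normalized matrices.
Q_true_n = Q_true / np.linalg.norm(Q_true)
Q_est_n = Q_est / np.linalg.norm(Q_est)
matches_up_to_sign = (np.allclose(Q_est_n, Q_true_n, atol=1e-8)
                      or np.allclose(Q_est_n, -Q_true_n, atol=1e-8))
```

With noisy matches the null vector is only approximate, which is why the projection onto singular values \(\displaystyle (k, k, 0)\) is included.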
Properties
- A \(\displaystyle 3 \times 3\) matrix is an essential matrix iff two of its singular values are equal and non-zero and the third singular value is \(\displaystyle 0\).
See Bartoli and Olsen[2].
- The essential matrix is defined only up to scale, i.e. if \(\displaystyle \mathbf{u}'^T Q \mathbf{u} = 0\) then \(\displaystyle \mathbf{u}'^T (\lambda Q) \mathbf{u} = 0\) for any \(\displaystyle \lambda\).
- To extract scale, you need to have an object of known size or know the distance between the cameras.
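These properties can be checked on a synthetic example. The sketch below builds \(\displaystyle Q = R[T]_{\times}\) from an assumed pose and inspects its singular values and the effect of rescaling; the pose, the 3D point and the helper `skew` are illustrative assumptions.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x (hypothetical helper)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]])

# Assumed pose: rotation about the y-axis and an arbitrary translation.
angle = 0.3
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([1.0, 0.2, -0.3])
Q = R @ skew(T)

# Singular values come out as (||T||, ||T||, 0): two equal, one zero.
singular_values = np.linalg.svd(Q, compute_uv=False)

# One 3D point seen by both cameras, in homogeneous image coordinates.
x = np.array([0.5, -0.4, 6.0])
u = x                 # camera 1: P  = (I | 0)
u2 = R @ (x - T)      # camera 2: u' = R (x - T)

residual = u2 @ Q @ u                  # epipolar constraint
residual_scaled = u2 @ (5.0 * Q) @ u   # still satisfied after rescaling Q
```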
Calculating the Essential Matrix from two images
Planar Images
Spherical Images
Here we assume an equirectangular projection.
Determining rotation \(\displaystyle \mathbf{R}\) and translation \(\displaystyle \mathbf{t}\)
See section 3 of Hartley[1].
- Theorem
A \(\displaystyle 3 \times 3\) real matrix \(\displaystyle Q\) can be factored into a product of a rotation matrix \(\displaystyle R\) and a non-zero skew-symmetric matrix \(\displaystyle S\) iff \(\displaystyle Q\) has two equal non-zero singular values and one zero singular value.
Let the singular value decomposition of our essential matrix \(\displaystyle Q\) be \(\displaystyle U D V^T\) where \(\displaystyle D = \operatorname{diag}(k, k, 0)\).
Let \(\displaystyle E = \begin{pmatrix}
0 & 1 & 0\\
-1 & 0 & 0\\
0 & 0 & 1
\end{pmatrix}\)
and
\(\displaystyle Z = \begin{pmatrix}
0 & -1 & 0\\
1 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}\)
Then we have the following:
- \(\displaystyle S = V Z V^T\) or \(\displaystyle -V Z V^T\)
- \(\displaystyle R = U E V^T\) or \(\displaystyle U E^T V^T\)
- \(\displaystyle Q = RS\) (up to scale)
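As a quick check of the factorization, multiply out the first choice of \(\displaystyle R\) and \(\displaystyle S\), using \(\displaystyle V^T V = I\) and the definitions of \(\displaystyle E\) and \(\displaystyle Z\) above:
\[
RS = (U E V^T)(V Z V^T) = U (E Z) V^T = U \operatorname{diag}(1, 1, 0) V^T = \tfrac{1}{k} Q
\]
Similarly \(\displaystyle E^T Z = -\operatorname{diag}(1, 1, 0)\), so \(\displaystyle R = U E^T V^T\) paired with \(\displaystyle S = -V Z V^T\) gives the same product, and the remaining sign combinations give \(\displaystyle -\tfrac{1}{k} Q\). Since \(\displaystyle Q\) is only defined up to scale, all of these are valid factorizations.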
- Notes
- Here, \(\displaystyle R\) is your rotation and \(\displaystyle S = [T]_{\times}\)
- \(\displaystyle T = V (0, 0, 1)^T\), the third column of \(\displaystyle V\) or third row of \(\displaystyle V^T\)
- Some sources such as Wikipedia use \(\displaystyle [T]_{\times} = U Z U^T\) and \(\displaystyle T = U (0, 0, 1)^T\).
- This is equivalent to \(\displaystyle RT\) in our notation.
- Since \(\displaystyle V\) is orthonormal, this gives \(\displaystyle \Vert T \Vert = 1\)
- Note \(\displaystyle -RT = -UEV^T V(0,0,1)^T = -U(0,0,1)^T\)
We have two possibilities for \(\displaystyle R\) and \(\displaystyle T\):
- \(\displaystyle R = U E V^T\) or \(\displaystyle U E^T V^T\)
- \(\displaystyle T = V (0, 0, 1)^T\) or \(\displaystyle -V (0, 0, 1)^T\)
This gives us 4 possibilities for \(\displaystyle P'\)
- \(\displaystyle P' = (UEV^T \mid -U(0,0,1)^T)\)
- \(\displaystyle P' = (UEV^T \mid U(0,0,1)^T)\)
- \(\displaystyle P' = (UE^TV^T \mid -U(0,0,1)^T)\)
- \(\displaystyle P' = (UE^TV^T \mid U(0,0,1)^T)\)
For planar images, only one of these 4 options is feasible. You can determine which one is feasible by triangulating one of your points with each candidate. In the 3 infeasible options, the triangulated point lies behind at least one of the two cameras, i.e. it has negative depth.
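The selection step can be sketched as follows, assuming calibrated, noise-free coordinates. Instead of a full triangulation, the sketch solves \(\displaystyle \lambda R\mathbf{u} - \lambda' \mathbf{u}' = RT\) for the two depths in a least-squares sense and keeps the candidate where both are positive. The function names, the sign flip that keeps \(\displaystyle \det R = +1\) and the synthetic pose are assumptions made for the example.

```python
import numpy as np

# The matrix called E in this section (not the essential matrix itself).
E = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

def skew(t):
    """Cross-product matrix [t]_x (hypothetical helper)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]])

def candidate_poses(Q):
    """Return the four (R, T) candidates with Q ~ R [T]_x and ||T|| = 1."""
    U, _, Vt = np.linalg.svd(Q)
    T = Vt[2]  # third row of V^T, i.e. third column of V
    poses = []
    for M in (E, E.T):
        R = U @ M @ Vt
        if np.linalg.det(R) < 0:  # Q is only defined up to sign, so flip if needed
            R = -R
        poses.append((R, T))
        poses.append((R, -T))
    return poses

def depths(R, T, u, u2):
    """Least-squares depths (lam, lam2) with lam2 * u2 = R (lam * u - T)."""
    A = np.column_stack([R @ u, -u2])
    (lam, lam2), *_ = np.linalg.lstsq(A, R @ T, rcond=None)
    return lam, lam2

def pick_pose(Q, u, u2):
    """Return the candidate that puts the point in front of both cameras."""
    for R, T in candidate_poses(Q):
        lam, lam2 = depths(R, T, u, u2)
        if lam > 0 and lam2 > 0:
            return R, T
    return None

# Synthetic check with an assumed pose and a single 3D point.
angle = 0.3
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
T_true = np.array([1.0, 0.2, -0.3])
x = np.array([0.5, -0.4, 6.0])
u = x / x[2]               # image point in camera 1, normalized so w = 1
x2 = R_true @ (x - T_true)
u2 = x2 / x2[2]            # image point in camera 2, normalized so w = 1

Q = R_true @ skew(T_true)
R_est, T_est = pick_pose(Q, u, u2)
```

Here `T_est` can only match `T_true` up to its length, since the scale of the translation cannot be recovered from \(\displaystyle Q\). With noisy data it is safer to vote over many correspondences rather than rely on a single point.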
3D points
See Wikipedia: Essential_matrix
Resources
References
- [1] An Investigation of the Essential Matrix by Richard Hartley
- [2] 3D Computer Vision Lecture 16 by Bartoli and Olsen