Essential Matrix: Difference between revisions
(30 intermediate revisions by the same user not shown) | |||
Line 6: | Line 6: | ||
the essential matrix satisfies the equation <math>\mathbf{x}'^T \mathbf{E} \mathbf{x} = 0</math> | the essential matrix satisfies the equation <math>\mathbf{x}'^T \mathbf{E} \mathbf{x} = 0</math> | ||
Much of this is from [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley<ref name="hartley">[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley</ref>. | |||
==Background and Derivation== | ==Background and Derivation== | ||
[[File: Epipolar_geometry.svg | link=Wikipedia | thumb | 400px | [[Wikipedia: Epipolar Geometriy]] ]] | [[File: Epipolar_geometry.svg | link=Wikipedia | thumb | 400px | [[Wikipedia: Epipolar Geometriy]] ]] | ||
A pinhole camera with <math>3 \times 4</math> projection matrix <math>P = K(R \mid -RT)</math> takes points <math>\mathbf{x} = (x, y, z)^T</math> and projects them to <math>\mathbf{u} = (u, v, w)^T = \mathbf{R}(\mathbf{x} - \mathbf{t})</math>. Here, the notation <math>(R \mid -RT)</math> represents a <math>3 \times 3</math> matrix <math>R</math> concatenated with a <math>3 \times 1</math> matrix <math>-RT</math> to form a <math>3 \times 4</math> matrix. | |||
A pinhole camera with <math>3 \times 4</math> projection matrix <math>P = K(R | |||
We now consider two cameras: | We now consider two cameras: | ||
Camera 1 is at the origin of world space (or it's object space) <math>P = (I | 0)</math>. | Camera 1 is at the origin of world space (or it's object space) <math>P = (I | 0)</math>. | ||
Camera 2 is displaced with some rotation <math>R</math> and translation <math> | Camera 2 is displaced with some rotation <math>R</math> and translation <math>-RT</math>, <math>P' = (R | -RT)</math>.<br> | ||
Any point <math>\mathbf{u} = (u,v,w)^T</math> in camera 1 is represented by an epipolar line in camera 2.<br> | Any point <math>\mathbf{u} = (u,v,w)^T</math> in camera 1 is represented by an epipolar line in camera 2.<br> | ||
Under camera 2, the position of camera 1 is <math>-RT</math> and <math>P' (u,v,w,0)^T = R\mathbf{u}</math> is somewhere on this epipolar line. | Under camera 2, the position of camera 1 is <math>-RT</math> and <math>P' (u,v,w,0)^T = R\mathbf{u}</math> is somewhere on this epipolar line. | ||
Line 37: | Line 36: | ||
* This matrix is skew-symmetric. I.e. <math>[\mathbf{t}]^T_{\times} = -[\mathbf{t}]_{\times}</math> | * This matrix is skew-symmetric. I.e. <math>[\mathbf{t}]^T_{\times} = -[\mathbf{t}]_{\times}</math> | ||
Now if <math>\mathbf{u}'</math> is a feature point from camera 2 matching point <math>\mathbf{u}</math>, then it must lie on this epipolar line. | Now if <math>\mathbf{u}'</math> is a feature point from camera 2 matching point <math>\mathbf{u}</math> from camera 1, then it must lie on this epipolar line.<br> | ||
Thus <math>\mathbf{u}' \in \{(u',v',w') \mid pu' + qv' + rw' = 0\} \implies \mathbf{u}'^T R[T]_{\times} \mathbf{u} = 0</math>. | Thus <math>\mathbf{u}' \in \{(u',v',w') \mid pu' + qv' + rw' = 0\} \implies \mathbf{u}'^T R[T]_{\times} \mathbf{u} = 0</math>. | ||
Now <math>Q = R[T]_{\times}</math> is the essential matrix. | Now <math>Q = R[T]_{\times}</math> is the essential matrix. | ||
Line 48: | Line 47: | ||
See Bartoli and Olsen<ref name="lecture16">[http://igt.ip.uca.fr/~ab/Classes/DIKU-3DCV2/Handouts/Lecture16.pdf 3D Computer Vision Lecture 16 by Bartoli and Olsen]</ref>. | See Bartoli and Olsen<ref name="lecture16">[http://igt.ip.uca.fr/~ab/Classes/DIKU-3DCV2/Handouts/Lecture16.pdf 3D Computer Vision Lecture 16 by Bartoli and Olsen]</ref>. | ||
}} | }} | ||
* The essential matrix is defined only up to a scale. I.e. if <math>\mathbf{u}'^T Q \mathbf{u} = 0</math> then <math>\mathbf{u}'^T (\lambda Q) \mathbf{u} = 0</math> | |||
** To extract scale, you need to have an object of known size or know the distance between the cameras. | |||
==Calculating the Essential Matrix from two images== | ==Calculating the Essential Matrix from two images== | ||
== | {{main | Wikipedia:Eight-point algorithm}} | ||
Copied from section 5 of Hartley<ref name="hartley"/>. | |||
Here we will focus on calculating the essential matrix given 8 or more points. | |||
It is possible to calculate the essential matrix using 7 points using a non-linear equation if | |||
your correspondences are very accurate and are not linearly dependent. | |||
For this, see section 5.1 of Hartley<ref name="hartley"/>. | |||
We assume we have list of correspondences between the two images, <math>\{\mathbf{u}_i\}</math> and <math>\{\mathbf{u}_i'\}</math>.<br> | |||
This can be build by extracting features (e.g. ORB, SIFT, SURF) and creating matches.<br> | |||
Each feature is of the form <math>\mathbf{u}_i = (u, v, 1)</math>. | |||
Here, <math>u \in [0, W]</math> and <math>v \in [0, H]</math> are pixel positions within the image. | |||
Ideally, each set of correspondences should be independently normalized such that the origin of each set is the centroid of all points and the mean distance is <math>\sqrt{2}</math>. | |||
This can be done with a single matrix for each set of points. | |||
=== | For each correspondence <math>\mathbf{u}_i</math> and <math>\mathbf{u}'_i</math>., | ||
we get the equation <math>\mathbf{u}_i'^T Q \mathbf{u}_i = 0</math>. | |||
This system of equations is linear in the entries of <math>Q</math> and can be | |||
rewritten as <math>A\mathbf{x} = 0</math> where <math>\mathbf{x}</math> contains the entries of <math>Q</math>.<br> | |||
Here, | |||
<math>\mathbf{x} = \begin{pmatrix} | |||
q_{11} \\ q_{12} \\ q_{13} \\ | |||
q_{21} \\ q_{22} \\ q_{23} \\ | |||
q_{31} \\ q_{32} \\ q_{33} \\ | |||
\end{pmatrix}</math> | |||
and each row of <math>A</math> is | |||
<math>\mathbf{a_i} = \begin{pmatrix} | |||
u'_1 u_1 \\ u'_1 u_2 \\ u'_1 \\ | |||
u'_2 u_1 \\ u'_2 u_2 \\ u'_2 \\ | |||
u'_3 u_1 \\ u'_3 u_2 \\ 1 \\ | |||
\end{pmatrix} | |||
</math><br> | |||
Here <math>A</math> is an <math>n \times 9</math> matrix (where <math>n=8</math> if using 8 points). | |||
The goal is to minimize <math>\Vert A\mathbf{x} \Vert </math> such that <math>\Vert \mathbf{x} \Vert = 1</math> | |||
;Solution | |||
* First take the SVD of A: <math>A = UDV^T</math> | |||
** <math>U</math> is <math>8 \times 8</math>, <math>D</math> is <math>8 \times 9</math> diagonal matrix, and <math>V^T</math> is a <math>9 \times 9</math> matrix. | |||
* Now <math>x = V_j</math>, the <math>j</math>-th column of <math>V</math>. Reshape this to get <math>Q_{est}</math>. | |||
* In practice, this may not be rank 2 so we take the another SVD <math>Q_{est}=U diag(r,s,t) V^T</math> | |||
* Zero out the third singular value to get a final estimate | |||
*: <math>Q' = U diag(r,s,0) V^T</math> | |||
==Determining rotation <math>\mathbf{R}</math> and translation <math>\mathbf{t}</math>== | ==Determining rotation <math>\mathbf{R}</math> and translation <math>\mathbf{t}</math>== | ||
Copied from section 3 or Hartley<ref name="hartley"/>. | |||
;Theorem | |||
A <math>3 \times 3</math> real matrix can be factored into a product of a rotation matrix <math>R</math> and a non-zero skew symmetric matrix <math>S</math> iff <math>Q</math> is two equal non-zero singular values and one zero singular value. | |||
Let the singular value decomposition of our essential matrix <math>Q</math> be <math>U D V^T</math> where <math>D = \operatorname{diag}(k, k, 0)</math>.<br> | |||
Let <math>E = \begin{pmatrix} | |||
0 & 1 & 0\\ | |||
-1 & 0 & 0\\ | |||
0 & 0 & 1 | |||
\end{pmatrix}</math> | |||
and | |||
<math>Z = \begin{pmatrix} | |||
0 & -1 & 0\\ | |||
1 & 0 & 0 \\ | |||
0 & 0 & 0 | |||
\end{pmatrix}</math><br> | |||
Then we have the following: | |||
* <math>S = V Z V^T</math> or <math>-V Z V^T</math> | |||
* <math>R = U E V^T</math> or <math>U E^T V^T</math> | |||
* <math>Q = RS</math> | |||
;Notes | |||
* Here, <math>R</math> is your rotation and <math>S = [T]_{\times}</math> | |||
* <math>T = V (0, 0, 1)^T</math>, the third column of <math>V</math> or third row of <math>V^T</math> | |||
** Note that this only gives you the direction of the translation. The magnitude is not determined. | |||
* Some sources such as Wikipedia use <math>[T]_{\times} = U Z U^T</math> and <math>T = U (0, 0, 1)^T</math>. | |||
** This is equivalent to <math>RT</math> in our notation. | |||
* Since <math>V</math> is orthonormal, this gives <math>\Vert T \Vert = 1</math> | |||
* Note <math>-RT = -UEV^T V(0,0,1)^T = -U(0,0,1)^T</math> | |||
We have two possibilities for <math>R</math> and <math>T</math>: | |||
* <math>R = U E V^T</math> or <math>U E^T V^T</math> | |||
* <math>T = V (0, 0, 1)^T</math> or <math>-V (0, 0, 1)^T</math> | |||
This gives us 4 possibilities for <math>P'</math> | |||
* <math>P' = (UEV^T \mid -U(0,0,1)^T</math> | |||
* <math>P' = (UEV^T \mid U(0,0,1)^T</math> | |||
* <math>P' = (UE^TV^T \mid -U(0,0,1)^T</math> | |||
* <math>P' = (UE^TV^T \mid U(0,0,1)^T</math> | |||
For planar images, only one of these 4 options is feasible. | |||
You can determine which one is feasibly using triangulation with one of your points. | |||
==3D points== | ==3D points== | ||
See [[Wikipedia: Essential_matrix]] | See [[Wikipedia: Essential_matrix]] | ||
==Fundamental Matrix== | |||
The fundamental matrix is a generalization of the essential matrix which also takes into account the calibration of the camera. | |||
==Resources== | ==Resources== | ||
* [[Wikipedia: Essential_matrix]] | * [[Wikipedia: Essential_matrix]] | ||
* [http://robotics.stanford.edu/~birch/projective/node20.html stanford essential and fundamental matricies] | |||
* [https://github.com/darknight1900/books/blob/master/Multiple%20View%20Geometry%20in%20Computer%20Vision%20(Second%20Edition).pdf Multiple View Geometry in Computer Vision by Hartley and Zisserman] | |||
==References== | ==References== |