Essential Matrix: Difference between revisions

(11 intermediate revisions by the same user not shown)

Line 6:

the essential matrix satisfies the equation <math>\mathbf{x}'^T \mathbf{E} \mathbf{x} = 0</math>

Much of this is from [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley<ref name="hartley">[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley</ref>

Much of this is from [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley<ref name="hartley">[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley</ref>.

==Background and Derivation==

[[File: Epipolar_geometry.svg | link=Wikipedia | thumb | 400px | [[Wikipedia: Epipolar geometry]] ]]

A pinhole camera with <math>3 \times 4</math> projection matrix <math>P = K(R | -RT)</math> takes points <math>\mathbf{x} = (x, y, z)^T</math> and projects them to <math>\mathbf{u} = (u, v, w)^T = \mathbf{R}(\mathbf{x} - \mathbf{t})</math>.

A pinhole camera with <math>3 \times 4</math> projection matrix <math>P = K(R \mid -RT)</math> takes points <math>\mathbf{x} = (x, y, z)^T</math> and, in calibrated coordinates (<math>K = I</math>), projects them to <math>\mathbf{u} = (u, v, w)^T = R(\mathbf{x} - T)</math>. Here, the notation <math>(R \mid -RT)</math> represents a <math>3 \times 3</math> matrix <math>R</math> concatenated with a <math>3 \times 1</math> column <math>-RT</math> to form a <math>3 \times 4</math> matrix.
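As a minimal sketch of this concatenation, the NumPy snippet below builds <math>P = K(R \mid -RT)</math> and projects one point. The numeric values of <code>K</code>, <code>R</code>, <code>T</code> and <code>x</code> are arbitrary assumptions chosen only for illustration; with <math>K = I</math> the result reduces to <math>R(\mathbf{x} - T)</math>.

```python
import numpy as np

# Illustrative values (assumptions, not taken from the article).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])                       # calibration matrix
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])  # rotation about the y axis
T = np.array([0.5, 0.0, 0.0])                         # camera centre in world space

# (R | -RT): a 3x3 block next to a 3x1 column gives a 3x4 matrix.
P = K @ np.hstack([R, (-R @ T).reshape(3, 1)])

x = np.array([0.2, -0.1, 4.0])        # a 3D point
u_h = P @ np.append(x, 1.0)           # homogeneous image point (u, v, w)
u_direct = K @ R @ (x - T)            # the same projection written as K R (x - T)
```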

We now consider two cameras:

Camera 1 is at the origin of world space (or its object space), <math>P = (I | 0)</math>.

Camera 2 is displaced with some rotation <math>R</math> and translation <math>R</math>, <math>P' = (R | -RT)</math>.<br>

Camera 2 is displaced with some rotation <math>R</math> and translation <math>-RT</math>, <math>P' = (R | -RT)</math>.<br>

Any point <math>\mathbf{u} = (u,v,w)^T</math> in camera 1 is represented by an epipolar line in camera 2.<br>

Under camera 2, the position of camera 1 is <math>-RT</math> and <math>P' (u,v,w,0)^T = R\mathbf{u}</math> is somewhere on this epipolar line.
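The two statements above can be checked numerically. The sketch below uses made-up values for <code>R</code>, <code>T</code> and the 3D point <code>x</code> (all assumptions), forms the epipolar line as the line through <math>-RT</math> and <math>R\mathbf{u}</math>, and confirms that the image of <code>x</code> in camera 2 lies on it.

```python
import numpy as np

# Made-up pose and point (assumptions for illustration).
theta = np.deg2rad(15.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([1.0, 0.2, 0.0])

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])    # P  = (I | 0)
P2 = np.hstack([R, (-R @ T).reshape(3, 1)])      # P' = (R | -RT)

x = np.array([0.3, -0.4, 5.0])
u = P1 @ np.append(x, 1.0)          # image in camera 1 (equals x here)
u_prime = P2 @ np.append(x, 1.0)    # image in camera 2

epipole = P2 @ np.array([0.0, 0.0, 0.0, 1.0])    # image of camera 1's centre: -RT
direction = P2 @ np.append(u, 0.0)               # P'(u, v, w, 0)^T = R u

line = np.cross(epipole, direction)   # homogeneous line through the two points
residual = line @ u_prime             # zero when u_prime lies on the line
```

The residual vanishes because <code>u_prime</code> is the sum of <code>epipole</code> and <code>direction</code>, so it is a combination of the two points that define the line.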

Line 84:

u'_3 u_1 \\ u'_3 u_2 \\ 1 \\

\end{pmatrix}

</math>

</math><br>

Here <math>A</math> is an <math>n \times 9</math> matrix (where <math>n=8</math> if using 8 points).

The goal is to minimize <math>\Vert A\mathbf{x} \Vert </math> such that <math>\Vert \mathbf{x} \Vert = 1</math>

Line 90:

Line 91:

;Solution

* First take the SVD of A: <math>A = UDV^T</math>

** With 8 points, <math>U</math> is <math>8 \times 8</math>, <math>D</math> is an <math>8 \times 9</math> diagonal matrix, and <math>V^T</math> is a <math>9 \times 9</math> matrix.

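* Now <math>x = V_j</math>, the last column of <math>V</math> (<math>j = 9</math>), i.e. the right singular vector with the smallest singular value, or the null vector of <math>A</math> when <math>n = 8</math>. Reshape this 9-vector into a <math>3 \times 3</math> matrix to get <math>Q_{est}</math>.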

* In practice, this may not be rank 2 so we take the SVD <math>Q_{est}=U diag(r,s,t) V^T</math> ~~and zero~~ out the third singular value to get a final estimate

* In practice, this may not be rank 2, so we take another SVD <math>Q_{est}=U diag(r,s,t) V^T</math>

* Zero out the third singular value to get a final estimate

*: <math>Q' = U diag(r,s,0) V^T</math>
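The solution steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name <code>estimate_essential</code> is hypothetical, each row of <math>A</math> is assumed to hold the nine products <math>u'_i u_j</math> in row-major order, and the synthetic pose and points are made up.

```python
import numpy as np

def estimate_essential(u, u_prime):
    """u, u_prime: (n, 3) arrays of homogeneous image points, n >= 8."""
    # One row per correspondence: the 9 products u'_i * u_j, ordered to match
    # a row-major reshape of the solution into a 3x3 matrix.
    A = np.stack([np.kron(up, p) for p, up in zip(u, u_prime)])
    # Rows of Vt are columns of V; the last one minimises ||A x|| with ||x|| = 1.
    _, _, Vt = np.linalg.svd(A)
    Q_est = Vt[-1].reshape(3, 3)
    # Second SVD: zero the third singular value to force rank 2.
    U, (r, s, _), Vt2 = np.linalg.svd(Q_est)
    return U @ np.diag([r, s, 0.0]) @ Vt2

# Synthetic check with made-up values (assumptions for illustration only).
rng = np.random.default_rng(0)
theta = np.deg2rad(20.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([1.0, 0.3, 0.1])
X = rng.uniform([-2, -2, 4], [2, 2, 8], size=(8, 3))   # 8 points in front of both cameras
u = X                        # camera 1: P  = (I | 0)
u_prime = (X - T) @ R.T      # camera 2: P' = (R | -RT), i.e. R (x - T)
Q = estimate_essential(u, u_prime)
residuals = np.einsum("ni,ij,nj->n", u_prime, Q, u)    # u'^T Q u per correspondence
```

The estimate is only defined up to scale and sign, so it should be compared with a known matrix by direction rather than entry by entry.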

Line 120:

Line 123:

* Here, <math>R</math> is your rotation and <math>S = [T]_{\times}</math>

* <math>T = V (0, 0, 1)^T</math>, the third column of <math>V</math> or third row of <math>V^T</math>

** Note that this only gives you the direction of the translation. The magnitude is not determined.

* Some sources such as Wikipedia use <math>[T]_{\times} = U Z U^T</math> and <math>T = U (0, 0, 1)^T</math>.

** This is equivalent to <math>RT</math> in our notation.
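A short numerical check of the two conventions, assuming <math>Q = RS</math> with <math>S = [T]_{\times}</math> as above. The pose values and the helper <code>skew</code> are assumptions introduced for illustration. Both recovered vectors are unit length and defined only up to sign, which is why the check tests for parallelism.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Made-up ground truth (assumption for illustration).
theta = np.deg2rad(25.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta), np.cos(theta)]])
T = np.array([0.4, -1.0, 2.0])
Q = R @ skew(T)                 # Q = R S with S = [T]_x

U, _, Vt = np.linalg.svd(Q)
T_dir = Vt[2]                   # third row of V^T, i.e. third column of V
T_wiki = U[:, 2]                # third column of U (the Wikipedia convention)

# A zero cross product means the two vectors are parallel.
parallel_to_T = np.linalg.norm(np.cross(T_dir, T))
parallel_to_RT = np.linalg.norm(np.cross(T_wiki, R @ T))
```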

Line 137:

Line 141:

For planar images, only one of these 4 options is feasible.

You can determine which one is feasible using triangulation with one of your points.
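One way to carry out this check is sketched below: triangulate a single correspondence under each candidate pose and keep the candidate that places the point in front of both cameras. The function names, the linear triangulation method, and the way the four candidates are built here (flipping the sign of <math>T</math> and composing <math>R</math> with a 180 degree turn about <math>T</math>) are assumptions made for illustration, since the list of options is not reproduced in this section.

```python
import numpy as np

def triangulate(P1, P2, u1, u2):
    """Linear triangulation of one correspondence; returns a 3D point."""
    A = np.stack([u1[0] * P1[2] - u1[2] * P1[0],
                  u1[1] * P1[2] - u1[2] * P1[1],
                  u2[0] * P2[2] - u2[2] * P2[0],
                  u2[1] * P2[2] - u2[2] * P2[1]])
    Xh = np.linalg.svd(A)[2][-1]      # null vector of A (homogeneous point)
    return Xh[:3] / Xh[3]

def pick_feasible(candidates, u1, u2):
    """Indices of (R, T) candidates that put the point in front of both cameras."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    feasible = []
    for i, (R, T) in enumerate(candidates):
        P2 = np.hstack([R, (-R @ T).reshape(3, 1)])
        X = triangulate(P1, P2, u1, u2)
        depth1 = X[2]                  # third coordinate in camera 1
        depth2 = (R @ (X - T))[2]      # third coordinate in camera 2
        if depth1 > 0 and depth2 > 0:
            feasible.append(i)
    return feasible

# Made-up example: the true pose plus the three other combinations.
R_true = np.eye(3)
T_true = np.array([1.0, 0.0, 0.0])
t = T_true / np.linalg.norm(T_true)
R_twist = R_true @ (2.0 * np.outer(t, t) - np.eye(3))   # extra 180 degree turn about T
candidates = [(R_true, T_true), (R_true, -T_true),
              (R_twist, T_true), (R_twist, -T_true)]

x = np.array([0.0, 0.0, 5.0])
u1 = x / x[2]                          # image in camera 1, third coordinate 1
u2 = R_true @ (x - T_true)
u2 = u2 / u2[2]                        # image in camera 2, third coordinate 1
feasible = pick_feasible(candidates, u1, u2)
```

In this example only the first candidate, the true pose, survives. With noisy data it is safer to test several points and take a majority vote.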

~~In the implausible 3 possibilities, <math>P'\mathbf{u}</math> will be out of bounds or negative~~

==3D points==

See [[Wikipedia: Essential_matrix]]

==Fundamental Matrix==

The fundamental matrix is a generalization of the essential matrix which also takes into account the calibration of the camera.
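A minimal sketch of this relationship, assuming the standard relation <math>F = K'^{-T} E K^{-1}</math>, where <math>K</math> and <math>K'</math> are the calibration matrices of cameras 1 and 2. The relation and all numeric values below are assumptions not stated in this article. The snippet checks that pixel coordinates satisfy <math>\mathbf{p}'^T F \mathbf{p} = 0</math>.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Made-up calibration and pose (assumptions for illustration).
K1 = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[650.0, 0.0, 300.0], [0.0, 650.0, 260.0], [0.0, 0.0, 1.0]])
theta = np.deg2rad(12.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([0.8, 0.1, -0.2])

E = R @ skew(T)                                    # essential matrix (calibrated)
F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)    # fundamental matrix (pixels)

x = np.array([0.5, -0.3, 6.0])      # a 3D point
p1 = K1 @ x                         # homogeneous pixel coordinates in camera 1
p2 = K2 @ R @ (x - T)               # homogeneous pixel coordinates in camera 2
residual = p2 @ F @ p1              # epipolar constraint in pixel coordinates
```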

==Resources==

* [[Wikipedia: Essential_matrix]]

* [http://robotics.stanford.edu/~birch/projective/node20.html Stanford: essential and fundamental matrices]

* [https://github.com/darknight1900/books/blob/master/Multiple%20View%20Geometry%20in%20Computer%20Vision%20(Second%20Edition).pdf Multiple View Geometry in Computer Vision by Hartley and Zisserman]

==References==

@@ Line 6: / Line 6: @@
 the essential matrix satisfies the equation <math>\mathbf{x}'^T \mathbf{E} \mathbf{x} = 0</math>
-Much of this is from [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley<ref name="hartley">[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley</ref>
+Much of this is from [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley<ref name="hartley">[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley</ref>.
 ==Background and Derivation==
 [[File: Epipolar_geometry.svg | link=Wikipedia | thumb | 400px | [[Wikipedia: Epipolar geometry]] ]]
-A pinhole camera with <math>3 \times 4</math> projection matrix <math>P = K(R | -RT)</math> takes points <math>\mathbf{x} = (x, y, z)^T</math> and projects them to <math>\mathbf{u} = (u, v, w)^T = \mathbf{R}(\mathbf{x} - \mathbf{t})</math>.
+A pinhole camera with <math>3 \times 4</math> projection matrix <math>P = K(R \mid -RT)</math> takes points <math>\mathbf{x} = (x, y, z)^T</math> and, in calibrated coordinates (<math>K = I</math>), projects them to <math>\mathbf{u} = (u, v, w)^T = R(\mathbf{x} - T)</math>. Here, the notation <math>(R \mid -RT)</math> represents a <math>3 \times 3</math> matrix <math>R</math> concatenated with a <math>3 \times 1</math> column <math>-RT</math> to form a <math>3 \times 4</math> matrix.
 We now consider two cameras:
 Camera 1 is at the origin of world space (or its object space), <math>P = (I | 0)</math>.
-Camera 2 is displaced with some rotation <math>R</math> and translation <math>R</math>, <math>P' = (R | -RT)</math>.<br>
+Camera 2 is displaced with some rotation <math>R</math> and translation <math>-RT</math>, <math>P' = (R | -RT)</math>.<br>
 Any point <math>\mathbf{u} = (u,v,w)^T</math> in camera 1 is represented by an epipolar line in camera 2.<br>
 Under camera 2, the position of camera 1 is <math>-RT</math> and <math>P' (u,v,w,0)^T = R\mathbf{u}</math> is somewhere on this epipolar line.
@@ Line 84: / Line 84: @@
 u'_3 u_1 \\ u'_3 u_2 \\ 1 \\
 \end{pmatrix}
-</math>
+</math><br>
+Here <math>A</math> is an <math>n \times 9</math> matrix (where <math>n=8</math> if using 8 points).
 The goal is to minimize <math>\Vert A\mathbf{x} \Vert </math> such that <math>\Vert \mathbf{x} \Vert = 1</math>
@@ Line 90: / Line 91: @@
 ;Solution
 * First take the SVD of A: <math>A = UDV^T</math>
+** With 8 points, <math>U</math> is <math>8 \times 8</math>, <math>D</math> is an <math>8 \times 9</math> diagonal matrix, and <math>V^T</math> is a <math>9 \times 9</math> matrix.
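 * Now <math>x = V_j</math>, the last column of <math>V</math> (<math>j = 9</math>), i.e. the right singular vector with the smallest singular value, or the null vector of <math>A</math> when <math>n = 8</math>. Reshape this 9-vector into a <math>3 \times 3</math> matrix to get <math>Q_{est}</math>.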
-* In practice, this may not be rank 2 so we take the SVD <math>Q_{est}=U diag(r,s,t) V^T</math> and zero out the third singular value to get a final estimate
+* In practice, this may not be rank 2, so we take another SVD <math>Q_{est}=U diag(r,s,t) V^T</math>
+* Zero out the third singular value to get a final estimate
 *: <math>Q' = U diag(r,s,0) V^T</math>
@@ Line 120: / Line 123: @@
 * Here, <math>R</math> is your rotation and <math>S = [T]_{\times}</math>
 * <math>T = V (0, 0, 1)^T</math>, the third column of <math>V</math> or third row of <math>V^T</math>
+** Note that this only gives you the direction of the translation. The magnitude is not determined.
 * Some sources such as Wikipedia use <math>[T]_{\times} = U Z U^T</math> and <math>T = U (0, 0, 1)^T</math>.
 ** This is equivalent to <math>RT</math> in our notation.
@@ Line 137: / Line 141: @@
 For planar images, only one of these 4 options is feasible.
 You can determine which one is feasible using triangulation with one of your points.
-In the implausible 3 possibilities, <math>P'\mathbf{u}</math> will be out of bounds or negative
 ==3D points==
 See [[Wikipedia: Essential_matrix]]
+==Fundamental Matrix==
+The fundamental matrix is a generalization of the essential matrix which also takes into account the calibration of the camera.
 ==Resources==
 * [[Wikipedia: Essential_matrix]]
+* [http://robotics.stanford.edu/~birch/projective/node20.html Stanford: essential and fundamental matrices]
+* [https://github.com/darknight1900/books/blob/master/Multiple%20View%20Geometry%20in%20Computer%20Vision%20(Second%20Edition).pdf Multiple View Geometry in Computer Vision by Hartley and Zisserman]
 ==References==