Essential Matrix: Difference between revisions

(18 intermediate revisions by the same user not shown)

Line 6:

the essential matrix satisfies the equation <math>\mathbf{x}'^T \mathbf{E} \mathbf{x} = 0</math>

Much of this is from [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley<ref name="hartley">[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley</ref>

Much of this is from [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley<ref name="hartley">[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley</ref>.

==Background and Derivation==

[[File: Epipolar_geometry.svg | link=Wikipedia | thumb | 400px | [[Wikipedia: Epipolar Geometriy]] ]]

A pinhole camera with <math>3 \times 4</math> projection matrix <math>P = K(R | -RT)</math> takes points <math>\mathbf{x} = (x, y, z)^T</math> and projects them to <math>\mathbf{u} = (u, v, w)^T = \mathbf{R}(\mathbf{x} - \mathbf{t})</math>.

A pinhole camera with <math>3 \times 4</math> projection matrix <math>P = K(R \mid -RT)</math> takes points <math>\mathbf{x} = (x, y, z)^T</math> and projects them to <math>\mathbf{u} = (u, v, w)^T = \mathbf{R}(\mathbf{x} - \mathbf{t})</math>. Here, the notation <math>(R \mid -RT)</math> represents a <math>3 \times 3</math> matrix <math>R</math> concatenated with a <math>3 \times 1</math> matrix <math>-RT</math> to form a <math>3 \times 4</math> matrix.

We now consider two cameras:

Camera 1 is at the origin of world space (or it's object space) <math>P = (I | 0)</math>.

Camera 2 is displaced with some rotation <math>R</math> and translation <math>R</math>, <math>P' = (R | -RT)</math>.

Camera 2 is displaced with some rotation <math>R</math> and translation <math>-RT</math>, <math>P' = (R | -RT)</math>.

Any point <math>\mathbf{u} = (u,v,w)^T</math> in camera 1 is represented by an epipolar line in camera 2.

Under camera 2, the position of camera 1 is <math>-RT</math> and <math>P' (u,v,w,0)^T = R\mathbf{u}</math> is somewhere on this epipolar line.

Line 53:

=~~==Planar Images===~~

Copied from section 5 of Hartley<ref name="hartley"/>.

Here we will focus on calculating the essential matrix given 8 or more points.

It is possible to calculate the essential matrix using 7 points using a non-linear equation if

your correspondences are very accurate and are not linearly dependent.

For this, see section 5.1 of Hartley<ref name="hartley"/>.

===~~Spherical Images=~~==

We assume we have list of correspondences between the two images, <math>\{\mathbf{u}_i\}</math> and <math>\{\mathbf{u}_i'\}</math>.

Here ~~we assume~~ an ~~equirectangular projection~~.

This can be build by extracting features (e.g. ORB, SIFT, SURF) and creating matches.

Each feature is of the form <math>\mathbf{u}_i = (u, v, 1)</math>.

Here, <math>u \in [0, W]</math> and <math>v \in [0, H]</math> are pixel positions within the image.

Ideally, each set of correspondences should be independently normalized such that the origin of each set is the centroid of all points and the mean distance is <math>\sqrt{2}</math>.

This can be done with a single matrix for each set of points.

For each correspondence <math>\mathbf{u}_i</math> and <math>\mathbf{u}'_i</math>.,

we get the equation <math>\mathbf{u}_i'^T Q \mathbf{u}_i = 0</math>.

This system of equations is linear in the entries of <math>Q</math> and can be

rewritten as <math>A\mathbf{x} = 0</math> where <math>\mathbf{x}</math> contains the entries of <math>Q</math>.

Here,

<math>\mathbf{x} = \begin{pmatrix}

q_{11} \\ q_{12} \\ q_{13} \\

q_{21} \\ q_{22} \\ q_{23} \\

q_{31} \\ q_{32} \\ q_{33} \\

\end{pmatrix}</math>

and each row of <math>A</math> is

<math>\mathbf{a_i} = \begin{pmatrix}

u'_1 u_1 \\ u'_1 u_2 \\ u'_1 \\

u'_2 u_1 \\ u'_2 u_2 \\ u'_2 \\

u'_3 u_1 \\ u'_3 u_2 \\ 1 \\

\end{pmatrix}

</math>

Here <math>A</math> is an <math>n \times 9</math> matrix (where <math>n=8</math> if using 8 points).

The goal is to minimize <math>\Vert A\mathbf{x} \Vert </math> such that <math>\Vert \mathbf{x} \Vert = 1</math>

;Solution

* First take the SVD of A: <math>A = UDV^T</math>

** <math>U</math> is <math>8 \times 8</math>, <math>D</math> is <math>8 \times 9</math> diagonal matrix, and <math>V^T</math> is a <math>9 \times 9</math> matrix.

* Now <math>x = V_j</math>, the <math>j</math>-th column of <math>V</math>. Reshape this to get <math>Q_{est}</math>.

* In practice, this may not be rank 2 so we take the another SVD <math>Q_{est}=U diag(r,s,t) V^T</math>

* Zero out the third singular value to get a final estimate

*: <math>Q' = U diag(r,s,0) V^T</math>

==Determining rotation <math>\mathbf{R}</math> and translation <math>\mathbf{t}</math>==

Copied from section 3 or Hartley<ref name="hartley"/>.

;Theorem

Line 84:

Line 123:

* Here, <math>R</math> is your rotation and <math>S = [T]_{\times}</math>

* <math>T = V (0, 0, 1)^T</math>, the third column of <math>V</math> or third row of <math>V^T</math>

** Note that this only gives you the direction of the translation. The magnitude is not determined.

* Some sources such as Wikipedia use <math>[T]_{\times} = U Z U^T</math> and <math>T = U (0, 0, 1)^T</math>.

** This is equivalent to <math>RT</math> in our notation.

* Since <math>V</math> is orthonormal, this ~~give~~ <math>\Vert T \Vert = 1</math>

* Since <math>V</math> is orthonormal, this gives <math>\Vert T \Vert = 1</math>

* Note <math>-RT = -UEV^T V(0,0,1)^T = -U(0,0,1)^T</math>

Line 101:

Line 141:

For planar images, only one of these 4 options is feasible.

You can determine which one is feasibly using triangulation with one of your points.

~~In the implausible 3 possibilities, <math>P'\mathbf{u}</math> will be out of bounds or negative~~

==3D points==

See [[Wikipedia: Essential_matrix]]

==Fundamental Matrix==

The fundamental matrix is a generalization of the essential matrix which also takes into account the calibration of the camera.

==Resources==

* [[Wikipedia: Essential_matrix]]

* [http://robotics.stanford.edu/~birch/projective/node20.html stanford essential and fundamental matricies]

* [https://github.com/darknight1900/books/blob/master/Multiple%20View%20Geometry%20in%20Computer%20Vision%20(Second%20Edition).pdf Multiple View Geometry in Computer Vision by Hartley and Zisserman]

==References==

@@ Line 6: / Line 6: @@
 the essential matrix satisfies the equation <math>\mathbf{x}'^T \mathbf{E} \mathbf{x} = 0</math>
-Much of this is from [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley<ref name="hartley">[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley</ref>
+Much of this is from [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley<ref name="hartley">[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7518 An Investigation of the Essential Matrix] by Richard Hartley</ref>.
 ==Background and Derivation==
 [[File: Epipolar_geometry.svg | link=Wikipedia | thumb | 400px | [[Wikipedia: Epipolar Geometriy]] ]]
-A pinhole camera with <math>3 \times 4</math> projection matrix <math>P = K(R | -RT)</math> takes points <math>\mathbf{x} = (x, y, z)^T</math> and projects them to <math>\mathbf{u} = (u, v, w)^T = \mathbf{R}(\mathbf{x} - \mathbf{t})</math>.
+A pinhole camera with <math>3 \times 4</math> projection matrix <math>P = K(R \mid -RT)</math> takes points <math>\mathbf{x} = (x, y, z)^T</math> and projects them to <math>\mathbf{u} = (u, v, w)^T = \mathbf{R}(\mathbf{x} - \mathbf{t})</math>. Here, the notation <math>(R \mid -RT)</math> represents a <math>3 \times 3</math> matrix <math>R</math> concatenated with a <math>3 \times 1</math> matrix <math>-RT</math> to form a <math>3 \times 4</math> matrix.
 We now consider two cameras:
 Camera 1 is at the origin of world space (or it's object space) <math>P = (I | 0)</math>.
-Camera 2 is displaced with some rotation <math>R</math> and translation <math>R</math>, <math>P' = (R | -RT)</math>.<br>
+Camera 2 is displaced with some rotation <math>R</math> and translation <math>-RT</math>, <math>P' = (R | -RT)</math>.<br>
 Any point <math>\mathbf{u} = (u,v,w)^T</math> in camera 1 is represented by an epipolar line in camera 2.<br>
 Under camera 2, the position of camera 1 is <math>-RT</math> and <math>P' (u,v,w,0)^T = R\mathbf{u}</math> is somewhere on this epipolar line.
@@ Line 53: / Line 53: @@
 {{main | Wikipedia:Eight-point algorithm}}
-===Planar Images===
+Copied from section 5 of Hartley<ref name="hartley"/>.
+Here we will focus on calculating the essential matrix given 8 or more points.
+It is possible to calculate the essential matrix using 7 points using a non-linear equation if
+your correspondences are very accurate and are not linearly dependent.
+For this, see section 5.1 of Hartley<ref name="hartley"/>.
-===Spherical Images===
+We assume we have list of correspondences between the two images, <math>\{\mathbf{u}_i\}</math> and <math>\{\mathbf{u}_i'\}</math>.<br>
-Here we assume an equirectangular projection.
+This can be build by extracting features (e.g. ORB, SIFT, SURF) and creating matches.<br>
+Each feature is of the form <math>\mathbf{u}_i = (u, v, 1)</math>.
+Here, <math>u \in [0, W]</math> and <math>v \in [0, H]</math> are pixel positions within the image.
+Ideally, each set of correspondences should be independently normalized such that the origin of each set is the centroid of all points and the mean distance is <math>\sqrt{2}</math>.
+This can be done with a single matrix for each set of points.
+For each correspondence <math>\mathbf{u}_i</math> and <math>\mathbf{u}'_i</math>.,
+we get the equation <math>\mathbf{u}_i'^T Q \mathbf{u}_i = 0</math>.
+This system of equations is linear in the entries of <math>Q</math> and can be
+rewritten as <math>A\mathbf{x} = 0</math> where <math>\mathbf{x}</math> contains the entries of <math>Q</math>.<br>
+Here,
+<math>\mathbf{x} = \begin{pmatrix}
+q_{11} \\ q_{12} \\ q_{13} \\
+q_{21} \\ q_{22} \\ q_{23} \\
+q_{31} \\ q_{32} \\ q_{33} \\
+\end{pmatrix}</math>
+and each row of <math>A</math> is
+<math>\mathbf{a_i} = \begin{pmatrix}
+u'_1 u_1 \\ u'_1 u_2 \\ u'_1 \\
+u'_2 u_1 \\ u'_2 u_2 \\ u'_2 \\
+u'_3 u_1 \\ u'_3 u_2 \\ 1 \\
+\end{pmatrix}
+</math><br>
+Here <math>A</math> is an <math>n \times 9</math> matrix (where <math>n=8</math> if using 8 points).
+The goal is to minimize <math>\Vert A\mathbf{x} \Vert </math> such that <math>\Vert \mathbf{x} \Vert = 1</math>
+;Solution
+* First take the SVD of A: <math>A = UDV^T</math>
+** <math>U</math> is <math>8 \times 8</math>, <math>D</math> is <math>8 \times 9</math> diagonal matrix, and <math>V^T</math> is a <math>9 \times 9</math> matrix.
+* Now <math>x = V_j</math>, the <math>j</math>-th column of <math>V</math>. Reshape this to get <math>Q_{est}</math>.
+* In practice, this may not be rank 2 so we take the another SVD <math>Q_{est}=U diag(r,s,t) V^T</math>
+* Zero out the third singular value to get a final estimate
+*: <math>Q' = U diag(r,s,0) V^T</math>
 ==Determining rotation <math>\mathbf{R}</math> and translation <math>\mathbf{t}</math>==
+Copied from section 3 or Hartley<ref name="hartley"/>.
 ;Theorem
@@ Line 84: / Line 123: @@
 * Here, <math>R</math> is your rotation and <math>S = [T]_{\times}</math>
 * <math>T = V (0, 0, 1)^T</math>, the third column of <math>V</math> or third row of <math>V^T</math>
+** Note that this only gives you the direction of the translation. The magnitude is not determined.
 * Some sources such as Wikipedia use <math>[T]_{\times} = U Z U^T</math> and <math>T = U (0, 0, 1)^T</math>.
 ** This is equivalent to <math>RT</math> in our notation.
-* Since <math>V</math> is orthonormal, this give <math>\Vert T \Vert = 1</math>
+* Since <math>V</math> is orthonormal, this gives <math>\Vert T \Vert = 1</math>
 * Note <math>-RT = -UEV^T V(0,0,1)^T = -U(0,0,1)^T</math>
@@ Line 101: / Line 141: @@
 For planar images, only one of these 4 options is feasible.
 You can determine which one is feasibly using triangulation with one of your points.
-In the implausible 3 possibilities, <math>P'\mathbf{u}</math> will be out of bounds or negative
 ==3D points==
 See [[Wikipedia: Essential_matrix]]
+==Fundamental Matrix==
+The fundamental matrix is a generalization of the essential matrix which also takes into account the calibration of the camera.
 ==Resources==
 * [[Wikipedia: Essential_matrix]]
+* [http://robotics.stanford.edu/~birch/projective/node20.html stanford essential and fundamental matricies]
+* [https://github.com/darknight1900/books/blob/master/Multiple%20View%20Geometry%20in%20Computer%20Vision%20(Second%20Edition).pdf Multiple View Geometry in Computer Vision by Hartley and Zisserman]
 ==References==