Monocular Neural Image Based Rendering with Continuous View Control
Authors: Xu Chen, Jie Song, Otmar Hilliges
Affiliations: AIT Lab, ETH Zurich
Method
The main idea is to train a transforming autoencoder.
The goal of the transforming autoencoder is to create a point cloud of latent features from a 2D source image. The pipeline, sketched in code after this list, is:
- Encode the image \(I_s\) into a latent representation \(z = E_{\theta_{e}}(I_s)\).
- Rotate and translate the latent representation to get \(z_{T} = T_{s \to t}(z)\).
- Decode the transformed latent representation into a depth map for the target view, \(D_t = D_{\theta_{d}}(z_T)\).
- Compute correspondences between source and target by projecting the target depth map into the source view.
- Uses camera intrinsics \(K\) and extrinsics \(T_{s\to t}\) to yield a dense backward flow map \(C_{t \to s}\).
- Warp the source image using these correspondences to get the target image \(\hat{I}_{t}\).
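To make the first three steps concrete, here is a minimal PyTorch sketch of the transforming autoencoder. The layer sizes, the fixed 64x64 depth resolution, and the split of the latent code into 3D coordinates plus per-point features are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class TransformingAutoencoder(nn.Module):
    """Sketch: image -> latent point cloud -> rigid transform -> target-view depth."""
    def __init__(self, n_points=512, feat_dim=16):
        super().__init__()
        self.n_points, self.feat_dim = n_points, feat_dim
        # Hypothetical encoder E_{theta_e}: image -> n_points * (3 coords + feat_dim features)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_points * (3 + feat_dim)),
        )
        # Hypothetical decoder D_{theta_d}: transformed latent -> 64x64 target depth map
        self.decoder = nn.Sequential(
            nn.Linear(n_points * (3 + feat_dim), 64 * 64),
            nn.Softplus(),  # keep the predicted depth positive
        )

    def forward(self, I_s, R, t):
        # I_s: (B,3,H,W) source image; R: (B,3,3) rotation, t: (B,3) translation of s->t
        B = I_s.shape[0]
        z = self.encoder(I_s).view(B, self.n_points, 3 + self.feat_dim)
        xyz, feat = z[..., :3], z[..., 3:]
        # T_{s->t}: rotate and translate only the 3D part of each latent point
        xyz_t = torch.einsum('bij,bnj->bni', R, xyz) + t[:, None, :]
        z_t = torch.cat([xyz_t, feat], dim=-1).reshape(B, -1)
        D_t = self.decoder(z_t).view(B, 1, 64, 64)  # depth map for the target view
        return D_t
```

The key design choice is that the camera transform \(T_{s \to t}\) is applied explicitly to the coordinate part of the latent point cloud, so the features travel with their points under the rigid motion instead of being entangled with the pose.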
In total, the mapping is:
\[
M(I_s) = B(P_{t \to s}(D_{\theta_{d}}(T_{s \to t}(E_{\theta_{e}}(I_s)))), I_s) = \hat{I}_{t}
\]
where:
- \(B(F, I)\) is a bilinear warp of image \(I\) using the backward flow \(F\)
- \(P_{t \to s}(D)\) is the projection that maps the target depth map \(D\) to the dense backward flow \(C_{t \to s}\), sketched below
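Here is a minimal sketch of the projection and warping steps in PyTorch. The function names, the convention that \(T_{s \to t}\) is a 4x4 matrix (inverted here to carry target points into the source frame), and the numerical details are my assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def project_t_to_s(D_t, K, T_s_to_t):
    """P_{t->s}: unproject target depth with K^{-1}, move the points into the
    source frame, and reproject with K to get per-pixel source coordinates,
    i.e. the dense backward flow C_{t->s}.
    D_t: (B,1,H,W) depth; K: (B,3,3) intrinsics; T_s_to_t: (B,4,4) pose s->t."""
    B, _, H, W = D_t.shape
    dev, dt = D_t.device, D_t.dtype
    # Homogeneous pixel grid of the target view
    v, u = torch.meshgrid(torch.arange(H, device=dev, dtype=dt),
                          torch.arange(W, device=dev, dtype=dt), indexing='ij')
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(1, 3, -1).expand(B, -1, -1)
    # Unproject: X_t = D_t * K^{-1} pix
    X_t = torch.linalg.inv(K) @ pix * D_t.reshape(B, 1, -1)
    # Target -> source frame via the inverse of T_{s->t}
    T_t_to_s = torch.linalg.inv(T_s_to_t)
    X_s = T_t_to_s[:, :3, :3] @ X_t + T_t_to_s[:, :3, 3:4]
    # Reproject into source pixel coordinates
    p_s = K @ X_s
    p_s = p_s[:, :2] / p_s[:, 2:3].clamp(min=1e-6)
    return p_s.reshape(B, 2, H, W)  # C_{t->s}

def bilinear_warp(C_t_to_s, I_s):
    """B(F, I): sample the source image at the projected coordinates."""
    B, _, H, W = C_t_to_s.shape
    # grid_sample expects coordinates normalized to [-1, 1] with layout (B,H,W,2)
    gx = 2.0 * C_t_to_s[:, 0] / (W - 1) - 1.0
    gy = 2.0 * C_t_to_s[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1)
    return F.grid_sample(I_s, grid, mode='bilinear', align_corners=True)
```

With these pieces, the full mapping above reads as `bilinear_warp(project_t_to_s(D_t, K, T), I_s)`, matching the composition \(B(P_{t \to s}(D_t), I_s) = \hat{I}_t\).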