Monocular Neural Image Based Rendering with Continuous View Control
Authors: Xu Chen, Jie Song, Otmar Hilliges
Affiliations: AIT Lab, ETH Zurich
Method
The main idea is to train a transforming autoencoder.
The goal of the transforming autoencoder is to create a point cloud of latent features from a 2D source image. The pipeline, sketched in code after this list, is:
- Encode the image \(I_s\) into a latent representation \(z = E_{\theta_{e}}(I_s)\).
- Rotate and translate the latent representation to get \(z_{T} = T_{s \to t}(z)\).
- Decode the transformed latent representation into a depth map for the target view, \(D_t = D_{\theta_{d}}(z_T)\).
- Compute correspondences between source and target by projecting the target depth map into the source view.
- Uses camera intrinsics \(K\) and extrinsics \(T_{s\to t}\) to yield a dense backward flow map \(C_{t \to s}\).
- Warp the source image using these correspondences to get the target image \(\hat{I}_{t}\).
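To make the first three steps concrete, here is a minimal PyTorch sketch of the transforming autoencoder. The layer sizes, the fixed 64x64 depth resolution, and the split of the latent code into 3D coordinates plus per-point features are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class TransformingAutoencoder(nn.Module):
    """Sketch: image -> latent point cloud -> rigid transform -> target-view depth."""
    def __init__(self, n_points=512, feat_dim=16):
        super().__init__()
        self.n_points, self.feat_dim = n_points, feat_dim
        # Hypothetical encoder E_{theta_e}: image -> n_points * (3 coords + feat_dim features)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_points * (3 + feat_dim)),
        )
        # Hypothetical decoder D_{theta_d}: transformed latent -> 64x64 target depth map
        self.decoder = nn.Sequential(
            nn.Linear(n_points * (3 + feat_dim), 64 * 64),
            nn.Softplus(),  # keep the predicted depth positive
        )

    def forward(self, I_s, R, t):
        # I_s: (B,3,H,W) source image; R: (B,3,3) rotation, t: (B,3) translation of s->t
        B = I_s.shape[0]
        z = self.encoder(I_s).view(B, self.n_points, 3 + self.feat_dim)
        xyz, feat = z[..., :3], z[..., 3:]
        # T_{s->t}: rotate and translate only the 3D part of each latent point
        xyz_t = torch.einsum('bij,bnj->bni', R, xyz) + t[:, None, :]
        z_t = torch.cat([xyz_t, feat], dim=-1).reshape(B, -1)
        D_t = self.decoder(z_t).view(B, 1, 64, 64)  # depth map for the target view
        return D_t
```

The key design choice is that the camera transform \(T_{s \to t}\) is applied explicitly to the coordinate part of the latent point cloud, so the features travel with their points under the rigid motion instead of being entangled with the pose.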
In total, the mapping is:
\[
M(I_s) = B(P_{t \to s}(D_{\theta_{d}}(T_{s \to t}(E_{\theta_{e}}(I_s)))), I_s) = \hat{I}_{t}
\]
where:
- \(B(F, I)\) is a bilinear warp of image \(I\) using the backward flow \(F\)
- \(P_{t \to s}(D)\) is the projection that maps the target depth map \(D\) to the dense backward flow \(C_{t \to s}\), sketched below
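Here is a minimal sketch of the projection and warping steps in PyTorch. The function names, the convention that \(T_{s \to t}\) is a 4x4 matrix (inverted here to carry target points into the source frame), and the numerical details are my assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def project_t_to_s(D_t, K, T_s_to_t):
    """P_{t->s}: unproject target depth with K^{-1}, move the points into the
    source frame, and reproject with K to get per-pixel source coordinates,
    i.e. the dense backward flow C_{t->s}.
    D_t: (B,1,H,W) depth; K: (B,3,3) intrinsics; T_s_to_t: (B,4,4) pose s->t."""
    B, _, H, W = D_t.shape
    dev, dt = D_t.device, D_t.dtype
    # Homogeneous pixel grid of the target view
    v, u = torch.meshgrid(torch.arange(H, device=dev, dtype=dt),
                          torch.arange(W, device=dev, dtype=dt), indexing='ij')
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(1, 3, -1).expand(B, -1, -1)
    # Unproject: X_t = D_t * K^{-1} pix
    X_t = torch.linalg.inv(K) @ pix * D_t.reshape(B, 1, -1)
    # Target -> source frame via the inverse of T_{s->t}
    T_t_to_s = torch.linalg.inv(T_s_to_t)
    X_s = T_t_to_s[:, :3, :3] @ X_t + T_t_to_s[:, :3, 3:4]
    # Reproject into source pixel coordinates
    p_s = K @ X_s
    p_s = p_s[:, :2] / p_s[:, 2:3].clamp(min=1e-6)
    return p_s.reshape(B, 2, H, W)  # C_{t->s}

def bilinear_warp(C_t_to_s, I_s):
    """B(F, I): sample the source image at the projected coordinates."""
    B, _, H, W = C_t_to_s.shape
    # grid_sample expects coordinates normalized to [-1, 1] with layout (B,H,W,2)
    gx = 2.0 * C_t_to_s[:, 0] / (W - 1) - 1.0
    gy = 2.0 * C_t_to_s[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1)
    return F.grid_sample(I_s, grid, mode='bilinear', align_corners=True)
```

With these pieces, the full mapping above reads as `bilinear_warp(project_t_to_s(D_t, K, T), I_s)`, matching the composition \(B(P_{t \to s}(D_t), I_s) = \hat{I}_t\).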