Monocular Neural Image Based Rendering with Continuous View Control (ICCV 2019)
Authors: Xu Chen, Jie Song, Otmar Hilliges
Affiliations: AIT Lab, ETH Zurich
*[https://arxiv.org/abs/1901.01880 Arxiv mirror] [http://openaccess.thecvf.com/content_ICCV_2019/html/Chen_Monocular_Neural_Image_Based_Rendering_With_Continuous_View_Control_ICCV_2019_paper.html CVF Mirror] [https://ieeexplore.ieee.org/document/9008541 IEEE Xplore]
*[http://openaccess.thecvf.com/content_ICCV_2019/supplemental/Chen_Monocular_Neural_Image_ICCV_2019_supplemental.pdf Supp]
*[https://github.com/xuchen-ethz/continuous_view_synthesis Github]
The goal of the transforming autoencoder is to create a point cloud of latent features from a 2D source image.
#Encode the image \(I_s\) into a latent representation \(z = E_{\theta_{e}}(I_s)\).
#Rotate and translate the latent representation to get \(z_{T} = T_{s \to t}(z)\).
#Decode the latent representation into a depth map for the target view \(D_t\).
#Compute correspondences between source and target by projecting through the depth map.
#*Uses camera intrinsics \(K\) and extrinsics \(T_{s\to t}\) to yield a dense backward flow map \(C_{t \to s}\).
#Do warping using correspondences to get the target image \(\hat{I}_{t}\) (the projection and warping steps are sketched in code after the mapping below).
In total, the mapping is:
\[
M(I_s) = B(P_{t \to s}(D_{\theta_{d}}(T_{s \to t}(E_{\theta_{e}}(I_s)))), I_s) = \hat{I}_{t}
\]
where:
*\(B(F, I)\) is a bilinear warp of image \(I\) using backwards flow \(F\)
*\(P_{t \to s}(D)\) is the projection of the target-view depth map \(D\) from \(t\) to \(s\), yielding the backward flow \(C_{t \to s}\)
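The projection and warping steps can be made concrete with a short sketch. The following is a minimal PyTorch-style illustration, not the authors' implementation: the function names, tensor shapes, and the use of the relative transform \(T_{t \to s}\) (the inverse of \(T_{s \to t}\)) are assumptions made for exposition.
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def backward_flow(depth_t, K, T_t_to_s):
    """P_{t->s} (sketch): project the decoded target-view depth into the
    source view to obtain a dense backward correspondence map C_{t->s}.

    depth_t:  (B, 1, H, W) depth map decoded for the target view
    K:        (B, 3, 3)    camera intrinsics
    T_t_to_s: (B, 4, 4)    relative pose from target to source (assumed to
                           be the inverse of T_{s->t})
    Returns a sampling grid of shape (B, H, W, 2), normalized to [-1, 1].
    """
    Bsz, _, H, W = depth_t.shape
    dev, dt = depth_t.device, depth_t.dtype
    ys, xs = torch.meshgrid(torch.arange(H, device=dev, dtype=dt),
                            torch.arange(W, device=dev, dtype=dt), indexing="ij")
    ones = torch.ones_like(xs)
    # Homogeneous pixel coordinates of the target view, shape (B, 3, H*W).
    pix = torch.stack([xs, ys, ones], 0).reshape(1, 3, -1).expand(Bsz, -1, -1)
    # Back-project to 3D points in the target camera frame.
    cam_t = torch.linalg.inv(K) @ pix * depth_t.reshape(Bsz, 1, -1)
    cam_t = torch.cat([cam_t, torch.ones(Bsz, 1, H * W, device=dev, dtype=dt)], 1)
    # Rigidly transform into the source camera frame and project with K.
    cam_s = (T_t_to_s @ cam_t)[:, :3]
    pix_s = K @ cam_s
    pix_s = pix_s[:, :2] / pix_s[:, 2:3].clamp(min=1e-6)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    u = 2.0 * pix_s[:, 0] / (W - 1) - 1.0
    v = 2.0 * pix_s[:, 1] / (H - 1) - 1.0
    return torch.stack([u, v], -1).reshape(Bsz, H, W, 2)

def bilinear_warp(I_s, flow):
    """B (sketch): bilinearly sample the source image at the projected
    coordinates to synthesize the target view."""
    return F.grid_sample(I_s, flow, mode="bilinear", align_corners=True)
</syntaxhighlight>
Both the back-projection and the bilinear sampling are differentiable, so a reconstruction loss on \(\hat{I}_t\) can be backpropagated through the whole mapping.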
===Transforming Auto-encoder===
The only neural network they use is a transforming autoencoder.
Details about their network are provided in the supplementary material as well as in the code.
Their implementation is based on Zhou et al. [https://github.com/tinghuiz/appearance-flow View Synthesis by Appearance Flow]<ref name="zhou2016view">Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, and Alyosha Efros. (2016). View Synthesis by Appearance Flow (ECCV 2016). DOI: [https://doi.org/10.1007/978-3-319-46493-0_18 10.1007/978-3-319-46493-0_18]. Arxiv mirror: https://arxiv.org/abs/1605.03557</ref>.
The encoder converts images into latent points.
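As a rough illustration (not the authors' code; the exact layout of the latent code and the helper name are assumptions), the latent code can be viewed as a set of 3D points to which the relative camera rotation and translation are applied directly:
<syntaxhighlight lang="python">
import torch

def transform_latent(z, R, t):
    """T_{s->t} (sketch): treat the latent code as N 3D points and apply
    the relative rigid transform between the source and target cameras.

    z: (B, 3*N) latent code from the encoder, viewed as N points in 3D
    R: (B, 3, 3) rotation, t: (B, 3, 1) translation of the relative pose
    """
    Bsz = z.shape[0]
    points = z.reshape(Bsz, 3, -1)   # (B, 3, N) latent point cloud
    points = R @ points + t          # rigid transform in feature space
    return points.reshape(Bsz, -1)   # back to the decoder's input layout
</syntaxhighlight>
Because the transformation is an explicit rigid motion rather than a learned operation, arbitrary continuous viewpoint changes can be passed to the same network at test time.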
==References==
<references />