Monocular Neural Image Based Rendering with Continuous View Control (ICCV 2019)


Authors: Xu Chen, Jie Song, Otmar Hilliges
Affiliations: AIT Lab, ETH Zurich


*[https://arxiv.org/abs/1901.01880 Arxiv mirror] [http://openaccess.thecvf.com/content_ICCV_2019/html/Chen_Monocular_Neural_Image_Based_Rendering_With_Continuous_View_Control_ICCV_2019_paper.html CVF Mirror] [https://ieeexplore.ieee.org/document/9008541 IEEE Xplore]
*[http://openaccess.thecvf.com/content_ICCV_2019/supplemental/Chen_Monocular_Neural_Image_ICCV_2019_supplemental.pdf Supp]
*[https://github.com/xuchen-ethz/continuous_view_synthesis GitHub]




The goal of the transforming autoencoder is to create a point cloud of latent features from a 2D source image.


#Encode the image \(I_s\) into a latent representation \(z = E_{\theta_{e}}(I_s)\).
#Rotate and translate the latent representation to get \(z_{T} = T_{s \to t}(z)\) (a sketch of this step follows the list).
#Decode the transformed latent representation into a depth map \(D_t\) for the target view.
#Compute correspondences between the source and target views by projecting with the depth map.
#*This uses the camera intrinsics \(K\) and extrinsics \(T_{s\to t}\) to yield a dense backward flow map \(C_{t \to s}\).
#Warp the source image using the correspondences to get the target image \(\hat{I}_{t}\).
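The rotation and translation of the latent point cloud (step 2) is a rigid-body transform applied to every latent point. A minimal PyTorch sketch of this step is given below; the latent layout (a fixed number of 3D points, possibly with extra per-point feature channels) and the function name are assumptions for illustration, not the released implementation.

<syntaxhighlight lang="python">
import torch

def transform_latent_points(z_points, R, t):
    """Apply the source-to-target rigid transform T_{s->t} to latent 3D points.

    z_points: (B, N, 3) latent point coordinates (the layout is an assumption;
              the released code may also carry per-point feature channels).
    R:        (B, 3, 3) rotation matrices of T_{s->t}.
    t:        (B, 3) translation vectors of T_{s->t}.
    Returns the transformed latent point cloud z_T with shape (B, N, 3).
    """
    # x' = R x + t, applied to every latent point
    return torch.einsum('bij,bnj->bni', R, z_points) + t.unsqueeze(1)
</syntaxhighlight>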


In total, the mapping is:
\[
M(I_s) = B(P_{t \to s}(D_{\theta_{d}}(T_{s \to t}(E_{\theta_{e}}(I_s)))), I_s) = \hat{I}_{t}
\]


where:
*\(B(F, I)\) is a bilinear warp of image \(I\) using the backward flow \(F\)
*\(P_{t \to s}(D)\) is the projection of the depth map \(D\) from view \(t\) to view \(s\), which yields the backward flow \(C_{t \to s}\) (a sketch of both operations follows)
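A minimal PyTorch sketch of the geometric part of the mapping, the projection \(P_{t \to s}\) and the bilinear warp \(B\), is given below. The function names, tensor shapes, and pose convention (a transform taking points from the target camera frame to the source camera frame) are assumptions for illustration rather than the released implementation.

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def backward_flow_from_depth(depth_t, K, R_ts, t_ts):
    """P_{t->s}: project the target-view depth map to per-pixel source
    coordinates, i.e. the dense backward flow C_{t->s}.

    depth_t:     (B, 1, H, W) predicted depth for the target view.
    K:           (B, 3, 3) camera intrinsics.
    R_ts, t_ts:  (B, 3, 3) / (B, 3) transform taking 3D points from the target
                 camera frame to the source camera frame (convention assumed).
    Returns (B, H, W, 2) source pixel coordinates normalised to [-1, 1].
    """
    B, _, H, W = depth_t.shape
    dtype, device = depth_t.dtype, depth_t.device
    # Target pixel grid in homogeneous coordinates
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=dtype, device=device),
        torch.arange(W, dtype=dtype, device=device),
        indexing='ij')
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(1, 3, -1)
    # Back-project target pixels into 3D using the predicted depth
    cam_t = (torch.inverse(K) @ pix.expand(B, -1, -1)) * depth_t.reshape(B, 1, -1)
    # Move the 3D points into the source camera frame and project with K
    cam_s = R_ts @ cam_t + t_ts.unsqueeze(-1)
    proj = K @ cam_s
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    # Normalise to [-1, 1], the coordinate range expected by grid_sample
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    return torch.stack([u, v], dim=-1).reshape(B, H, W, 2)

def bilinear_warp(I_s, flow_ts):
    """B(F, I): bilinearly sample the source image at the backward-flow
    coordinates to synthesise the target view."""
    return F.grid_sample(I_s, flow_ts, mode='bilinear',
                         padding_mode='border', align_corners=True)
</syntaxhighlight>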


===Transforming Auto-encoder===
The only neural network they use is a transforming autoencoder.
Details about their network are provided in the supplementary material as well as in the code.
Their implementation is based on Zhou et al., [https://github.com/tinghuiz/appearance-flow View Synthesis by Appearance Flow]<ref name="zhou2016view">Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, and Alyosha Efros. (2016). View Synthesis by Appearance Flow (ECCV 2016). DOI: [https://doi.org/10.1007/978-3-319-46493-0_18 10.1007/978-3-319-46493-0_18]. Arxiv mirror: [https://arxiv.org/abs/1605.03557 https://arxiv.org/abs/1605.03557]</ref>.


The encoder converts images into latent points.
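As a rough illustration of the transforming autoencoder's structure, the sketch below pairs a small convolutional encoder, which outputs a fixed number of latent points, with a decoder that maps the (transformed) points to a target-view depth map. The layer sizes, number of latent points, and feature layout are placeholder assumptions; the actual architecture is specified in the supplementary material and the released code.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class TransformingAutoencoder(nn.Module):
    """Minimal sketch of the encoder/decoder pair (all sizes are assumptions)."""

    def __init__(self, n_points=64, feat_dim=16):
        super().__init__()
        self.n_points, self.feat_dim = n_points, feat_dim
        # Encoder: image -> flat latent vector, reshaped into N latent points
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, n_points * (3 + feat_dim)))
        # Decoder: (transformed) latent points -> 64x64 target-view depth map
        self.decoder = nn.Sequential(
            nn.Linear(n_points * (3 + feat_dim), 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

    def encode(self, image):
        # Latent point cloud (B, N, 3 + feat_dim); the first three channels are
        # treated as 3D coordinates that T_{s->t} rotates and translates.
        z = self.encoder(image)
        return z.view(-1, self.n_points, 3 + self.feat_dim)

    def decode(self, z_points):
        # Decode the transformed latent points into the target-view depth map D_t.
        return self.decoder(z_points.flatten(1))
</syntaxhighlight>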


==References==
<references />