Super SloMo: Difference between revisions

(One intermediate revision by the same user not shown)

Line 1:

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation

~~Authors: Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-Hsuan Yang, Erik Learned-Miller, Jan Kautz~~

~~UMass Amherst, NVIDIA, UC Merced~~

* [http://openaccess.thecvf.com/content_cvpr_2018/html/Jiang_Super_SloMo_High_CVPR_2018_paper.html CVPR 2018 Paper][https://arxiv.org/abs/1712.00080 Arxiv Mirror]

Authors: Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-Hsuan Yang, Erik Learned-Miller, Jan Kautz

* [https://github.com/avinashpaliwal/Super-SloMo Unofficial implementation by Avinash Paliwal]

Affiliations: UMass Amherst, NVIDIA, UC Merced

*[http://openaccess.thecvf.com/content_cvpr_2018/html/Jiang_Super_SloMo_High_CVPR_2018_paper.html CVPR 2018 Paper][https://arxiv.org/abs/1712.00080 Arxiv Mirror]

*[https://github.com/avinashpaliwal/Super-SloMo Unofficial implementation by Avinash Paliwal]

==Method==

Line 13:

Line 15:

First, estimate the bidirectional optical flows \(F_{0\to 1}\) and \(F_{1 \to 0}\) between the two input frames using an optical flow neural network.

Given these two flows, the optical flows from the intermediate frame at time \(t \in (0, 1)\) to each input frame can be approximated as follows:

* <math>\hat{F}_{t\to 0} = -(1-t)t F_{0 \to 1} + t^2 F_{1 \to 0}</math>

* <math>\hat{F}_{t \to 1} = (1-t)^2 F_{0 \to 1} - t(1-t)F_{1 \to 0}</math>

*<math>\hat{F}_{t\to 0} = -(1-t)t F_{0 \to 1} + t^2 F_{1 \to 0}</math>

*<math>\hat{F}_{t \to 1} = (1-t)^2 F_{0 \to 1} - t(1-t)F_{1 \to 0}</math>
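
As a minimal sketch of the two approximations above, the following Python function evaluates them at a single pixel. The function name and the representation of a flow as a <code>(dx, dy)</code> tuple are assumptions made for illustration; they are not taken from the paper or from the linked implementation.

```python
def approximate_intermediate_flows(f01, f10, t):
    """Approximate F_{t->0} and F_{t->1} at one pixel.

    f01, f10: (dx, dy) flow vectors F_{0->1} and F_{1->0} at the same pixel.
    t: time of the intermediate frame, with 0 <= t <= 1.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    # F_{t->0} = -(1-t) t F_{0->1} + t^2 F_{1->0}
    ft0 = tuple(-(1 - t) * t * a + t * t * b for a, b in zip(f01, f10))
    # F_{t->1} = (1-t)^2 F_{0->1} - t (1-t) F_{1->0}
    ft1 = tuple((1 - t) ** 2 * a - t * (1 - t) * b for a, b in zip(f01, f10))
    return ft0, ft1


# Hypothetical example: a pixel moving 4 px to the right between the frames.
ft0, ft1 = approximate_intermediate_flows((4.0, 0.0), (-4.0, 0.0), 0.5)
```

For constant linear motion, where \(F_{1 \to 0} = -F_{0 \to 1}\), the two formulas reduce to \(\hat{F}_{t \to 0} = -tF_{0 \to 1}\) and \(\hat{F}_{t \to 1} = (1-t)F_{0 \to 1}\), which is what the example values exercise.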

{{hidden | Derivation |

We consider estimating <math display="inline">F_{t \to 1}(p)</math>.

Line 27:

Line 30:

The estimate of the intermediate frame now is:

* <math>\hat{I}_t = \alpha_0 \odot g(I_0, F_{t \to 0}) + (1 - \alpha_0) \odot g(I_1, F_{t \to 1})</math>

*<math>\hat{I}_t = \alpha_0 \odot g(I_0, F_{t \to 0}) + (1 - \alpha_0) \odot g(I_1, F_{t \to 1})</math>

where \(g\) is a differentiable backward warping function (bilinear interpolation) and \(\alpha_0\) controls the pixelwise contribution from each image.

A naive estimate would use \(\alpha_0 = 1-t\). However, this ignores occlusions: to handle pixels that are visible in only one of the two input frames, visibility maps must also be estimated.
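
The warping and blending step can be sketched in plain Python for a single-channel image stored as nested lists. This is an illustration under stated assumptions, not the authors' implementation: the helper names, the convention that \(g(I, F)\) samples \(I\) at \(p + F(p)\), and the clamping of out-of-range coordinates to the image border are all choices made here.

```python
import math


def bilinear_sample(img, x, y):
    """Sample img (list of rows) at real-valued (x, y), clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def backward_warp(img, flow):
    """g(I, F): output pixel (x, y) reads img at (x + u, y + v), flow[y][x] = (u, v)."""
    return [
        [bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(len(img[0]))]
        for y in range(len(img))
    ]


def blend(warped0, warped1, alpha0):
    """Pixelwise alpha0 * warped0 + (1 - alpha0) * warped1."""
    return [
        [a * p0 + (1 - a) * p1 for p0, p1, a in zip(r0, r1, ra)]
        for r0, r1, ra in zip(warped0, warped1, alpha0)
    ]
```

With the naive choice, <code>alpha0</code> would be filled with the constant \(1-t\) at every pixel.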

Line 36:

Line 41:

The final image estimate is:

* <math>\hat{I}_t = \frac{1}{Z} \odot \left( (1-t)V_{t \leftarrow 0} \odot g(I_0, F_{t \to 0}) + tV_{t \leftarrow 1} \odot g(I_1, F_{t \to 1}) \right)</math>

*<math>\hat{I}_t = \frac{1}{Z} \odot \left( (1-t)V_{t \leftarrow 0} \odot g(I_0, F_{t \to 0}) + tV_{t \leftarrow 1} \odot g(I_1, F_{t \to 1}) \right)</math>
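
A minimal sketch of this visibility-weighted fusion follows. It assumes, as in the Super SloMo paper, that the warped frame 0 is weighted by the visibility map \(V_{t \leftarrow 0}\), the warped frame 1 by \(V_{t \leftarrow 1}\), and that \(Z = (1-t)V_{t \leftarrow 0} + tV_{t \leftarrow 1}\) is a per-pixel normalization factor. The function name and the small <code>eps</code> guard against division by zero are additions made here for illustration.

```python
def fuse_with_visibility(warped0, warped1, v0, v1, t, eps=1e-8):
    """Fuse two already-warped single-channel images (lists of rows).

    warped0, warped1: g(I_0, F_{t->0}) and g(I_1, F_{t->1}).
    v0, v1: visibility maps V_{t<-0} and V_{t<-1}, values in [0, 1].
    t: time of the intermediate frame, 0 < t < 1.
    """
    fused = []
    for r0, r1, rv0, rv1 in zip(warped0, warped1, v0, v1):
        row = []
        for p0, p1, a, b in zip(r0, r1, rv0, rv1):
            w0 = (1 - t) * a          # weight of frame 0 at this pixel
            w1 = t * b                # weight of frame 1 at this pixel
            z = max(w0 + w1, eps)     # normalization factor Z (guarded)
            row.append((w0 * p0 + w1 * p1) / z)
        fused.append(row)
    return fused
```

Where a pixel is occluded in frame 0 (visibility 0) but visible in frame 1, the result is taken entirely from the warped frame 1, which is the behaviour the naive \(\alpha_0 = 1-t\) blend cannot provide.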

==Architecture==

The architecture consists of two similar CNNs: a flow computation CNN, which estimates the bidirectional flow between the two input images, and a flow interpolation CNN.

Both networks are fully convolutional U-Nets with 6 hierarchies in the encoder and 5 hierarchies in the decoder.

===Flow Computation CNN===

===Flow Interpolation CNN===

==Resources==

==References==

@@ Line 1: / Line 1: @@
 Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation
-Authors: Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-Hsuan Yang, Erik Learned-Miller, Jan Kautz
-UMass Amherst, NVIDIA, UC Merced
-* [http://openaccess.thecvf.com/content_cvpr_2018/html/Jiang_Super_SloMo_High_CVPR_2018_paper.html CVPR 2018 Paper][https://arxiv.org/abs/1712.00080 Arxiv Mirror]
+Authors: Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-Hsuan Yang, Erik Learned-Miller, Jan Kautz
-* [https://github.com/avinashpaliwal/Super-SloMo Unofficial implementation by Avinash Paliwal]
+Affiliations: UMass Amherst, NVIDIA, UC Merced
+*[http://openaccess.thecvf.com/content_cvpr_2018/html/Jiang_Super_SloMo_High_CVPR_2018_paper.html CVPR 2018 Paper][https://arxiv.org/abs/1712.00080 Arxiv Mirror]
+*[https://github.com/avinashpaliwal/Super-SloMo Unofficial implementation by Avinash Paliwal]
 ==Method==
@@ Line 13: / Line 15: @@
 First, estimate the bidirectional optical flows \(F_{0\to 1}\) and \(F_{1 \to 0}\) between the two input frames using an optical flow neural network.
 Given these two flows, the optical flows from the intermediate frame at time \(t \in (0, 1)\) to each input frame can be approximated as follows:
-* <math>\hat{F}_{t\to 0} = -(1-t)t F_{0 \to 1} + t^2 F_{1 \to 0}</math>
-* <math>\hat{F}_{t \to 1} = (1-t)^2 F_{0 \to 1} - t(1-t)F_{1 \to 0}</math>
+*<math>\hat{F}_{t\to 0} = -(1-t)t F_{0 \to 1} + t^2 F_{1 \to 0}</math>
+*<math>\hat{F}_{t \to 1} = (1-t)^2 F_{0 \to 1} - t(1-t)F_{1 \to 0}</math>
 {{hidden | Derivation |
 We consider estimating <math display="inline">F_{t \to 1}(p)</math>.
@@ Line 27: / Line 30: @@
 The estimate of the intermediate frame now is:
-* <math>\hat{I}_t = \alpha_0 \odot g(I_0, F_{t \to 0}) + (1 - \alpha_0) \odot g(I_1, F_{t \to 1})</math>
+*<math>\hat{I}_t = \alpha_0 \odot g(I_0, F_{t \to 0}) + (1 - \alpha_0) \odot g(I_1, F_{t \to 1})</math>
 where \(g\) is a differentiable backward warping function (bilinear interpolation) and \(\alpha_0\) controls the pixelwise contribution from each image.
 A naive estimate would use \(\alpha_0 = 1-t\). However, this ignores occlusions: to handle pixels that are visible in only one of the two input frames, visibility maps must also be estimated.
@@ Line 36: / Line 41: @@
 The final image estimate is:
-* <math>\hat{I}_t = \frac{1}{Z} \odot \left( (1-t)V_{t \leftarrow 0} \odot g(I_0, F_{t \to 0}) + tV_{t \leftarrow 1} \odot g(I_1, F_{t \to 1}) \right)</math>
+*<math>\hat{I}_t = \frac{1}{Z} \odot \left( (1-t)V_{t \leftarrow 0} \odot g(I_0, F_{t \to 0}) + tV_{t \leftarrow 1} \odot g(I_1, F_{t \to 1}) \right)</math>
 ==Architecture==
+The architecture consists of two similar CNNs: a flow computation CNN, which estimates the bidirectional flow between the two input images, and a flow interpolation CNN.
+Both networks are fully convolutional U-Nets with 6 hierarchies in the encoder and 5 hierarchies in the decoder.
+===Flow Computation CNN===
+===Flow Interpolation CNN===
+==Resources==
+==References==