Visual Learning and Recognition



Xie ''et al.''<ref name="xie2018rethinking"></ref> develop S3D-G for video classification. The main idea is that video classification can be done with Conv2d layers at the lower layers and Conv3d layers at the higher layers. In addition, a full 3D convolution can be separated into a spatial and a temporal convolution (with <math>1 \times k \times k</math> and <math>k_t \times 1 \times 1</math> kernels, respectively). Together, these two changes improve both the accuracy and the efficiency of video classification compared to using only Conv3d layers.
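The separable convolution can be illustrated with a short sketch (assuming PyTorch; the class name <code>SepConv3d</code>, channel counts, and kernel sizes below are illustrative, not the authors' implementation): a <math>k_t \times k \times k</math> 3D convolution is replaced by a <math>1 \times k \times k</math> spatial convolution followed by a <math>k_t \times 1 \times 1</math> temporal convolution.

<syntaxhighlight lang="python">
# Illustrative sketch of a separable 3D convolution block (S3D-style factorization).
# Assumes PyTorch; layer names and sizes are examples, not the paper's exact code.
import torch
import torch.nn as nn

class SepConv3d(nn.Module):
    """Factorizes a k_t x k x k 3D convolution into a 1 x k x k spatial
    convolution followed by a k_t x 1 x 1 temporal convolution."""
    def __init__(self, in_ch, out_ch, k=3, k_t=3):
        super().__init__()
        self.spatial = nn.Conv3d(in_ch, out_ch,
                                 kernel_size=(1, k, k),
                                 padding=(0, k // 2, k // 2))
        self.temporal = nn.Conv3d(out_ch, out_ch,
                                  kernel_size=(k_t, 1, 1),
                                  padding=(k_t // 2, 0, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (batch, channels, time, height, width)
        x = self.relu(self.spatial(x))   # spatial 1 x k x k convolution
        return self.relu(self.temporal(x))  # temporal k_t x 1 x 1 convolution

# Usage: a 16-frame 112x112 RGB clip.
clip = torch.randn(1, 3, 16, 112, 112)
out = SepConv3d(3, 64)(clip)  # shape: (1, 64, 16, 112, 112)
</syntaxhighlight>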
===Overview===
ConvNet pipeline (a minimal sketch follows the list):
* Input
* Conv/ReLU/Pool
* FC/ReLU
* FC/Normalization/Loss
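A minimal sketch of this pipeline, assuming PyTorch and CIFAR-sized 32x32 RGB inputs with 10 classes purely for illustration:

<syntaxhighlight lang="python">
# Minimal ConvNet following the Input -> Conv/ReLU/Pool -> FC/ReLU -> FC/loss pipeline.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # Conv
    nn.ReLU(),                                   # ReLU
    nn.MaxPool2d(2),                             # Pool: 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # Pool: 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 128),                  # FC
    nn.ReLU(),                                   # ReLU
    nn.Linear(128, 10),                          # FC producing class scores
)

# Normalization/Loss: softmax combined with cross-entropy.
criterion = nn.CrossEntropyLoss()
logits = model(torch.randn(4, 3, 32, 32))
loss = criterion(logits, torch.randint(0, 10, (4,)))
</syntaxhighlight>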


==Will be on the exam==