;Why are videos challenging?
* Boundaries between tasks are poorly defined.
* Huge computational cost.
;Spectrum of problems
* Primitive actions
* Actions
* Events
;Capturing long-range context
* Long-range
* Spatio-temporal
* Camera motion
* Cycles and speed
===Tasks and Datasets===
* Action Classification
* Temporal Action Localization
* Spatio-Temporal Action Detection
;Action datasets
* KTH human actions dataset
* UCF Sports action dataset
** Bias: negative examples are hard to find (e.g., a drum in the scene without anyone drumming).
* Sports-1M
** Has audio bias.
* Kinetics-v2
** ''ImageNet of videos''
** Collected from YouTube
** 600 actions, 500k clips
* Moments in Time
** 800k 3-second clips
** 339 classes
* SLAC: Sparsely Labeled Actions Dataset
** 520K untrimmed videos
* JHMDB
* Something-Something
* Charades
* AVA
* EPIC-Kitchens
* HAA500
* FineGym
;Self-driving car datasets
* KITTI++
* Argoverse
* Waymo Open Dataset
* Lyft Level 5 Open Data
* Berkeley DeepDrive
===Models for Video Recognition===
In general:
# Extract features
# Learn space-time bag of words
# Train/test BoW classifier
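The three steps above can be sketched end to end on toy data. This is only an illustration under stated assumptions, not the method from the lecture: the descriptors are raw flattened space-time cubes (real systems use hand-crafted or learned descriptors), the codebook is learned with a plain k-means using farthest-point initialization, and the classifier is a nearest-class-mean rule on the histograms. All function names, and the synthetic "static" vs. "flicker" videos, are hypothetical.

```python
import numpy as np


def extract_features(video, patch=4):
    """Step 1 (toy): cut a (T, H, W) video into non-overlapping
    patch x patch x patch space-time cubes and flatten each one."""
    t, h, w = (s - s % patch for s in video.shape)
    cubes = video[:t, :h, :w].reshape(
        t // patch, patch, h // patch, patch, w // patch, patch)
    cubes = cubes.transpose(0, 2, 4, 1, 3, 5)
    return cubes.reshape(-1, patch ** 3)


def learn_codebook(descriptors, k, iters=10):
    """Step 2: learn k visual words with k-means.
    Assumption: farthest-point initialization, chosen for determinism."""
    centers = [descriptors[0]]
    for _ in range(k - 1):
        d = np.min([((descriptors - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(descriptors[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = descriptors[assign == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(0)
    return centers


def bow_histogram(descriptors, centers):
    """Quantize descriptors to their nearest word; return a normalized histogram."""
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    counts = np.bincount(dists.argmin(1), minlength=len(centers))
    return counts / counts.sum()


def train_classifier(histograms, labels):
    """Step 3 (train): store the mean histogram of each class."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.stack([histograms[labels == c].mean(0) for c in classes])
    return classes, means


def predict(histogram, model):
    """Step 3 (test): return the class whose mean histogram is closest."""
    classes, means = model
    return int(classes[((means - histogram) ** 2).sum(1).argmin()])


def make_video(flicker, rng, shape=(8, 8, 8)):
    """Synthetic clip: near-constant frames, optionally brightened every other frame."""
    video = rng.normal(0.0, 0.01, size=shape)
    if flicker:
        video[1::2] += 1.0
    return video


def run_demo():
    rng = np.random.default_rng(0)
    labels = [0, 0, 0, 1, 1, 1]  # 0 = static, 1 = flicker
    train = [extract_features(make_video(bool(y), rng)) for y in labels]
    centers = learn_codebook(np.concatenate(train), k=2)
    hists = np.stack([bow_histogram(f, centers) for f in train])
    model = train_classifier(hists, labels)
    tests = [make_video(False, rng), make_video(True, rng)]
    return [predict(bow_histogram(extract_features(v), centers), model)
            for v in tests]


print(run_demo())
```

The histogram discards where and when each word occurs, which is why this family of models struggles with the long-range temporal structure listed earlier.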
==Will be on the exam==