Visual Learning and Recognition



;Why are videos challenging?
* Too much data.
* Boundaries between tasks are poorly defined.
* Huge computational cost.
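A back-of-the-envelope count shows why raw video is heavy compared to a single image. The clip parameters are illustrative assumptions, not taken from the notes: a 10-second clip at 30 fps with 224x224 RGB frames. `raw_video_values` is a hypothetical helper.

<syntaxhighlight lang="python">
def raw_video_values(num_frames: int, height: int, width: int, channels: int = 3) -> int:
    """Number of raw pixel values in an uncompressed clip (hypothetical helper)."""
    return num_frames * height * width * channels

# Assumed sizes: 224x224 RGB frames, 10 s at 30 fps.
frame = raw_video_values(1, 224, 224)
clip = raw_video_values(10 * 30, 224, 224)
print(f"{frame:,} values per frame, {clip:,} per clip ({clip // frame}x)")
print(f"~{clip * 4 / 1e6:.1f} MB as float32")
</syntaxhighlight>

This prints 150,528 values per frame versus 45,158,400 per clip (300x), roughly 180.6 MB as 32-bit floats, before any model has processed a single frame.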
 
;Spectrum of Problems
* Primitive actions
* Actions
* Events
 
;Capturing long-range context
* Long-range
* Spatio-temporal
* Camera motion
* Cycles and speed
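With a fixed frame budget, how much temporal context a model sees depends largely on how frames are sampled. A minimal sketch, assuming a decoded video of `num_frames` frames. A dense clip with stride 1 covers a short window. A larger stride covers more time and also makes motion appear faster, which relates to cycles and speed. Segment-based sampling, similar in spirit to Temporal Segment Networks, spreads the same budget over the whole video. The helpers `strided_clip`, `segment_sample` and `span` are hypothetical.

<syntaxhighlight lang="python">
from typing import List

def strided_clip(start: int, clip_len: int, stride: int) -> List[int]:
    """Frame indices of a clip with clip_len frames, taking every `stride`-th frame."""
    return [start + i * stride for i in range(clip_len)]

def segment_sample(num_frames: int, num_segments: int) -> List[int]:
    """Split the video into equal segments and take the centre frame of each."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    seg_len = num_frames / num_segments
    return [int(seg_len * k + seg_len / 2) for k in range(num_segments)]

def span(indices: List[int]) -> int:
    """Number of frames of the original video covered by a set of indices."""
    return indices[-1] - indices[0] + 1

# Same budget of 8 frames, very different temporal coverage (300-frame video assumed).
dense = strided_clip(0, 8, 1)      # frames 0..7
sparse = strided_clip(0, 8, 8)     # frames 0, 8, ..., 56
segments = segment_sample(300, 8)  # one frame from each eighth of the video
print(span(dense), span(sparse), span(segments))  # 8 57 264
</syntaxhighlight>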
 
===Tasks and Datasets===
* Action Classification
* Temporal Action Localization
* Spatio-Temporal Action Detection
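The three tasks differ mainly in what a model has to output. The following sketch defines output types for each task. `ClipLabel`, `Segment` and `Tube` are illustrative names, not any benchmark's API. It also defines temporal IoU, one common way to decide whether a predicted segment matches a ground-truth segment.

<syntaxhighlight lang="python">
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ClipLabel:
    """Action classification: one label for a whole (usually trimmed) clip."""
    label: str

@dataclass
class Segment:
    """Temporal action localization: an action instance with start/end times in seconds."""
    start: float
    end: float
    label: str

@dataclass
class Tube:
    """Spatio-temporal detection: a label plus one box per annotated time step."""
    label: str
    boxes: List[Tuple[float, float, float, float, float]]  # (t, x1, y1, x2, y2)

def temporal_iou(a: Segment, b: Segment) -> float:
    """Length of the overlap of two segments divided by the length of their union."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0

pred = Segment(2.0, 6.0, "jump")
gt = Segment(4.0, 8.0, "jump")
print(round(temporal_iou(pred, gt), 3))  # 0.333
</syntaxhighlight>

Here the two segments overlap for 2 s out of a 6 s union. Whether a tIoU of 0.333 counts as a correct detection depends on the threshold chosen by the benchmark.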
 
;Action datasets
* KTH Human actions dataset
* UCF Sport actions dataset
** Bias: negative examples are scarce (e.g. a drum in the scene with nobody drumming), so the scene alone can give away the label.
* Sports-1M
** Has audio bias.
* Kinetics-v2
** ''ImageNet of videos''
** Collected from YouTube
** 600 action classes, ~500k clips
* Moments in Time
** 800k 3-second clips
** 339 classes
* SLAC: Sparsely Labeled Actions Dataset
** 520K untrimmed videos
* JHMDB
* Something-Something
* Charades
* AVA
* EPIC-Kitchens
* HAA500
* FineGYM
 
;Self-driving car datasets
* KITTI++
* Argoverse
* Waymo Open Dataset
* Lyft Level 5 Open Data
* Berkeley DeepDrive
 
===Models for Video Recognition===
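A typical bag-of-words (BoW) pipeline:
# Extract local space-time features from each video
# Learn a space-time bag-of-words vocabulary (codebook) from those features
# Train and test a classifier on the resulting BoW histograms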
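The BoW steps above can be sketched as follows. This is a minimal sketch that assumes local descriptors have already been extracted. Here, synthetic 2-D points around hypothetical "motion primitives" stand in for real space-time descriptors. The vocabulary is learned with plain k-means (Lloyd's algorithm) using a deterministic farthest-point initialisation. As a swapped-in simplification, a nearest-centroid classifier replaces a discriminative classifier such as an SVM. All names (`learn_vocabulary`, `bow_histogram`, `toy_video`, `PRIMITIVES`, `MIXES`) are hypothetical.

<syntaxhighlight lang="python">
import random
from collections import Counter

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def learn_vocabulary(points, k, iters=20):
    """Step 2: k-means with farthest-point initialisation; returns k visual words."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [[sum(d) / len(c) for d in zip(*c)] if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers

def bow_histogram(descriptors, vocab):
    """Quantise each descriptor to its nearest word; return normalised word counts."""
    counts = Counter(nearest(d, vocab) for d in descriptors)
    return [counts[j] / len(descriptors) for j in range(len(vocab))]

def train_nearest_centroid(histograms, labels):
    """Step 3 (stand-in for an SVM): average histogram per class."""
    by_class = {}
    for h, y in zip(histograms, labels):
        by_class.setdefault(y, []).append(h)
    return {y: [sum(v) / len(hs) for v in zip(*hs)] for y, hs in by_class.items()}

def predict(model, hist):
    return min(model, key=lambda y: sq_dist(hist, model[y]))

# Step 1 stand-in: synthetic descriptors drawn around hypothetical motion primitives.
PRIMITIVES = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
MIXES = {"wave": (0.5, 0.5, 0.0), "jump": (0.5, 0.0, 0.5)}

def toy_video(label, rng, n=40):
    descs = []
    for _ in range(n):
        cx, cy = rng.choices(PRIMITIVES, weights=MIXES[label])[0]
        descs.append([rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)])
    return descs

rng = random.Random(0)
train_videos = [(toy_video(y, rng), y) for y in ["wave", "jump"] * 5]
vocab = learn_vocabulary([d for v, _ in train_videos for d in v], k=3)
model = train_nearest_centroid([bow_histogram(v, vocab) for v, _ in train_videos],
                               [y for _, y in train_videos])
print(predict(model, bow_histogram(toy_video("jump", rng), vocab)))  # jump
</syntaxhighlight>

Both toy classes share the first primitive. They differ only in which other "word" appears, so the histograms separate them even though no single descriptor does. A real vocabulary would be far larger than three words.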


==Will be on the exam==