Visual Learning and Recognition: Difference between revisions

1,060 bytes added, 29 October 2020

@@ Line 791: / Line 791: @@
 ;Why are videos challenging?
-* Too much data.
+* Boundaries between tasks are poorly defined.
+* Huge computational cost.
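For a rough sense of the computational cost, the count below compares the raw input values in one image with those in one short clip. The resolution, frame rate and clip length are assumptions chosen for illustration, not figures from any particular dataset.

```python
# Rough size comparison between one image and one short video clip.
# All numbers are illustrative assumptions, not dataset statistics.
HEIGHT, WIDTH, CHANNELS = 224, 224, 3  # a common CNN input resolution (RGB)
FPS, SECONDS = 30, 10                  # assumed frame rate and clip length

values_per_frame = HEIGHT * WIDTH * CHANNELS  # 150,528
frames_per_clip = FPS * SECONDS               # 300
values_per_clip = values_per_frame * frames_per_clip  # 45,158,400

print(f"values per frame: {values_per_frame:,}")
print(f"values per clip:  {values_per_clip:,}")
print(f"a clip holds {values_per_clip // values_per_frame}x the raw input of one image")
```

Even before any modelling, every layer that runs per frame costs roughly 300 times more on such a clip than on a single image.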
+;Spectrum of Problems
+* Primitive actions
+* Actions
+* Events
+;Capturing long-range context
+* Long-range
+* Spatio-temporal
+* Camera motion
+* Cycles and speed
+===Tasks and Datasets===
+* Action Classification
+* Temporal Action Localization
+* Spatio-Temporal Action Detection
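The three tasks differ mainly in what the model must output. Below is a minimal sketch with hypothetical type names: a clip-level label for classification, a labeled time segment for temporal localization, and a labeled per-frame box track (a "tube") for spatio-temporal detection. It also includes temporal intersection-over-union (tIoU), a common criterion for matching a predicted segment to a ground-truth one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClipLabel:
    # Action classification: one label for a whole (usually trimmed) clip.
    label: str


@dataclass
class Segment:
    # Temporal action localization: a label plus start/end times in an untrimmed video.
    label: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Tube:
    # Spatio-temporal action detection: a label plus one box per frame.
    label: str
    boxes: dict[int, tuple[float, float, float, float]]  # frame -> (x1, y1, x2, y2)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Length of the overlap of two segments divided by the length of their union."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


pred = Segment("drumming", 2.0, 8.0)
gt = Segment("drumming", 4.0, 10.0)
print(temporal_iou(pred, gt))  # overlap 4 s / union 8 s -> prints 0.5
```

A detection is usually counted as correct only if its label matches and its tIoU with a ground-truth segment exceeds a chosen threshold.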
+;Action datasets
+* KTH Human actions dataset
+* UCF Sport actions dataset
+** Bias: the scene often gives the action away, and negative examples (e.g. a drum in the scene with no one drumming) are hard to find.
+* Sports-1M
+** Has audio bias.
+* Kinetics-v2
+** ''ImageNet of videos''
+** Collected from YouTube
+** 600 actions, 500k clips
+* Moments in time
+** 800k 3-second clips
+** 339 classes
+* SLAC: Sparsely Labeled Actions Dataset
+** 520k untrimmed videos
+* JHMDB
+* Something-Something
+* Charades
+* AVA
+* EPIC-Kitchens
+* HAA500
+* FineGYM
+;Self-driving car datasets
+* KITTI++
+* ArgoVerse
+* Waymo Open Dataset
+* Lyft Level 5 Open Data
+* Berkeley DeepDrive
+===Models for Video Recognition===
+The classical bag-of-words pipeline:
+# Extract features
+# Learn space-time bag of words
+# Train/test BoW classifier
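A minimal NumPy sketch of the three steps above. Assumptions: random Gaussian vectors stand in for real local space-time descriptors, plain k-means builds the codebook, and a nearest-class-mean rule replaces the SVM that such pipelines commonly use. All function names and sizes are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins throughout: the descriptor generator, sizes and the
# nearest-class-mean classifier are illustrative choices, not the lecture's.
rng = np.random.default_rng(0)
DIM, N_WORDS, N_CLASSES = 16, 8, 2


def fake_video_descriptors(class_id: int, n: int = 200) -> np.ndarray:
    """Step 1 (stand-in): n local descriptors for one video.

    Real pipelines detect space-time interest points and describe them
    (e.g. HOG/HOF); here descriptors are noise around a class-dependent center.
    """
    center = np.zeros(DIM)
    center[class_id] = 4.0
    return center + rng.normal(size=(n, DIM))


def nearest(x: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the closest center (squared Euclidean distance) for each row of x."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Step 2: learn a codebook of k visual words with plain Lloyd's k-means."""
    r = np.random.default_rng(seed)
    centers = x[r.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = nearest(x, centers)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(desc: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantize each descriptor to its nearest word, count, and L1-normalize."""
    counts = np.bincount(nearest(desc, codebook), minlength=len(codebook))
    return counts / counts.sum()


# Training videos: 10 per class.
train = [(c, fake_video_descriptors(c)) for c in range(N_CLASSES) for _ in range(10)]
codebook = kmeans(np.vstack([d for _, d in train]), N_WORDS)
X = np.array([bow_histogram(d, codebook) for _, d in train])
y = np.array([c for c, _ in train])

# Step 3: train a classifier on the histograms (nearest class mean, standing in for an SVM).
class_means = np.array([X[y == c].mean(axis=0) for c in range(N_CLASSES)])


def predict(desc: np.ndarray) -> int:
    h = bow_histogram(desc, codebook)
    return int(((class_means - h) ** 2).sum(axis=1).argmin())


test = [(c, fake_video_descriptors(c)) for c in range(N_CLASSES) for _ in range(5)]
acc = float(np.mean([predict(d) == c for c, d in test]))
print(f"test accuracy: {acc:.2f}")
```

Normalizing each histogram makes videos with different numbers of descriptors comparable. Note that the histogram discards where and when each visual word occurred.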
 ==Will be on the exam==