Visual Learning and Recognition: Difference between revisions

Revision as of 18:37, 5 November 2020

2,029 bytes added , 5 November 2020

→‎Models for Video Recognition

David

@@ Line 841: / Line 841: @@
 ===Models for Video Recognition===
-In general:
+;Basic Video Pipeline
 # Extract features
 # Learn space-time bag of words
 # Train/test BoW classifier
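The bag-of-words step of the pipeline above can be sketched as follows. This is a minimal illustration, assuming the codebook has already been learned (e.g. by k-means); the toy 2-D descriptors and the names <code>nearest_codeword</code> and <code>bow_histogram</code> are hypothetical, not from the lecture.

```python
import math
from collections import Counter

def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest (Euclidean) to the descriptor."""
    return min(range(len(codebook)), key=lambda k: math.dist(descriptor, codebook[k]))

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors and return an L1-normalized word histogram."""
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    total = len(descriptors)
    return [counts[k] / total if total else 0.0 for k in range(len(codebook))]

# Toy 2-D "space-time descriptors" and an assumed 2-word codebook.
codebook = [(0.0, 0.0), (1.0, 1.0)]
video_descriptors = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.2, 0.1)]
hist = bow_histogram(video_descriptors, codebook)
```

The resulting histogram is the fixed-length video representation that a classifier (e.g. an SVM) would be trained and tested on.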
+;Spatio-temporal Feature Detectors
+* Harris3D
+* Cuboid
+* Hessian
+* Dense
+;Spatio-temporal Feature Descriptors
+* HOG/HOF
+* Cuboid
+* HOG3D
+* ExtendedSURF
+;Add trajectories
+# Track a keypoint's movement over time
+# Build a ''feature tube'' around the trajectory. Earlier methods used a fixed cube instead of a tube.
+# Pool any descriptor you like (e.g. HOG) along the tube to get a trajectory descriptor.
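The three trajectory steps above can be sketched as follows. This is a simplified illustration, assuming dense flow fields are given as nested lists and using a single scalar feature per pixel in place of a real descriptor such as HOG; all function names are hypothetical.

```python
def clamp_index(value, size):
    """Round a coordinate to the nearest pixel index inside [0, size - 1]."""
    return min(max(int(round(value)), 0), size - 1)

def track_point(start, flows):
    """Follow a point through per-frame flow fields.

    flows[t][y][x] is the (dx, dy) displacement from frame t to frame t + 1.
    Returns one (x, y) position per frame.
    """
    x, y = start
    trajectory = [(x, y)]
    for flow in flows:
        yi = clamp_index(y, len(flow))
        xi = clamp_index(x, len(flow[0]))
        dx, dy = flow[yi][xi]
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory

def pool_along_trajectory(trajectory, feature_maps):
    """Average a per-pixel feature sampled at each trajectory point."""
    samples = []
    for (x, y), fmap in zip(trajectory, feature_maps):
        yi = clamp_index(y, len(fmap))
        xi = clamp_index(x, len(fmap[0]))
        samples.append(fmap[yi][xi])
    return sum(samples) / len(samples)

# Toy data: uniform rightward flow on a 3x3 grid; the feature equals the column index.
flows = [[[(1, 0)] * 3 for _ in range(3)] for _ in range(2)]
feature_maps = [[[float(x) for x in range(3)] for _ in range(3)] for _ in range(3)]
trajectory = track_point((0, 1), flows)
pooled = pool_along_trajectory(trajectory, feature_maps)
```

Sampling at the tracked positions, rather than at a fixed location in every frame, is what distinguishes the tube from the cube.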
+* Two-stream ConvNet uses a spatial stream (RGB frames) and a temporal stream (optical flow).
+* Add/stack trajectories: the flow displacement is added to the original point positions.
+* Pool along trajectories instead of cubes.
+* I3D stacks 8 frames and passes them through a 3D ConvNet to get an output.
+* Late fusion extracts features per frame and combines them afterwards.
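Late fusion and two-stream score fusion can be sketched as simple averaging. This is one common choice, assumed here for illustration; the logits are made up and the weight <code>w_temporal</code> is a hypothetical parameter.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(per_frame_logits):
    """Average per-frame class probabilities into one video-level prediction."""
    probs = [softmax(frame) for frame in per_frame_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]

def two_stream_fusion(spatial_probs, temporal_probs, w_temporal=0.5):
    """Weighted average of the spatial and temporal streams' probabilities."""
    return [(1 - w_temporal) * s + w_temporal * t
            for s, t in zip(spatial_probs, temporal_probs)]

# Toy logits for 2 classes over 2 frames per stream.
rgb_logits = [[2.0, 0.0], [1.0, 0.0]]   # spatial stream favors class 0
flow_logits = [[0.0, 3.0], [0.0, 3.0]]  # temporal stream favors class 1
fused = two_stream_fusion(late_fusion(rgb_logits), late_fusion(flow_logits))
predicted_class = fused.index(max(fused))
```

In this toy case the temporal stream is more confident, so it decides the fused prediction.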
+===What is an action?===
+ActionVLAD
+* BoW for actions
+* Actions are made up of subactions
+** E.g. basketball shoot = dribbling + jump + throw + running + ball
+Gaussian Temporal Awareness Networks
+* Key idea: Not all actions have the same temporal support.
+** Depending on frame-rate & action speed, actions can take a variable number of frames.
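The idea of variable temporal support can be illustrated with Gaussian-weighted pooling over frames. This is only a sketch of the intuition, not the network from the paper; <code>center</code> and <code>sigma</code> are fixed by hand here, whereas a model would have to predict them.

```python
import math

def gaussian_weights(num_frames, center, sigma):
    """Normalized Gaussian weights over frame indices 0..num_frames - 1."""
    raw = [math.exp(-0.5 * ((t - center) / sigma) ** 2) for t in range(num_frames)]
    total = sum(raw)
    return [r / total for r in raw]

def gaussian_pool(frame_features, center, sigma):
    """Weighted average of per-frame scalar features."""
    weights = gaussian_weights(len(frame_features), center, sigma)
    return sum(w * f for w, f in zip(weights, frame_features))

# Toy per-frame "actionness": the action occupies frames 2..4 only.
features = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
narrow = gaussian_pool(features, center=3, sigma=1.0)   # support matches the action
wide = gaussian_pool(features, center=3, sigma=10.0)    # support is far too long
```

A kernel whose width matches the action's duration gives a stronger pooled response than one that also averages in surrounding background frames.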
+Compressed Video Action Recognition
+* The idea is to feed P-frames, which are essentially optical flow, directly to the CNN.
+R-C3D: Region Convolutional 3D Network for Temporal Activity Detection
+* Inspired by Faster R-CNN Architecture
+SST: Single-Stream Temporal Action Proposals
+* Single-shot proposal network
+Action Tubelet Detector for Spatio-Temporal Action Localization
+* For every frame, regress how to move the tubelet up or down
+Tube Convolutional Neural Network (T-CNN)
+===Complementary Approaches===
+Pose-based Action Recognition
+* Convert a video into pose maps and do classifications on poses
+PoTion: Pose MoTion Representation
+* Do pose estimation to get joint heatmaps.
+* Represent pose position and movement as an image, with red = start and green = end.
+* Stack this across time.
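The colour-coding and stacking steps above can be sketched for a single joint as follows. This is a minimal illustration, assuming two channels (red for early frames, green for late frames) with linear weights in time; the function name and the toy heatmaps are hypothetical.

```python
def potion_map(heatmaps):
    """Aggregate T joint heatmaps (each H x W) into an H x W x 2 image.

    Channel 0 ("red") weights early frames, channel 1 ("green") late frames.
    """
    num_frames = len(heatmaps)
    h, w = len(heatmaps[0]), len(heatmaps[0][0])
    out = [[[0.0, 0.0] for _ in range(w)] for _ in range(h)]
    for t, heatmap in enumerate(heatmaps):
        late = t / (num_frames - 1) if num_frames > 1 else 0.0
        early = 1.0 - late
        for y in range(h):
            for x in range(w):
                out[y][x][0] += heatmap[y][x] * early
                out[y][x][1] += heatmap[y][x] * late
    return out

# Toy 1x3 heatmaps: the joint moves from left to right over 3 frames.
heatmaps = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
image = potion_map(heatmaps)
```

The start of the motion ends up purely red, the end purely green, and the midpoint an equal mix, so a single image encodes both where the joint was and when.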
+PA3D: Pose-Action 3D Machine
+* Focus on pose to do action recognition
+VideoGraph: Recognizing Minutes-Long Activities
+==Recognition==
+===Context===
 ==Will be on the exam==