;How to evaluate this?
Object Keypoint Similarity (OKS) is a metric based on the Euclidean distance between predicted and ground-truth keypoints, with a Gaussian error margin per keypoint that depends on the scale of each object.
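The OKS description above can be sketched in a few lines. This is a minimal sketch, assuming the commonly used form in which each labeled keypoint contributes <code>exp(-d^2 / (2 * s^2 * kappa^2))</code> and the result is averaged over labeled keypoints. The function name <code>oks</code> and its parameters are illustrative, and the per-keypoint constants <code>kappa</code> are passed in as inputs rather than taken from these notes.

```python
import numpy as np

def oks(pred, gt, visible, area, kappa):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt : (K, 2) keypoint coordinates (x, y)
    visible  : (K,) booleans, True where the ground-truth keypoint is labeled
    area     : squared object scale s^2 (assumed here: the object's area)
    kappa    : (K,) per-keypoint constants that set the Gaussian falloff
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    visible = np.asarray(visible, dtype=bool)
    kappa = np.asarray(kappa, dtype=float)

    # Squared Euclidean distance per keypoint.
    d2 = np.sum((pred - gt) ** 2, axis=1)
    # Gaussian similarity, normalized by object scale and keypoint constant.
    sim = np.exp(-d2 / (2.0 * area * kappa ** 2))

    if not visible.any():
        return 0.0
    # Average only over keypoints that are labeled in the ground truth.
    return float(sim[visible].mean())
```

A perfect prediction scores 1, and the score decays toward 0 as keypoints move away from the ground truth, more slowly for larger objects.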
===Approaches for Pose Estimation===
* Top-down approaches
** Run person detection first, then estimate the pose of each detected person.
** Pros: A person detector is already available, and the second stage processes a smaller image.
** Cons: Runtime is proportional to the number of people.
* Bottom-up approaches
** Detect limbs first, then group them into people.
** Pros: No dependency on a person detector. May be single stage. Runtime is independent of the number of people.
** Cons: Optimizing the part association is NP-hard.
===Examples===
Pose Machines + CPM + OpenPose (2016 COCO winner)
* Pose Machines, ECCV 2014
* Convolutional Pose Machines, CVPR 2016
* Realtime Multi-Person 2D
G-RMI (2016 COCO runner-up)
===Pose Machines===
ECCV 2014

Bottom-up approach

Addresses these challenges:
* Local evidence is weak.
* Part context is a strong cue.
* Larger composite parts can be easier to detect.
===Convolutional Pose Machines===
===OpenPose===
For 30 people with 17 keypoints each, there are about 1.3e5 pairwise connections between keypoints.
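The figure above is consistent with counting every unordered pair of detected keypoints. A quick check, assuming that is how the number was derived:

```python
from math import comb

people = 30
keypoints_per_person = 17

# Total keypoint candidates in the image.
candidates = people * keypoints_per_person  # 510

# Every unordered pair of candidates is a possible connection.
pairs = comb(candidates, 2)

print(candidates, pairs)  # 510 129795, i.e. about 1.3e5
```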
===G-RMI (Google)===
# Compute Heatmaps G
# Offset Output
# Fuse heatmaps and offsets via Hough Voting
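The fusion step can be illustrated with a simplified Hough vote for a single keypoint type. This is a rough sketch, not the G-RMI implementation: it assumes each pixel casts a vote, weighted by its heatmap value, at the location its offset vector points to, and it rounds to the nearest pixel instead of using a smooth voting kernel. The function name <code>hough_fuse</code> and the array layouts are hypothetical.

```python
import numpy as np

def hough_fuse(heatmap, offsets):
    """Accumulate heatmap-weighted votes at the positions the offsets point to.

    heatmap : (H, W) score that each pixel lies near the keypoint
    offsets : (H, W, 2) per-pixel (dy, dx) vector from the pixel to the keypoint
    Returns an (H, W) vote map whose peak is the fused keypoint estimate.
    """
    H, W = heatmap.shape
    ys, xs = np.mgrid[0:H, 0:W]

    # Target location of each pixel's vote (nearest-pixel simplification).
    ty = np.rint(ys + offsets[..., 0]).astype(int)
    tx = np.rint(xs + offsets[..., 1]).astype(int)

    # Discard votes that fall outside the image.
    inside = (ty >= 0) & (ty < H) & (tx >= 0) & (tx < W)

    votes = np.zeros((H, W), dtype=float)
    # np.add.at accumulates correctly when several pixels vote for one target.
    np.add.at(votes, (ty[inside], tx[inside]), heatmap[inside])
    return votes
```

Taking the argmax of the returned vote map gives a keypoint location that combines the coarse heatmap with the finer offset predictions.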
===Stacked Hourglass Networks===
2017 COCO Runner-up
==Videos==
* Video as space-time volume
* Object correspondence via tracking
* Motion for parallax, occlusions
;Why are videos challenging?
* Too much data.
==Will be on the exam==