Visual Learning and Recognition: Difference between revisions
| (11 intermediate revisions by the same user not shown) | |||
| Line 377: | Line 377: | ||
* 4 Texture Related | * 4 Texture Related | ||
=== | ===Deformable Part Models (DPM)=== | ||
Lecture (Oct 8, 2020) | Lecture (Oct 6-8, 2020) | ||
Deformable Part Models (DPM) | |||
Felzenszwalb et al (2009) <ref name="felzenszwalb2009dpm"></ref> | Felzenszwalb et al (2009) <ref name="felzenszwalb2009dpm"></ref> | ||
| Line 750: | Line 751: | ||
===Approaches for Pose Estimation=== | ===Approaches for Pose Estimation=== | ||
Lecture Oct 27 | |||
* Top-down approaches | * Top-down approaches | ||
** Detect each person first, then estimate the pose within each detection. | ** Detect each person first, then estimate the pose within each detection. | ||
| Line 997: | Line 1,000: | ||
Primitives | Primitives | ||
* Depth | * Depth - not normalized, which makes it hard to use; has discontinuities; does not represent objects | ||
* Surface normals | * Surface normals - computed from the gradient of depth | ||
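The relation between the two primitives above can be illustrated with a small sketch. It assumes an orthographic camera with unit pixel spacing, so the depth map can be treated as a height field z(x, y); the function name `normals_from_depth` is hypothetical. A real perspective camera would also need the intrinsics.

```python
import math

def normals_from_depth(depth):
    """Estimate per-pixel unit surface normals from a depth map.

    Assumption: orthographic camera with unit pixel spacing, so the depth map
    is a height field z = depth[y][x].
    """
    h, w = len(depth), len(depth[0])
    normals = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Finite differences: central in the interior, one-sided at the borders.
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            # The normal of the surface z(x, y) is proportional to (-dz/dx, -dz/dy, 1).
            nx, ny, nz = -dzdx, -dzdy, 1.0
            norm = math.sqrt(nx * nx + ny * ny + nz * nz)
            normals[y][x] = (nx / norm, ny / norm, nz / norm)
    return normals
```

For a ramp `depth[y][x] = x` every normal is (-1, 0, 1)/sqrt(2). At a depth discontinuity the finite differences become very large, which is one concrete reason raw depth is awkward to use directly.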
===Scene Intrinsics=== | ===Scene Intrinsics=== | ||
| Line 1,157: | Line 1,160: | ||
\implies &\log P(x) \geq E_{z \sim Q(z|x)}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)] | \implies &\log P(x) \geq E_{z \sim Q(z|x)}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)] | ||
\end{aligned} | \end{aligned} | ||
</math> | </math> | ||
This is known as the variational lower bound, or ''ELBO'' (evidence lower bound). | This is known as the variational lower bound, or ''ELBO'' (evidence lower bound). | ||
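The two terms of the bound can be estimated as in the following sketch. It assumes the common VAE setup: a diagonal Gaussian encoder Q(z|x) = N(mu, diag(exp(log_var))) and a standard normal prior P(z), so the KL term has a closed form, while the reconstruction term is a Monte Carlo average using the reparameterization trick. `log_p_x_given_z` is a hypothetical stand-in for a decoder.

```python
import math
import random

def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """Closed-form D_KL[N(mu, diag(exp(log_var))) || N(0, I)]."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def elbo(x, mu, log_var, log_p_x_given_z, num_samples=1, rng=random):
    """Monte Carlo estimate of E_{z~Q(z|x)}[log P(x|z)] - D_KL[Q(z|x) || P(z)].

    mu, log_var: parameters of the (assumed diagonal Gaussian) encoder Q(z|x).
    log_p_x_given_z: hypothetical callable (x, z) -> log P(x|z) standing in for a decoder.
    """
    recon = 0.0
    for _ in range(num_samples):
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
        recon += log_p_x_given_z(x, z)
    recon /= num_samples
    return recon - kl_diag_gaussian_to_standard_normal(mu, log_var)
```

Training a VAE maximizes this estimate (equivalently, minimizes its negative) with respect to both the encoder and decoder parameters.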
| Line 1,197: | Line 1,200: | ||
===Flow-based Models=== | ===Flow-based Models=== | ||
Flow-based models are trained by directly minimizing the exact negative log-likelihood of the data. | Flow-based models are trained by directly minimizing the exact negative log-likelihood of the data. | ||
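Flows can compute this likelihood exactly via the change-of-variables formula. The sketch below shows that formula for the simplest possible flow, an elementwise affine map with a standard normal base density. The choice of flow is an illustrative assumption; practical flows stack many invertible layers, but each contributes a log-determinant term in the same way.

```python
import math

def affine_flow_nll(data, shift, log_scale):
    """Average negative log-likelihood of `data` under an elementwise affine flow.

    The flow maps x to z = (x - shift) * exp(-log_scale), with a standard normal
    base density on z. By the change-of-variables formula,
        log p(x) = log N(z; 0, I) + log|det dz/dx| = log N(z; 0, I) - sum(log_scale).
    """
    total = 0.0
    for x in data:
        log_p = 0.0
        for xi, b, s in zip(x, shift, log_scale):
            z = (xi - b) * math.exp(-s)
            log_p += -0.5 * (z * z + math.log(2.0 * math.pi)) - s
        total -= log_p
    return total / len(data)
```

Minimizing this quantity over `shift` and `log_scale` fits the flow to the data; for this one-layer affine flow the minimizer is the per-dimension sample mean and standard deviation.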
==Attribute-based Representation== | |||
;Motivation | |||
Typically in recognition, we only predict the class of the image. | |||
From the category, we can guess the attributes, but the category alone provides only limited information. | |||
The network cannot make predictions for new, unseen classes. | |||
This problem used to be called ''graceful degradation''. | |||
;Goal | |||
Learn intermediate structure with object categories. | |||
;Should we care about attributes in DL? | |||
;Why are attributes not simply supervised recognition? | |||
;Benefits | |||
* Dealing with inevitable failure. | |||
* We can infer things about unseen categories. | |||
* We can make comparison between objects or categories. | |||
;Datasets | |||
* a-Pascal | |||
* a-Yahoo | |||
* CORE | |||
* COCO Attributes | |||
Deep networks should be able to learn attributes implicitly. | |||
However, we cannot tell whether they have actually learned them. | |||
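The benefit "we can infer things about unseen categories" can be made concrete with a simplified attribute-based classifier in the spirit of direct attribute prediction. The sketch assumes an attribute classifier (trained on seen classes only) already outputs per-attribute probabilities, and that attributes are independent given the image; the attribute names and class signatures are made-up toy values.

```python
import math

def classify_by_attributes(attr_probs, class_signatures):
    """Pick the category whose binary attribute signature best explains the predictions.

    attr_probs: per-attribute probabilities P(a_m = 1 | x) from some attribute classifier
        (assumed to be trained on seen classes only).
    class_signatures: dict mapping class name -> list of 0/1 attribute values; the classes
        may include ones never seen during training.
    Scores each class by sum_m log P(a_m = signature_m | x), assuming attributes are
    independent given the image.
    """
    eps = 1e-9  # avoid log(0)

    def score(signature):
        return sum(math.log(max(p if a else 1.0 - p, eps)) for p, a in zip(attr_probs, signature))

    return max(class_signatures, key=lambda c: score(class_signatures[c]))
```

For example, with toy attributes (striped, four legs, has wings) and signatures zebra = [1, 1, 0], horse = [0, 1, 0], eagle = [0, 0, 1], predictions [0.9, 0.8, 0.1] select "zebra" even if no zebra image was used to train the attribute classifier.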
==Extra Topics== | |||
===Fine-grained Recognition=== | |||
===Few-shot Recognition=== | |||
* Metric learning methods | |||
* Meta-learning methods | |||
* Data augmentation methods | |||
* Semantics | |||
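As one example of the metric learning family, the sketch below shows nearest-prototype classification as used by prototypical networks: each class is represented by the mean embedding of its few labeled examples, and a query is assigned to the closest prototype. It assumes the embeddings were already produced by a network trained with a metric-learning objective; here they are plain lists of floats.

```python
def prototype_classify(support, query):
    """Nearest-prototype classification in an embedding space.

    support: dict class -> list of embedding vectors (the few labeled "shots").
    query: an embedding vector to classify.
    Embeddings are assumed to come from an already-trained embedding network.
    """
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # One prototype per class: the average of its support embeddings.
    prototypes = {c: mean(vs) for c, vs in support.items()}
    return min(prototypes, key=lambda c: sq_dist(prototypes[c], query))
```

Because no class-specific weights are learned, adding a new class only requires embedding its few examples.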
===Zero-shot Recognition=== | |||
The goal is to train a classifier for classes without having seen a single labeled example of them. | |||
Information about the unseen classes comes from side knowledge, e.g., a knowledge graph or word embeddings. | |||
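One way word embeddings can carry this information is sketched below, following the ConSE idea (convex combination of semantic embeddings): a classifier trained only on seen classes produces probabilities, the image is mapped into the word-embedding space as a probability-weighted average of the top seen-class embeddings, and the nearest unseen-class embedding gives the label. The embeddings and probabilities in the usage are toy assumptions, not real word vectors.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def zero_shot_predict(seen_probs, seen_embeddings, unseen_embeddings, top_t=2):
    """ConSE-style zero-shot prediction (convex combination of semantic embeddings).

    seen_probs: dict seen class -> probability from an ordinary classifier trained on seen classes.
    seen_embeddings / unseen_embeddings: dict class -> word-embedding vector (assumed given).
    """
    top = sorted(seen_probs, key=seen_probs.get, reverse=True)[:top_t]
    norm = sum(seen_probs[c] for c in top)
    dim = len(next(iter(seen_embeddings.values())))
    # Project the image into the embedding space as a probability-weighted average.
    image_vec = [sum(seen_probs[c] * seen_embeddings[c][d] for c in top) / norm for d in range(dim)]
    # Label = the unseen class whose embedding is most similar (cosine similarity).
    return max(unseen_embeddings, key=lambda c: cosine(image_vec, unseen_embeddings[c]))
```

With toy embeddings horse = [1, 0, 0], tiger = [0, 1, 0], truck = [0, 0, 1] and an unseen zebra = [0.7, 0.7, 0], an image the classifier finds half horse-like and partly tiger-like is labeled "zebra".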
===Beyond Labelled Datasets=== | |||
* Semi-supervised: We have both labelled and unlabeled training samples. | |||
* Weakly-supervised: The labels are weak or noisy, and not necessarily designed for the task we want. | |||
* Learning from the Web: Download training data from the internet. | |||
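For the semi-supervised setting, one common technique is pseudo-labeling (self-training): fit a model on the labeled data, label the unlabeled samples it is confident about, and refit. The sketch uses a nearest-centroid classifier and a distance-margin confidence rule; both are illustrative assumptions chosen to keep the example small, not a method from the lecture.

```python
def centroids(points, labels):
    """Mean point per label."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def self_train(labeled, labels, unlabeled, margin=1.0, rounds=3):
    """Pseudo-labeling with a nearest-centroid classifier.

    Each round: fit centroids on labeled + pseudo-labeled points, then pseudo-label
    unlabeled points whose nearest centroid beats the runner-up by at least `margin`
    (in squared distance). Returns the final centroids and the still-unlabeled points.
    """
    points, ys = list(labeled), list(labels)
    remaining = list(unlabeled)
    for _ in range(rounds):
        cents = centroids(points, ys)
        still = []
        for p in remaining:
            ranked = sorted(cents, key=lambda y: sq_dist(p, cents[y]))
            gap = sq_dist(p, cents[ranked[1]]) - sq_dist(p, cents[ranked[0]]) if len(ranked) > 1 else margin
            if gap < margin:
                still.append(p)  # not confident enough yet; retry next round
            else:
                points.append(p)
                ys.append(ranked[0])  # accept the pseudo-label
        remaining = still
    return centroids(points, ys), remaining
```

Ambiguous points (here, ones equidistant from both centroids) are deferred until the pseudo-labeled data has moved the centroids enough to resolve them; a wrong early pseudo-label, however, is reinforced rather than corrected.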
==Will be on the exam== | ==Will be on the exam== | ||
| Line 1,204: | Line 1,251: | ||
* DPM | * DPM | ||
* Selective search vs RPN | * Selective search vs RPN | ||
* ELBO | |||
Final exam: | Final exam: | ||
| Line 1,226: | Line 1,274: | ||
* Recorded videos with presentations | * Recorded videos with presentations | ||
* Final reports Dec 18 | * Final reports Dec 18 | ||
[https://docs.google.com/document/d/1BKmpBWBWuEEywDyBw9CsHgOPB6DH7I0oKEs8zXS7XQw/edit?usp=sharing My Exam Cheat Sheet] | |||
==Project Notes== | ==Project Notes== | ||
| Line 1,231: | Line 1,281: | ||
* Challenges | * Challenges | ||
* Which methods worked and which didn't. | * Which methods worked and which didn't. | ||
==References== | ==References== | ||
| Line 1,239: | Line 1,286: | ||
<ref name="torralba2008tinyimages">Antonio Torralba, Rob Fergus and William T. Freeman (2008). 80 million tiny images: a large dataset for | <ref name="torralba2008tinyimages">Antonio Torralba, Rob Fergus and William T. Freeman (2008). 80 million tiny images: a large dataset for | ||
non-parametric object and scene recognition (PAMI 2008) [https://people.csail.mit.edu/torralba/publications/80millionImages.pdf Link]</ref> | non-parametric object and scene recognition (PAMI 2008) [https://people.csail.mit.edu/torralba/publications/80millionImages.pdf Link]</ref> | ||
<ref name="standing1973learning">Lionel Standing (1973). Learning 10000 pictures. ''Quarterly Journal of Experimental Psychology'' [https://www.tandfonline.com/doi/abs/10.1080/14640747308400340 Link]</ref> | <ref name="standing1973learning">Lionel Standing (1973). Learning 10000 pictures. ''Quarterly Journal of Experimental Psychology'' [https://www.tandfonline.com/doi/abs/10.1080/14640747308400340 Link]</ref> | ||
<ref name="brady2008visual">Timothy F. Brady, Talia Konkle, George A. Alvarez, and Aude Oliva (2008). Visual long-term memory has a massive storage capacity for object details. [http://olivalab.mit.edu/MM/pdfs/BradyKonkleAlvarezOliva2008.pdf Link].</ref> | <ref name="brady2008visual">Timothy F. Brady, Talia Konkle, George A. Alvarez, and Aude Oliva (2008). Visual long-term memory has a massive storage capacity for object details. [http://olivalab.mit.edu/MM/pdfs/BradyKonkleAlvarezOliva2008.pdf Link].</ref> | ||
<ref name="torralba2011unbiased">Antonio Torralba, Alexei A. Efros (2011). Unbiased Look at Dataset Bias (CVPR 2011) [https://people.csail.mit.edu/torralba/publications/datasets_cvpr11.pdf Link]</ref> | <ref name="torralba2011unbiased">Antonio Torralba, Alexei A. Efros (2011). Unbiased Look at Dataset Bias (CVPR 2011) [https://people.csail.mit.edu/torralba/publications/datasets_cvpr11.pdf Link]</ref> | ||