Visual Learning and Recognition

 
===Flow-based Models===
Flow-based models minimize the negative log-likelihood.
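As a minimal sketch of what this objective looks like, assume a one-dimensional affine flow that maps data x to a latent z = (x - mu) / sigma with a standard normal prior on z. The change-of-variables formula then gives log p(x) = log N(z; 0, 1) + log |dz/dx|. The function name and the parameters <code>mu</code> and <code>log_sigma</code> are illustrative assumptions, not something defined in these notes.

```python
import math

def affine_flow_nll(xs, mu, log_sigma):
    """Average negative log-likelihood of xs under a 1-D affine flow.

    Assumed flow: z = (x - mu) / sigma, with prior z ~ N(0, 1).
    """
    sigma = math.exp(log_sigma)
    total = 0.0
    for x in xs:
        z = (x - mu) / sigma
        log_pz = -0.5 * (z * z + math.log(2.0 * math.pi))  # log N(z; 0, 1)
        log_det = -log_sigma  # log |dz/dx| = log(1 / sigma)
        total += -(log_pz + log_det)
    return total / len(xs)

data = [1.8, 2.1, 2.4, 1.9]
# Parameters that fit the data give a lower NLL than an arbitrary choice.
print(affine_flow_nll(data, mu=0.0, log_sigma=0.0))
print(affine_flow_nll(data, mu=2.05, log_sigma=math.log(0.25)))
```

Training a real flow would minimize this quantity by gradient descent over the parameters of a much more expressive invertible network; the sketch only shows how the two terms of the loss arise.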
==Attribute-based Representation==
;Motivation
Typically in recognition, we only predict the class of the image. 
The category lets us guess some attributes, but on its own it provides only limited information. 
A network trained this way also cannot make predictions for new, unseen classes. 
The ability to still say something useful in such cases used to be called ''graceful degradation''.
;Goal
Learn intermediate structure (attributes) together with object categories.
;Should we care about attributes in DL?
;Why are attributes not simply supervised recognition?
;Benefits
* Dealing with inevitable failure.
* We can infer things about unseen categories.
* We can make comparisons between objects or categories.
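The second benefit can be sketched as follows, assuming some attribute predictor already outputs a probability per attribute. Each class, including one with no training images, is described by a binary attribute signature, and we pick the class whose signature is closest to the prediction. The attribute names, class names, and signatures are made up for illustration.

```python
# Hypothetical attribute order: [has_stripes, has_hooves, has_wings]
CLASS_SIGNATURES = {
    "horse": [0, 1, 0],
    "tiger": [1, 0, 0],
    "zebra": [1, 1, 0],  # assumed unseen: no training images, attributes only
}

def closest_class(attribute_probs, signatures):
    """Return the class whose attribute signature is nearest to the predictions."""
    def distance(signature):
        return sum((p - s) ** 2 for p, s in zip(attribute_probs, signature))
    return min(signatures, key=lambda name: distance(signatures[name]))

# Assumed output of an attribute predictor for one image.
predicted = [0.9, 0.8, 0.1]
print(closest_class(predicted, CLASS_SIGNATURES))
```

Squared distance is only one possible matching rule; a probabilistic score over attributes would work in the same place.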
;Datasets
* a-Pascal
* a-Yahoo
* CORE
* COCO Attributes
Deep networks should be able to learn attributes implicitly. 
However, without explicit attribute supervision you cannot tell whether they have actually learned them.
==Extra Topics==
===Fine-grained Recognition===
===Few-shot Recognition===
* Metric learning methods
* Meta-learning methods
* Data augmentation methods
* Semantics
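To illustrate the metric learning family, here is a minimal sketch in the style of nearest-prototype classification: each class is represented by the mean of its few support embeddings, and a query is assigned to the nearest mean. It assumes the embeddings come from some already trained feature extractor; the two-dimensional vectors and class names are invented for the example.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def nearest_prototype(query, support):
    """Classify query by Euclidean distance to each class prototype."""
    prototypes = {label: mean_vector(examples) for label, examples in support.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Two support examples per class (a 2-way, 2-shot episode), assumed embeddings.
support = {
    "cat": [[1.0, 0.2], [0.8, 0.0]],
    "dog": [[0.0, 1.0], [0.2, 0.9]],
}
print(nearest_prototype([0.9, 0.3], support))
```

In an actual metric learning method the embedding function itself is trained so that this kind of distance comparison works well on new classes.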
===Zero-shot Recognition===
The goal is to train a classifier for a class without having seen a single labeled example of it. 
The information about the class comes from an external source, e.g. a knowledge graph or word embeddings.
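One common way to use word embeddings for this, sketched here under assumptions, is to map the image into the same space as the class-name embeddings and pick the most similar class by cosine similarity. The sketch assumes the image embedding has already been projected into that space; the three-dimensional vectors and class names are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_label(image_embedding, class_embeddings):
    """Return the class whose word embedding is most similar to the image embedding."""
    return max(class_embeddings,
               key=lambda name: cosine(image_embedding, class_embeddings[name]))

# Hypothetical word embeddings for classes with no labeled images.
class_embeddings = {"okapi": [0.7, 0.7, 0.1], "lorry": [0.0, 0.1, 1.0]}
image_embedding = [0.6, 0.8, 0.0]  # assumed output of a learned projection
print(zero_shot_label(image_embedding, class_embeddings))
```

The hard part in practice is learning the projection from seen classes so that it transfers to unseen ones; the lookup itself is this simple.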
===Beyond Labelled Datasets===
* Semi-supervised: We have both labelled and unlabelled training samples.
* Weakly-supervised: The labels are weak, noisy, and not necessarily for the task we want.
* Learning from the Web: Download data from the internet.


==Will be on the exam==
* Challenges
* What methods worked and didn't work.


==References==
<ref name="torralba2008tinyimages">Antonio Torralba, Rob Fergus and William T. Freeman (2008). 80 million tiny images: a large dataset for non-parametric object and scene recognition (PAMI 2008) [https://people.csail.mit.edu/torralba/publications/80millionImages.pdf Link]</ref>
<ref name="standing1973learning">Lionel Standing (1973). Learning 10000 pictures. ''Quarterly Journal of Experimental Psychology'' [https://www.tandfonline.com/doi/abs/10.1080/14640747308400340 Link]</ref>
<ref name="brady2008visual">Timothy F. Brady, Talia Konkle, George A. Alvarez, and Aude Oliva (2008). Visual long-term memory has a massive storage capacity for object details. [http://olivalab.mit.edu/MM/pdfs/BradyKonkleAlvarezOliva2008.pdf Link].</ref>
<ref name="torralba2011unbiased">Antonio Torralba, Alexei A. Efros (2011). Unbiased Look at Dataset Bias (CVPR 2011) [https://people.csail.mit.edu/torralba/publications/datasets_cvpr11.pdf Link]</ref>