Visual Learning and Recognition

Notes for CMSC828I Visual Learning and Recognition (Fall 2020) taught by Abhinav Shrivastava

This class covers:

How a sub-topic evolved
State of the art

Introduction to Data

September 8, 2020

Consider the two extremes of data. With very few images, learning is an extrapolation problem.
As the number of training samples approaches infinity, learning becomes an interpolation problem.
Traditional datasets are on the order of \(10^2\) to \(10^4\) training samples.
Current datasets are on the order of \(10^5\) to \(10^7\) training samples.

In Tiny Images^[1], Torralba et al. use a dataset of 80 million tiny images.

What is the capacity of visual long term memory?

In Standing (1973)^[2], people who had been shown 10,000 images could later recognize which images they had seen with 83% accuracy.

What we don't know is what people actually remember about each item.

Brady et al.^[3] tested recognition against three kinds of foils: novel (a new object), exemplar (a different object of the same type), and state (the same object in a different state). Accuracy was 92% for novel, 88% for exemplar, and 87% for state, which suggests that humans remember the exact state of objects they have seen.

Rule of thumb

(Simple algorithms + big data) is better than (complicated algorithms + small data)

Lecture 4 (September 10, 2020)

This lecture is on dataset bias. It follows Torralba and Efros^[4].

Will big data solve all our problems?

E.g. Can (big company) just dump millions of dollars to collect data and solve any problem?
No. E.g. COVID.
There will always be new tasks or problems.

We will never have enough data

Long tails - Zipf's law
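
To make the long-tail argument concrete, here is a minimal sketch that assumes category frequencies follow an idealized Zipf distribution, \(p(r) \propto 1/r^s\) for rank \(r\). The function names, the number of categories, the dataset size, and the exponent \(s = 1\) are all illustrative assumptions, not values from the lecture.

```python
def zipf_probabilities(num_categories, exponent=1.0):
    """Return normalized probabilities p(r) proportional to 1 / r**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_categories + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def categories_below(probs, dataset_size, min_examples):
    """Count categories whose expected number of examples is below min_examples."""
    return sum(1 for p in probs if p * dataset_size < min_examples)


# Hypothetical setup: 1,000 categories and 100,000 images in total.
probs = zipf_probabilities(num_categories=1000)
head_share = sum(probs[:10])  # probability mass of the 10 most frequent categories
rare = categories_below(probs, dataset_size=100_000, min_examples=100)
print(f"top-10 share: {head_share:.2f}")
print(f"categories with fewer than 100 expected examples: {rare}")
```

Under these assumptions a handful of head categories absorb a large share of the data while most categories stay rare. Because \(p(r)\) does not depend on dataset size, adding more data grows the expected count of every category proportionally, so the rarest categories remain orders of magnitude behind the head.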

Data is biased

Types of visual bias:

Observer Bias (human vs bird)
Capture Bias (photographer vs robot)
Selection Bias (Flickr vs Google Street View)
Category/Label Bias
Negative Set Bias

In general, all datasets will have all of these biases mixed in.

Social Bias

Graduation photos always have a certain structure.

Measuring Dataset Bias

Evaluate cross-dataset performance
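
One way to summarize a cross-dataset evaluation is sketched below: train on each dataset, test on every dataset, and compare the self-test score with the mean score on the other datasets. The accuracy table here is hypothetical and exists only to show the arithmetic; the dataset names `A`, `B`, `C` and the helper `percent_drop` are assumptions of this sketch.

```python
def percent_drop(accuracy, train_set):
    """Relative drop (in %) from self-test accuracy to mean accuracy on other datasets."""
    self_acc = accuracy[train_set][train_set]
    others = [acc for test_set, acc in accuracy[train_set].items() if test_set != train_set]
    mean_others = sum(others) / len(others)
    return 100.0 * (self_acc - mean_others) / self_acc


# accuracy[train][test]; hypothetical numbers for illustration only.
accuracy = {
    "A": {"A": 0.60, "B": 0.30, "C": 0.30},
    "B": {"A": 0.40, "B": 0.50, "C": 0.35},
    "C": {"A": 0.20, "B": 0.20, "C": 0.80},
}

drops = {name: percent_drop(accuracy, name) for name in accuracy}
for name, drop in drops.items():
    print(f"trained on {name}: {drop:.1f}% drop on other datasets")
```

A large drop suggests the model has picked up dataset-specific regularities rather than the visual concept itself.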

Data-driven Methods in Vision

Will be on the exam

Back-prop and SGD
Softmax, sigmoid, cross entropy
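
As a review aid, the following is a minimal pure-Python sketch of these pieces. It is not the course's reference implementation: the toy "parameters" are the logits themselves, the learning rate is arbitrary, and the update is a single-example gradient step standing in for SGD. It relies on the standard result that the gradient of cross-entropy composed with softmax, taken with respect to the logits, is the predicted probabilities minus the one-hot target.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(logits):
    """Softmax with the maximum subtracted first for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(probs[target])


def grad_logits(probs, target):
    """Gradient of cross_entropy(softmax(logits), target) w.r.t. logits: p - one_hot."""
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def sgd_step(logits, target, lr=0.5):
    """One gradient step on a single example, applied directly to the logits."""
    grads = grad_logits(softmax(logits), target)
    return [v - lr * g for v, g in zip(logits, grads)]


# Toy example: three classes, the correct class is index 2.
logits = [2.0, 1.0, 0.1]
target = 2
loss_before = cross_entropy(softmax(logits), target)
logits_after = sgd_step(logits, target)
loss_after = cross_entropy(softmax(logits_after), target)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

In a real network the same gradient would be propagated backward through the layers by the chain rule (back-prop), and the step would update the weights rather than the logits.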

References

[1] Antonio Torralba, Rob Fergus, and William T. Freeman (2008). 80 million tiny images: a large dataset for non-parametric object and scene recognition. PAMI 2008. https://people.csail.mit.edu/torralba/publications/80millionImages.pdf
[2] Lionel Standing (1973). Learning 10000 pictures. Quarterly Journal of Experimental Psychology. https://www.tandfonline.com/doi/abs/10.1080/14640747308400340
[3] Timothy F. Brady, Talia Konkle, George A. Alvarez, and Aude Oliva (2008). Visual long-term memory has a massive storage capacity for object details. http://olivalab.mit.edu/MM/pdfs/BradyKonkleAlvarezOliva2008.pdf
[4] Antonio Torralba and Alexei A. Efros (2011). Unbiased Look at Dataset Bias. CVPR 2011. https://people.csail.mit.edu/torralba/publications/datasets_cvpr11.pdf
