===Dealing with Sparse Data===
====Better Similarity====
====Better Alignment====
E.g. reduce the resolution, apply SIFT-based matching, or warp the images.
 
;SIFT-Flow
Compute dense SIFT descriptors for all regions of the image. 
Then learn a mapping from SIFT vectors to RGB colors; the resulting color images are called ''SIFT flow'' features. 
Regions with similar RGB colors will have similar SIFT feature vectors. 
Then we can learn a transformation <math>T</math> that matches the SIFT flows of two images (i.e. <math>T(F_1) \approx F_2</math>).
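A minimal sketch of computing dense SIFT features and projecting them to RGB for visualization, assuming OpenCV (with SIFT support) and scikit-learn are available; this is an illustration of the idea, not the original SIFT Flow implementation:

<syntaxhighlight lang="python">
# Sketch: dense SIFT features plus an RGB projection for visualization.
# Assumes OpenCV with SIFT support and scikit-learn; not the original SIFT Flow code.
import cv2
import numpy as np
from sklearn.decomposition import PCA

def dense_sift(gray, step=8):
    """Compute SIFT descriptors on a regular grid ("dense SIFT")."""
    sift = cv2.SIFT_create()
    h, w = gray.shape
    grid_y = range(step // 2, h, step)
    grid_x = range(step // 2, w, step)
    keypoints = [cv2.KeyPoint(float(x), float(y), float(step)) for y in grid_y for x in grid_x]
    _, desc = sift.compute(gray, keypoints)
    return desc.reshape(len(grid_y), len(grid_x), 128)

def sift_to_rgb(desc_grid):
    """Project 128-D descriptors down to 3-D and rescale to [0, 255] for display."""
    rows, cols, d = desc_grid.shape
    flat = desc_grid.reshape(-1, d).astype(np.float32)
    rgb = PCA(n_components=3).fit_transform(flat)
    rgb = (rgb - rgb.min(0)) / (rgb.max(0) - rgb.min(0) + 1e-8) * 255
    return rgb.reshape(rows, cols, 3).astype(np.uint8)

gray1 = cv2.imread("image1.jpg", cv2.IMREAD_GRAYSCALE)
gray2 = cv2.imread("image2.jpg", cv2.IMREAD_GRAYSCALE)
F1, F2 = dense_sift(gray1), dense_sift(gray2)  # the features that T should align: T(F1) ~ F2
</syntaxhighlight>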
 
;Non-parametric Scene Parsing (CVPR 2009)
If you have a good scene alignment algorithm, you can simply transfer the segmentation (label) map from an aligned, already-labeled image instead of training a parametric model.
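A toy sketch of the label-transfer idea, assuming a dense correspondence field <code>flow</code> from the query to an already-labeled neighbor image has been computed (e.g. with SIFT Flow); the function name and data layout are illustrative:

<syntaxhighlight lang="python">
# Toy label transfer: copy each pixel's label from its corresponding pixel
# in an aligned, already-labeled neighbor image. Illustrative only.
import numpy as np

def transfer_labels(flow, neighbor_labels):
    """flow[y, x] = (dx, dy) maps query pixel (x, y) to the neighbor image."""
    h, w = neighbor_labels.shape
    labels = np.zeros((h, w), dtype=neighbor_labels.dtype)
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y, x]
            ny = int(np.clip(y + dy, 0, h - 1))
            nx = int(np.clip(x + dx, 0, w - 1))
            labels[y, x] = neighbor_labels[ny, nx]  # copy the aligned neighbor's label
    return labels
</syntaxhighlight>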
 
====Use sub-images (primitives) to match====
Matching sub-images instead of whole images allows a query to borrow matches from multiple images.
 
;Mid-level primitives
Bag of visual words: 
# Take some features (e.g. SIFT) from every image in your dataset.
# Apply clustering (e.g. k-means) to these features to get k clusters. These k clusters are your visual words.
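A small sketch of these two steps, assuming OpenCV SIFT and scikit-learn k-means; the function names and parameters are illustrative:

<syntaxhighlight lang="python">
# Minimal bag-of-visual-words sketch. Assumes OpenCV with SIFT and scikit-learn.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(image_paths, k=100):
    """Cluster SIFT descriptors from all images into k visual words."""
    sift = cv2.SIFT_create()
    all_descriptors = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        if desc is not None:
            all_descriptors.append(desc)
    kmeans = KMeans(n_clusters=k, n_init=10).fit(np.vstack(all_descriptors))
    return kmeans  # the cluster centers are the visual words

def bow_histogram(gray, kmeans):
    """Represent an image as a normalized histogram over the visual words."""
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray, None)
    words = kmeans.predict(desc)
    hist, _ = np.histogram(words, bins=np.arange(kmeans.n_clusters + 1))
    return hist / hist.sum()
</syntaxhighlight>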
 
The challenge with matching patches is deciding which patches to match. 
Ideally, we want patches which are both representative and discriminative. 
Representative means the patch is found often in the target image set, i.e. it covers the target concept. 
Discriminative means the patch is not found in non-target image sets, i.e. it is distinct from other concepts. 
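A hypothetical way to score a candidate patch on these two criteria, in the spirit of mid-level discriminative patch mining; the descriptor distances and threshold are illustrative assumptions, not a specific paper's procedure:

<syntaxhighlight lang="python">
# Hypothetical scoring of candidate patches as "representative and discriminative".
import numpy as np

def patch_score(patch_desc, target_descs, other_descs, threshold=0.5):
    """patch_desc: (d,) descriptor; target_descs / other_descs: (n, d) descriptor sets."""
    d_target = np.linalg.norm(target_descs - patch_desc, axis=1)
    d_other = np.linalg.norm(other_descs - patch_desc, axis=1)
    representative = np.mean(d_target < threshold)        # how often it fires on the target set
    discriminative = 1.0 - np.mean(d_other < threshold)   # how rarely it fires on other sets
    return representative * discriminative
</syntaxhighlight>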
 
====Understanding simple stuff first====
E.g. from a video, find one frame in which the pose is easy to detect, then use optical-flow methods to propagate that pose to adjacent frames.
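A minimal sketch of propagating detections from an "easy" frame to the next frame with OpenCV's pyramidal Lucas-Kanade tracker; <code>easy_keypoints</code> stands in for whatever pose keypoints were detected on the easy frame:

<syntaxhighlight lang="python">
# Sketch: track pose keypoints from an easy frame into the next frame via optical flow.
import cv2
import numpy as np

def propagate_pose(easy_frame, next_frame, easy_keypoints):
    """easy_keypoints: (n, 2) float32 array of (x, y) joint locations on easy_frame."""
    gray0 = cv2.cvtColor(easy_frame, cv2.COLOR_BGR2GRAY)
    gray1 = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    pts0 = easy_keypoints.reshape(-1, 1, 2).astype(np.float32)
    pts1, status, _ = cv2.calcOpticalFlowPyrLK(gray0, gray1, pts0, None)
    return pts1.reshape(-1, 2), status.ravel()  # tracked joints and per-joint success flags
</syntaxhighlight>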
 
====Looking beyond the k-NN method====
Use the data itself to make connections between examples, rather than relying only on nearest-neighbor lookups. 
 
;Visual Memex Knowledge Graph
(Malisiewicz and Efros 2009) 
Build a visual knowledge graph of entities, where edges are either context edges or similarity edges. 
Embed a new image into the graph and copy information over from its graph neighbors.
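A toy illustration of such a graph using networkx; the node attributes, edge kinds, and the label-voting scheme are assumptions for the sketch, not the paper's actual model:

<syntaxhighlight lang="python">
# Toy Memex-style graph with similarity and context edges. Illustrative only.
import networkx as nx

G = nx.Graph()
G.add_node("car_patch_1", category="car")
G.add_node("car_patch_2", category="car")
G.add_node("road_patch_1", category="road")

# Similarity edges connect visually similar entities; context edges connect co-occurring ones.
G.add_edge("car_patch_1", "car_patch_2", kind="similarity", weight=0.9)
G.add_edge("car_patch_1", "road_patch_1", kind="context", weight=0.7)

def transfer_info(G, query_node):
    """Vote for a category using the query's graph neighbors, weighted by edge strength."""
    votes = {}
    for nbr, attrs in G[query_node].items():
        cat = G.nodes[nbr]["category"]
        votes[cat] = votes.get(cat, 0.0) + attrs["weight"]
    return max(votes, key=votes.get)
</syntaxhighlight>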
 
;Manifolds in Vision
These days, we can assume that deep learning features lie on a reasonably well-behaved manifold, so distances in that feature space are meaningful.
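For example, a short sketch of nearest-neighbor retrieval in a deep feature space, assuming torchvision's pretrained ResNet-18 as the feature extractor (any reasonable network would do):

<syntaxhighlight lang="python">
# Sketch: treat deep features as the embedding space for nearest-neighbor retrieval.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()  # drop the classifier; keep the 512-D feature
resnet.eval()

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

@torch.no_grad()
def embed(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return resnet(x).squeeze(0)

# Nearest neighbor in deep-feature space (cosine similarity).
query = embed("query.jpg")
gallery = {p: embed(p) for p in ["a.jpg", "b.jpg", "c.jpg"]}
best = max(gallery, key=lambda p: torch.nn.functional.cosine_similarity(query, gallery[p], dim=0).item())
</syntaxhighlight>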


==ConvNets and Architectures==