Visual Learning and Recognition

===Fast R-CNN===
Makes the whole network trainable.
* Pass the whole image through a CNN
* For each RoI (suppose size <math>h \times w</math>), do RoI pooling to get an <math>H \times W</math> feature map.
** This is max pooling over a grid of subwindows, each of size approximately <math>h/H \times w/W</math>.
* Pass the pooled feature map into an FC + softmax classifier.
* Pass the pooled feature map into a bounding-box regressor.
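The RoI pooling step can be sketched in plain Python as follows. This is a minimal single-channel illustration, not the original implementation: the function name <code>roi_max_pool</code>, the <code>(top, left, h, w)</code> RoI layout, and the floor/ceil rule for subwindow boundaries are assumptions made for this example.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI of a 2-D feature map into an out_h x out_w grid.

    feature_map: list of rows (one channel, for simplicity).
    roi: (top, left, h, w) in feature-map coordinates (assumed layout).
    """
    top, left, h, w = roi
    pooled = []
    for i in range(out_h):
        # Rows of the i-th subwindow: floor(i*h/H) up to ceil((i+1)*h/H).
        r0 = top + (i * h) // out_h
        r1 = top + ((i + 1) * h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            # Columns of the j-th subwindow: floor(j*w/W) up to ceil((j+1)*w/W).
            c0 = left + (j * w) // out_w
            c1 = left + ((j + 1) * w + out_w - 1) // out_w
            window = [feature_map[r][c]
                      for r in range(r0, r1)
                      for c in range(c0, c1)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# A 4x4 feature map holding the values 0..15 in row-major order.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]
whole = roi_max_pool(fmap, (0, 0, 4, 4), 2, 2)
inner = roi_max_pool(fmap, (1, 1, 3, 3), 2, 2)
```

Whatever the RoI size, the output is always <math>H \times W</math>, which is what lets RoIs of different shapes feed the same fully connected layers. When <math>h</math> is not divisible by <math>H</math>, this sketch lets neighbouring subwindows overlap by one row or column; other rounding conventions are possible.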
The entire network is trained jointly in a single stage rather than in separate stages.
The final loss function is a multi-task loss that sums the classification loss and the bounding-box regression loss.
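As a rough illustration of how such a combined loss can look for one RoI, the sketch below follows the form given in the Fast R-CNN paper: log loss for classification plus a smooth L1 regression term that is skipped for background RoIs. The notes above do not spell out the formula, so the function names, the choice of class 0 as background, and the weight <code>lam</code> are assumptions made for this example.

```python
import math


def smooth_l1(x):
    # Quadratic near zero, linear for |x| >= 1.
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def multitask_loss(class_probs, true_class, pred_box, target_box, lam=1.0):
    """Classification loss plus weighted box-regression loss for one RoI.

    class_probs: softmax output over classes, index 0 = background (assumed).
    pred_box, target_box: 4 regression offsets each (assumed parameterisation).
    """
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        # Background RoIs have no ground-truth box, so no regression term.
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_box, target_box))
    return cls_loss + lam * loc_loss


background = multitask_loss([0.5, 0.5], 0, [0.5, 0.0, 0.0, 2.0], [0.0] * 4)
foreground = multitask_loss([0.5, 0.5], 1, [0.5, 0.0, 0.0, 2.0], [0.0] * 4)
```

Because both terms are computed from the same shared features, a single backward pass updates the classifier, the regressor, and the convolutional layers together.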


'''Exam Question'''