Visual Learning and Recognition: Difference between revisions
| Line 548: | Line 548: | ||
===Fast R-CNN=== | ===Fast R-CNN=== | ||
Makes the whole network trainable. | Makes the whole network trainable. | ||
* Pass the whole image through a CNN | |||
* For each RoI (suppose size <math>h \times w</math>), do RoI pooling to get an <math>H \times W</math> feature map. | |||
** This is a max-pooling over subwindows of size approximately <math>h/H \times w/W</math>. | |||
* Pass the feature map into a FC + softmax classifier. | |||
* Pass the feature map into a bounding-box (bbox) regressor. | |||
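The RoI pooling step above can be sketched in NumPy as follows. This is a minimal illustration, not the reference implementation: the function name <code>roi_max_pool</code>, the <code>(y0, x0, y1, x1)</code> RoI convention (feature-map coordinates, end-exclusive), and the floor/ceil choice for subwindow boundaries are all assumptions made for the example.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size):
    """Max-pool one RoI of a (C, height, width) feature map to (C, H, W).

    roi is (y0, x0, y1, x1) in feature-map coordinates, end-exclusive
    (an assumed convention for this sketch).
    """
    y0, x0, y1, x1 = roi
    H, W = out_size
    h, w = y1 - y0, x1 - x0  # RoI size h x w
    C = feature_map.shape[0]
    out = np.empty((C, H, W), dtype=feature_map.dtype)
    for i in range(H):
        # Subwindow rows: floor(i*h/H) up to ceil((i+1)*h/H), never empty.
        ys = y0 + (i * h) // H
        ye = y0 + ((i + 1) * h + H - 1) // H
        for j in range(W):
            # Subwindow columns: floor(j*w/W) up to ceil((j+1)*w/W).
            xs = x0 + (j * w) // W
            xe = x0 + ((j + 1) * w + W - 1) // W
            # Max over each subwindow of roughly h/H x w/W cells, per channel.
            out[:, i, j] = feature_map[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

# Usage: a 4 x 4 RoI pooled to 2 x 2 uses four disjoint 2 x 2 subwindows.
fmap = np.arange(16).reshape(1, 4, 4)
pooled = roi_max_pool(fmap, (0, 0, 4, 4), (2, 2))
```

When <math>h</math> is not a multiple of <math>H</math> (or <math>w</math> of <math>W</math>), the subwindows in this sketch overlap slightly, so the output is always exactly <math>H \times W</math> regardless of the RoI size.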
The entire network is trained together rather than in stages. | |||
The final loss is a multi-task loss: the sum of the classification loss and the bounding-box regression loss. | |||
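A per-RoI version of this combined loss can be sketched as below, assuming the form used in the Fast R-CNN paper: a log loss on the true class plus a smooth L1 loss on the box offsets, with the box term skipped for background RoIs. The names <code>multitask_loss</code> and <code>lam</code>, and the convention that class 0 is background, are assumptions of this sketch.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def multitask_loss(class_probs, true_class, pred_box, target_box, lam=1.0):
    """Classification loss plus weighted box regression loss for one RoI.

    class_probs: softmax output over classes (class 0 assumed background).
    pred_box, target_box: 4-vectors of box regression offsets.
    lam: weight balancing the two tasks (hypothetical default of 1.0).
    """
    cls_loss = -np.log(class_probs[true_class])
    if true_class == 0:
        # Background RoIs have no ground-truth box, so only classify.
        return cls_loss
    loc_loss = smooth_l1(np.asarray(pred_box) - np.asarray(target_box)).sum()
    return cls_loss + lam * loc_loss

# Usage: a foreground RoI with one small and one large offset error.
probs = np.array([0.1, 0.9])
loss_fg = multitask_loss(probs, 1, np.zeros(4), np.array([0.5, 0.0, 0.0, 2.0]))
loss_bg = multitask_loss(probs, 0, np.zeros(4), np.zeros(4))
```

Because both terms are computed from the same shared feature map, a single backward pass updates the classifier, the regressor, and the convolutional layers together.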
'''Exam Question''' | '''Exam Question''' | ||