Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction

Title: Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
Publication Type: Conference Paper
Year of Publication: 2013
Authors: Zhang, N., Farrell, R., Iandola, F., & Darrell, T.
Other Numbers: 3619

Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.
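The idea of aligning exemplars piecewise by part can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes part boxes have already been localized by some detector, that each box is non-empty, and that features are averaged per part. The names `pool_region`, `pose_normalized_descriptor`, and the part labels are hypothetical. Features are pooled inside each part's box and concatenated in a fixed semantic order, so the same descriptor slot always refers to the same part regardless of where that part appears in the image.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical box format: (row_start, row_end, col_start, col_end), ends exclusive.
Box = Tuple[int, int, int, int]
FeatureMap = Sequence[Sequence[Sequence[float]]]  # rows x cols x feature_dim


def pool_region(feature_map: FeatureMap, box: Box) -> List[float]:
    """Average the feature vectors of all cells inside one (non-empty) part box."""
    r0, r1, c0, c1 = box
    cells = [feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    dim = len(cells[0])
    return [sum(cell[d] for cell in cells) / len(cells) for d in range(dim)]


def pose_normalized_descriptor(
    feature_map: FeatureMap,
    part_boxes: Dict[str, Optional[Box]],
    part_order: Sequence[str],
    dim: int,
) -> List[float]:
    """Concatenate per-part pooled features in a fixed semantic order.

    Parts that are missing (e.g. occluded) contribute a zero block, which is
    an assumption of this sketch, so the descriptor length stays constant.
    """
    descriptor: List[float] = []
    for name in part_order:
        box = part_boxes.get(name)
        if box is None:
            descriptor.extend([0.0] * dim)
        else:
            descriptor.extend(pool_region(feature_map, box))
    return descriptor


# Two toy images with 1-D features: the "head" is in the top row of the first
# image and in the bottom row of the second.
image_a = [[[1.0], [2.0]], [[3.0], [4.0]]]
image_b = [[[3.0], [4.0]], [[1.0], [2.0]]]
order = ["head", "body"]

desc_a = pose_normalized_descriptor(
    image_a, {"head": (0, 1, 0, 2), "body": (1, 2, 0, 2)}, order, dim=1
)
desc_b = pose_normalized_descriptor(
    image_b, {"head": (1, 2, 0, 2), "body": (0, 1, 0, 2)}, order, dim=1
)
```

In this toy case both descriptors come out identical even though the parts sit at different locations, which is the sense in which pooling by part factors out pose.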


This work was partially supported by funding provided through National Science Foundation grants IIS-1212928 ("Reconstructive recognition: Uniting statistical scene understanding and physics-based visual reasoning") and IIS-1116411 ("Hierarchical Probabilistic Layers for Visual Recognition of Complex Objects"). Additional support was provided by DARPA's Mind's Eye and MSEE programs, the Toyota Corporation, and the NDSEG Fellowship Program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of the funders.

Bibliographic Notes

Proceedings of the International Conference on Computer Vision 2013 (ICCV 2013), Sydney, Australia

Abbreviated Authors

N. Zhang, R. Farrell, F. Iandola, and T. Darrell

ICSI Research Group


ICSI Publication Type

Article in conference proceedings