New Research: Improving Computer Vision with Poselets

Wednesday, May 16, 2012

Computer vision techniques have trouble recognizing subcategories of objects (for example, a vehicle’s model type or a bird’s species). A new method developed by ICSI researchers improves automatic recognition of subcategories by first warping small areas of photos to account for differences in pose and viewing angle, and then grouping the areas according to their similarities.

The differences between subcategories are often subtle and localized to small portions of an object. At the upcoming IEEE Conference on Computer Vision and Pattern Recognition, ICSI researchers will present their new method based on poselets.

Poselets capture the shape and appearance of a portion of an object, coupling the object’s pose with how it faces the camera. Each category of object has many poselets, some of which overlap. For example, the category “birds” may have one poselet showing the left side of the bird’s face, while another includes both the left side at a different angle and the front of the head. Because each poselet is tied to a particular pose and viewpoint, no single photo contains every possible poselet. Since photos generally have different sets of poselets, comparing them directly is difficult.

The researchers’ new method uses a warped feature kernel to distort one poselet into another where possible. This allows a more direct comparison of two photos that may have different poselets. The new approach also gathers information across poselets – including the distorted poselets – within each image so that images can be compared more easily.

The researchers used these comparisons with nearest-neighbor and kernel-based learning on a set of photos of birds. The method was more effective at identifying subcategories – bird species – than methods that consider only statistics from the whole image without normalizing poses.
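
The idea of comparing images through matched or warped poselets and then classifying by nearest neighbor can be illustrated with a toy sketch. This is not the researchers’ implementation: the poselet names, the WARPABLE table, the Gaussian similarity, and the warp_weight discount are all assumptions made up for illustration, and "warping" is reduced here to simply allowing two related poselets to be compared.

```python
from math import exp

# Hypothetical table (an assumption, not from the paper):
# poselet id -> ids of poselets it could be warped into.
WARPABLE = {
    "head_left": {"head_left_front"},
    "head_left_front": {"head_left"},
}

def rbf(u, v, gamma=1.0):
    """Gaussian similarity between two equal-length feature vectors."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def pooled_similarity(img_a, img_b, warp_weight=0.5):
    """Sum similarities over poselet pairs that match directly or via a warp.

    Each image is a dict mapping a poselet id to its feature vector.
    Pairs that only match through the WARPABLE table are discounted.
    """
    total = 0.0
    for pa, fa in img_a.items():
        for pb, fb in img_b.items():
            if pa == pb:
                total += rbf(fa, fb)
            elif pb in WARPABLE.get(pa, ()):
                total += warp_weight * rbf(fa, fb)
    return total

def nearest_neighbor(query, labeled):
    """Return the label of the training image most similar to the query."""
    return max(labeled, key=lambda item: pooled_similarity(query, item[0]))[1]

# Made-up example data: two labeled images and one query image.
train = [
    ({"head_left": [1.0, 0.0], "wing": [0.2, 0.8]}, "sparrow"),
    ({"head_left_front": [0.0, 1.0], "wing": [0.9, 0.1]}, "finch"),
]
query = {"head_left_front": [0.9, 0.1], "wing": [0.3, 0.7]}
print(nearest_neighbor(query, train))
```

In this toy data the query shares no head poselet with the first training image, yet the two can still be compared because the table allows one head poselet to stand in for the other.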

Related Paper: “Pose Pooling Kernels for Sub-Category Recognition.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2012. Available online at http://www.icsi.berkeley.edu/cgi-bin/pubs/publication.pl?ID=3300 .