Cross-modal adaptation for RGB-D detection

Title: Cross-modal adaptation for RGB-D detection
Publication Type: Conference Proceedings
Year of Publication: 2016
Authors: Hoffman, J., Gupta, S., Leong, J., Guadarrama, S., & Darrell, T.
Published in: IEEE International Conference on Robotics and Automation (ICRA)
Date Published: 05/2016
ISBN Number: 978-1-4673-8026-3
Accession Number: 16055574
Keywords: Adaptation models, Detectors, Object Detection, Proposals, Robots, Training, Training data

In this paper we propose a technique to adapt convolutional neural network (CNN) based object detectors trained on RGB images so that they can effectively leverage depth images at test time to boost detection performance. Given labeled depth images for a handful of categories, we adapt an RGB object detector for a new category such that it can use depth images in addition to RGB images at test time to produce more accurate detections. Our approach builds on the observation that the lower layers of a CNN are largely task- and category-agnostic but domain-specific, whereas the higher layers are largely task- and category-specific but domain-agnostic. We operationalize this observation by proposing a mid-level fusion of RGB and depth CNNs. Experimental evaluation on the challenging NYUD2 dataset shows that our proposed adaptation technique yields an average 21% relative improvement in detection performance over an RGB-only baseline, even when no depth training data is available for the particular category evaluated. We believe our proposed technique will extend advances made in computer vision to RGB-D data, leading to improved performance with little additional annotation effort.
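The mid-level fusion idea described in the abstract (modality-specific lower layers, shared domain-agnostic higher layers) can be sketched as follows. This is a minimal illustrative toy, not the paper's architecture: the layer sizes, single-matrix "layers", and random weights are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only (not from the paper).
D_IN, D_MID, D_OUT = 8, 4, 3

# Lower layers: domain-specific, so each modality gets its own weights.
W_low_rgb = rng.standard_normal((D_MID, D_IN))
W_low_depth = rng.standard_normal((D_MID, D_IN))

# Higher layers: domain-agnostic, shared across modalities; they operate
# on the concatenation of the two mid-level feature vectors.
W_high = rng.standard_normal((D_OUT, 2 * D_MID))


def relu(x):
    return np.maximum(x, 0.0)


def mid_fusion_forward(x_rgb, x_depth):
    """Mid-level fusion: run each modality through its own lower layers,
    concatenate the mid-level features, then apply shared higher layers."""
    f_rgb = relu(W_low_rgb @ x_rgb)        # RGB mid-level features
    f_depth = relu(W_low_depth @ x_depth)  # depth mid-level features
    fused = np.concatenate([f_rgb, f_depth])  # fusion at the mid level
    return W_high @ fused                  # category scores


scores = mid_fusion_forward(rng.standard_normal(D_IN),
                            rng.standard_normal(D_IN))
print(scores.shape)  # one score per category: (3,)
```

In this scheme, adapting to depth for a new category would only require learning the depth-specific lower layers from the handful of depth-labeled categories, since the shared higher layers carry over from the RGB detector.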
