Probabilistic Color De-rendering

Last updated: April 2012

Most images found on the internet are produced by consumer digital cameras, which use tone-mapping to create compact, narrow-gamut images that are nonetheless visually pleasing. Unfortunately, in doing so, consumer cameras discard or distort substantial radiometric signal that could otherwise be used by computer vision applications that rely on accurate measurements of scene colors: multi-view 3D reconstruction, photometric stereo, denoising, and many more. Existing methods attempt to undo these effects through deterministic maps that de-render the reported narrow-gamut colors (JPG) back to their original wide-gamut sensor measurements (RAW). Deterministic approaches are unreliable, however, because the cameras' wide-to-narrow mapping is often many-to-one, as shown in the figure below (yellow dots = wide-gamut colors, black dots = narrow-gamut colors):

Figure: color rendering is a many-to-one mapping

The reverse mapping is thus one-to-many, with inherent uncertainty and loss of information. Our solution is to use probabilistic maps rather than deterministic ones, providing uncertainty estimates that are useful to many applications. In Xiong et al., we used a non-parametric Bayesian regression technique, local Gaussian process regression, to predict, for each pixel's narrow-gamut color, a probability distribution over the scene colors that could have created it. Each distribution is Gaussian, described by a mean and a variance:

Figure: de-rendering of internet images
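As a rough illustration of local Gaussian process regression, the sketch below fits a zero-mean Gaussian process to only the k training pairs nearest to a query JPG color and returns a predictive mean and variance for one RAW channel. It is a minimal sketch under stated assumptions, not the implementation from Xiong et al.: the squared-exponential kernel, the fixed `length_scale` and `noise` values, the neighbor count `k`, and all function names are hypothetical choices made for this example.

```python
import math

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two color vectors (assumed kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x

def solve_lower_transposed(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def local_gp_predict(query, train_x, train_y, k=8, noise=1e-3):
    """Predictive mean and variance of one RAW channel for a JPG color.

    Only the k training pairs nearest to `query` are used (the "local" part).
    Hyperparameters are fixed here for brevity; this is an assumption.
    """
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(query, train_x[i]))[:k]
    X = [train_x[i] for i in nearest]
    y = [train_y[i] for i in nearest]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    k_star = [rbf(query, a) for a in X]
    alpha = solve_lower_transposed(L, solve_lower(L, y))  # K^{-1} y
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    variance = rbf(query, query) + noise - sum(t * t for t in v)
    return mean, variance
```

With fixed hyperparameters, the predicted variance depends only on how close the query is to the training colors. Reflecting the ambiguity of many-to-one regions would additionally require fitting the noise level and length scale to each local neighborhood, which this sketch leaves out.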

Using a variety of consumer cameras we show that these distributions, once learned from training data, are effective in simple probabilistic adaptations of two popular applications: multi-exposure imaging and photometric stereo. Our results on these applications are better than those of corresponding deterministic approaches, especially for saturated and out-of-gamut colors.
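To show how a per-pixel mean and variance could be used in multi-exposure imaging, the sketch below fuses several Gaussian predictions of the same pixel by inverse-variance weighting. This is a standard way to combine independent Gaussian estimates and is offered only as an illustration; the exact formulation in the paper may differ. The function name, the input layout, and the assumption that the de-rendered sensor value scales linearly with exposure time are all hypothetical.

```python
def fuse_exposures(predictions):
    """Fuse per-exposure Gaussian predictions of one pixel's radiance.

    predictions: list of (mean, variance, exposure_time) tuples, where mean
    and variance describe the de-rendered sensor value at that exposure.
    Assumes sensor value = radiance * exposure_time (linear sensor).
    Returns the fused (radiance_mean, radiance_variance).
    """
    precision_sum = 0.0
    weighted_sum = 0.0
    for mean, variance, exposure_time in predictions:
        # Rescale each prediction to a common radiance scale.
        radiance = mean / exposure_time
        radiance_var = variance / exposure_time ** 2
        # Confident (low-variance) exposures get proportionally more weight.
        precision_sum += 1.0 / radiance_var
        weighted_sum += radiance / radiance_var
    return weighted_sum / precision_sum, 1.0 / precision_sum
```

Under this scheme, a saturated pixel whose prediction has a large variance contributes little to the fused estimate, whereas a deterministic approach would have to either trust it fully or discard it with a hard threshold.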

Papers:

Y. Xiong, K. Saenko, T. Zickler, T. Darrell, "From Pixels to Physics: Probabilistic Color De-rendering", to appear in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

Domain Adaptation for Object Recognition

Last updated: September 2011

Domain adaptation is an important emerging topic in computer vision. This project investigates domain shift in the context of object recognition. We introduced a method that adapts object models acquired in a particular visual domain to new imaging conditions by learning a transformation that minimizes the effect of domain-induced changes in the feature distribution. The transformation is learned in a supervised manner and can be applied to categories for which there are no labeled examples in the new domain. While we focus our evaluation on object recognition tasks, the transform-based adaptation technique we develop is general and could be applied to non-image data. We experimentally demonstrate the ability of our method to improve recognition on categories with few or no target domain labels and moderate to large changes in the imaging conditions.
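The idea of a learned, category-independent transformation can be illustrated with a deliberately simplified stand-in. The sketch below learns a linear map from target-domain features to source-domain features by regularized least squares over same-class cross-domain pairs, using plain gradient descent. This is not the constrained metric-learning formulation used in the papers listed below; the objective, the function names, and the default values of `lr`, `reg`, and `steps` are assumptions made for illustration.

```python
def learn_transform(pairs, dim, lr=0.1, reg=0.01, steps=500):
    """Learn a dim x dim matrix W that maps target features toward source features.

    pairs: list of (x_target, x_source) feature vectors that share a class label.
    Minimizes mean ||W x_target - x_source||^2 + reg * ||W - I||_F^2
    (a simplified stand-in objective, not the papers' formulation).
    """
    identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    W = [row[:] for row in identity]
    for _ in range(steps):
        # Gradient of the regularizer pulls W back toward the identity.
        grad = [[2.0 * reg * (W[i][j] - identity[i][j]) for j in range(dim)]
                for i in range(dim)]
        for x_t, x_s in pairs:
            residual = [sum(W[i][j] * x_t[j] for j in range(dim)) - x_s[i]
                        for i in range(dim)]
            for i in range(dim):
                for j in range(dim):
                    grad[i][j] += 2.0 * residual[i] * x_t[j] / len(pairs)
        W = [[W[i][j] - lr * grad[i][j] for j in range(dim)] for i in range(dim)]
    return W

def apply_transform(W, x):
    """Map one target-domain feature vector into the source feature space."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]
```

Because `W` acts on features rather than on class labels, it can be applied to target-domain examples from categories that contributed no training pairs, which is the property the paragraph above describes.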

Papers:

B. Kulis, K. Saenko, and T. Darrell, "What You Saw is Not What You Get: Domain Adaptation Using Asymmetric Kernel Transforms", in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting Visual Category Models to New Domains", in Proc. ECCV, September 2010, Heraklion, Greece. [code] [dataset]

Database for Studying Effects of Domain Shift in Object Recognition

Last updated: September 2011

Effects of domain shift have been largely overlooked in previous object recognition studies. We collected a database that allows researchers to study, evaluate, and compare solutions to the domain shift problem by establishing a multiple-domain labeled dataset and benchmark. In addition to the domain shift aspects, this database also poses a challenging office-environment category learning task that reflects the difficulty of real-world indoor robotic object recognition, and may serve as a useful testbed for such tasks. It contains a total of 4652 images of 31 categories originating from three domains: images from the web, images from a digital SLR camera, and images from a webcam. For further details, please refer to our paper.

If you use the dataset in your research, please cite:

K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting Visual Category Models to New Domains", in Proc. ECCV, September 2010, Heraklion, Greece.

Download the database: images, features, features and object ids.

Visual Sense Disambiguation

Last updated: September 2011

Polysemy is a problem for methods that exploit image search engines to build object category models. Previous unsupervised approaches did not take word sense into consideration. We propose a new method that uses a dictionary to learn models of visual word sense from a large collection of unlabeled web data. The use of latent Dirichlet allocation (LDA) to discover a latent sense space makes the model robust despite the very limited nature of dictionary definitions. The definitions are used to learn a distribution in the latent space that best represents a sense. The algorithm then uses the text surrounding image links to retrieve images with a high probability of a particular dictionary sense.
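The retrieval step can be sketched as follows, assuming the topic distributions have already been inferred by some LDA implementation. The sketch converts each sense's topic distribution into p(sense | topic) with Bayes' rule, then scores the text around an image link by marginalizing over topics. The function names, the dictionary-based data layout, and the use of an explicit sense prior are hypothetical choices for this example and are not taken from the papers.

```python
def sense_given_topic(topic_given_sense, sense_prior):
    """Bayes' rule: p(s | z) is proportional to p(z | s) * p(s).

    topic_given_sense: {sense: [p(z | s) for each topic z]}, for example
    inferred from the dictionary definition of each sense (assumed input).
    Assumes every topic has nonzero mass under at least one sense.
    """
    senses = list(topic_given_sense)
    n_topics = len(next(iter(topic_given_sense.values())))
    table = {s: [0.0] * n_topics for s in senses}
    for z in range(n_topics):
        joint = {s: topic_given_sense[s][z] * sense_prior[s] for s in senses}
        total = sum(joint.values())
        for s in senses:
            table[s][z] = joint[s] / total
    return table

def sense_probability(topic_given_doc, sense_topic_table):
    """Marginalize over topics: p(s | d) = sum_z p(s | z) * p(z | d)."""
    return {s: sum(p_sz * p_zd for p_sz, p_zd in zip(row, topic_given_doc))
            for s, row in sense_topic_table.items()}

def rank_by_sense(docs, sense, sense_topic_table):
    """Return document ids sorted by decreasing probability of `sense`.

    docs: {doc_id: [p(z | d) for each topic z]} for the text around each image.
    """
    return sorted(
        docs,
        key=lambda d: sense_probability(docs[d], sense_topic_table)[sense],
        reverse=True,
    )
```

For a query word such as "mouse", images whose surrounding text concentrates on the topics favored by the animal definition would rank first for that sense, and the same documents would rank low for the computer-device sense.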

We also argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to determine whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that exploits the semantic structure of WordNet to remove abstract senses. Our model does not require any human supervision and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two multimodal, multi-sense databases, and we experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects.
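One simple way to use WordNet's structure for this purpose is to follow a sense's hypernym chain upward and keep the sense only if it reaches the "physical entity" node rather than only "abstraction"; both are children of the root "entity" in WordNet's noun hierarchy. The sketch below applies that rule to a small hand-written hypernym table. The table, the sense identifiers, and the rule itself are illustrative assumptions; the criterion used in the paper may differ.

```python
# Hypothetical toy excerpt of a hypernym graph; not real WordNet sense ids.
TOY_HYPERNYMS = {
    "cup.container": ["container"],
    "container": ["artifact"],
    "artifact": ["physical_entity"],
    "cup.unit_of_measure": ["measure"],
    "measure": ["abstraction"],
    "physical_entity": ["entity"],
    "abstraction": ["entity"],
}

def is_concrete(sense, hypernyms):
    """True if some hypernym path from `sense` reaches 'physical_entity'."""
    stack, seen = [sense], set()
    while stack:
        node = stack.pop()
        if node == "physical_entity":
            return True
        if node in seen:
            continue
        seen.add(node)
        # A node may have several hypernyms, so explore all of them.
        stack.extend(hypernyms.get(node, []))
    return False

def concrete_senses(senses, hypernyms):
    """Keep only the senses judged to denote physical objects."""
    return [s for s in senses if is_concrete(s, hypernyms)]
```

For the word "cup", this keeps the container sense and drops the unit-of-measure sense, so only images retrieved for the container sense would be passed on to train an object classifier.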

Papers:

K. Saenko and T. Darrell, "Filtering Abstract Senses From Image Search Results", in Proc. NIPS, December 2009, Vancouver, Canada.

K. Saenko and T. Darrell, "Unsupervised Learning of Visual Sense Models for Polysemous Words", in Proc. NIPS, December 2008, Vancouver, Canada.

Download: README, data (coming soon).