Probabilistic Color De-rendering
Most images found on the internet are produced by consumer digital cameras, which use tone-mapping to create compact, narrow-gamut images that are nonetheless visually pleasing. Unfortunately, in doing so, consumer cameras discard or distort substantial radiometric signal that could otherwise be used for many computer vision applications that rely on accurate measurements of scene colors: multi-view 3D reconstruction, photometric stereo, denoising, and many more. Existing methods attempt to undo these effects through deterministic maps that de-render the reported narrow-gamut colors (JPG) back to their original wide-gamut sensor measurements (RAW). Deterministic approaches are unreliable, however, because the cameras' wide-to-narrow mapping is often many-to-one, as shown in the figure below (yellow dots=wide-gamut colors, black dots=narrow gamut colors):
The reverse mapping is thus one-to-many and has inherent uncertainty and loss of information. Our solution is to use probabilistic maps, rather than deterministic ones, providing uncertainty estimates useful to many applications. In Xiong et al. we used a non-parametric Bayesian regression technique---local Gaussian process regression---to predict for each pixel's narrow-gamut color a probability distribution over the scene colors that could have created it, described by a Gaussian distribution with a mean and variance:
Using a variety of consumer cameras we show that these distributions, once learned from training data, are effective in simple probabilistic adaptations of two popular applications: multi-exposure imaging and photometric stereo. Our results on these applications are better than those of corresponding deterministic approaches, especially for saturated and out-of-gamut colors.
Papers:
Y. Xiong, K. Saenko, T. Zickler, T. Darrell, "From Pixels to Physics: Probabilistic Color De-rendering", to appear in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
Domain adaptation is an
important
emerging topic in computer vision. This project investigates domain
shift in the context of object recognition.
We introduced a method that adapts object models acquired in a
particular visual domain to new imaging conditions by learning a
transformation that minimizes the effect of domain-induced changes in
the feature distribution. The transformation is learned in a supervised
manner and can be applied to categories for which there are no labeled
examples in the new domain. While we focus our evaluation on object
recognition tasks, the transform-based adaptation technique we develop
is general and could be applied to non-image data. We
experimentally demonstrate the ability of our method to improve
recognition on categories with few or no target domain labels and
moderate to large changes in the imaging conditions.
Effects
of domain shift have been largely overlooked in previous object
recognition studies. We collected a database that allows researchers to
study, evaluate and compare solutions to the domain shift problem by
establishing a multiple-domain labeled dataset and benchmark. In
addition to the domain shift aspects, this database also proposes a
challenging office environment category learning task which reflects
the difficulty of real-world indoor robotic object recognition, and may
serve as a useful testbed for such tasks. It contains a total of 4652
images of 31 categories originating from the following three domains:
images from the web, digital SLR and webcam. For further details please
refer to our paper.
Polysemy
is a problem for methods that exploit image search engines to build
object category models. Previously, unsupervised approaches did not
take word sense into consideration. We propose a new method that uses a
dictionary to learn models of visual word sense from a large collection
of unlabeled web data. The use of LDA to discover a latent sense space
makes the model robust despite the very limited nature of dictionary
definitions. The definitions are used to learn a distribution in the
latent space that best represents a sense. The algorithm then uses the
text surrounding image links to retrieve images with high probability
of a particular dictionary sense.