Why Size Matters: Feature Coding as Nystrom Sampling

Publication Type: Conference Paper
Year of Publication: 2013
Authors: Vinyals, O., Jia, Y., & Darrell, T.
Other Numbers: 3440

Recently, the computer vision and machine learning communities have favored feature extraction pipelines that rely on a coding step followed by a linear classifier, due to their overall simplicity, the well-understood properties of linear classifiers, and their computational efficiency. In this paper we propose a novel view of this pipeline based on kernel methods and Nyström sampling. In particular, we focus on the coding of a data point with a local representation based on a dictionary with fewer elements than the number of data points, and view it as an approximation to the actual function that would compute pair-wise similarity to all data points (often too many to compute in practice), followed by a Nyström sampling step to select a subset of all data points.

Furthermore, since bounds are known on the approximation power of Nyström sampling as a function of how many samples (i.e., the dictionary size) we consider, we can derive bounds on the approximation of the exact (but expensive to compute) kernel matrix, and use them as a proxy to predict accuracy as a function of the dictionary size, which has been observed to increase but also to saturate as the dictionary grows. This model may help explain the positive effect of codebook size [2, 7] and justify the need to stack more layers (often referred to as deep learning), since flat models empirically saturate as we add more complexity.
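The Nyström view described in the abstract can be illustrated in a few lines. The sketch below is not the authors' implementation; it is a minimal NumPy example, assuming an RBF kernel and uniformly sampled landmark points standing in for the dictionary, that approximates the full kernel matrix from similarities to the dictionary alone and reports how the error behaves as the dictionary grows.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Pairwise RBF similarities between the rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 2))   # n data points (hypothetical toy data)

# Exact n x n kernel matrix: the object that is too expensive in practice.
K = rbf_kernel(X, X)

def nystrom(X, m):
    """Approximate K using m sampled landmarks (the 'dictionary')."""
    idx = rng.choice(n, size=m, replace=False)
    C = rbf_kernel(X, X[idx])     # n x m similarities to the dictionary
    W = C[idx]                    # m x m kernel among the landmarks
    # Nystrom approximation: K ~= C W^+ C^T
    return C @ np.linalg.pinv(W) @ C.T, idx

for m in (5, 20, 80):
    K_hat, _ = nystrom(X, m)
    err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
    print(f"dictionary size m={m:3d}: relative Frobenius error {err:.3f}")
```

Running the loop shows the effect the paper analyzes: the approximation error shrinks as the dictionary size grows, but with diminishing returns, mirroring the observed saturation of accuracy with codebook size.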

Bibliographic Notes

Proceedings of International Conference on Learning Representations (ICLR 2013), Scottsdale, Arizona

Abbreviated Authors

O. Vinyals, Y. Jia, and T. Darrell

ICSI Research Group


ICSI Publication Type

Article in conference proceedings