Scalable Image Recognition and Retrieval

Kristen Grauman, Clare Boothe Luce Assistant Professor of Computer Science, University of Texas at Austin

The flexibility and scalability demanded of computer vision algorithms that are expected to operate in real-world conditions are staggering. To approach human-level performance, they must not only identify objects under a wide range of illuminations, against different backgrounds, and in a variety of poses, but also instantaneously recognize on the order of tens of thousands of object categories. Recent years have seen much progress in the development of rich image representations, but current paradigms for learning and recognizing visual categories typically ignore the issue of scalability: they often rely on carefully supervised (manually labeled or annotated) data, or have such high computational costs that artificial limits must be placed on the size of image descriptions or on the amount of data from which object models are learned. Clearly, to be successful, vision algorithms must not only achieve an incredible level of robustness, but must do so efficiently.


In this talk I will describe our recent work aimed at making recognition and image retrieval practical on a large scale. We address the issue both by providing algorithms with significant computational advantages and by developing unsupervised and semi-supervised strategies for learning visual categories with minimal manual input. I will give an overview of our linear-time kernel for matching sets of unordered image features, introduce a randomized hashing algorithm that allows sub-linear time approximate similarity search over the resulting partial correspondences, and discuss their implications for local learning and unsupervised category discovery. Using several benchmark image data sets, we show that our approach yields accuracy that is competitive with the state of the art, while often requiring orders of magnitude less computation time or manual supervision.
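
To make the idea of a linear-time kernel over unordered feature sets concrete, here is a minimal sketch. The abstract does not give the kernel's definition, so the construction below is an assumption: it scores two sets by intersecting multi-resolution grid histograms and giving matches found at finer resolutions more weight. The function names (`histogram`, `intersection`, `pyramid_match`), the doubling bin sizes, and the `1 / side` weights are all illustrative choices, not details taken from the talk.

```python
from collections import Counter
import math


def histogram(points, side):
    """Count how many points fall in each grid cell of the given side length."""
    return Counter(tuple(math.floor(c / side) for c in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: the number of points that can be paired per cell."""
    return sum(min(count, h2[cell]) for cell, count in h1.items())


def pyramid_match(xs, ys, num_levels):
    """Score two unordered feature sets (assumed weighting: 1 / bin side length).

    Each level doubles the bin side length. Only matches that are new at a
    level are counted, so every feature contributes at most once, at the
    finest resolution where it finds a partner.
    """
    score = 0.0
    previous = 0
    for level in range(num_levels):
        side = 2 ** level
        matched = intersection(histogram(xs, side), histogram(ys, side))
        score += (matched - previous) / side
        previous = matched
    return score


xs = [(0.5, 0.5), (3.2, 3.7)]
ys = [(0.6, 0.4), (5.1, 5.1)]
print(pyramid_match(xs, ys, num_levels=4))
```

For a fixed number of levels, each histogram is built in one pass over the features, so the cost grows linearly with the set size. The sets may also differ in size, and unmatched features simply contribute nothing, which is one way to read "partial correspondences".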

Abstract Author(s): Kristen Grauman