Discovering objects and their location in images

Abstract

We seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic Latent Semantic Analysis (pLSA). In text analysis this is used to discover topics in a corpus using the bag-of-words document representation. Here we treat object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics. The model is applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. The topic discovery approach successfully translates to the visual domain: for a small set of objects, we show that both the object categories and their approximate spatial layout are found without supervision. Performance of this unsupervised method is compared to the supervised approach of Fergus et al. [8] on a set of unseen images containing only one object per image. We also extend the bag-of-words vocabulary to include ‘doublets’ which encode spatially local co-occurring regions. It is demonstrated that this extended vocabulary gives a cleaner image segmentation. Finally, the classification and segmentation methods are applied to a set of images containing multiple objects per image. These results demonstrate that we can successfully build object class models from an unsupervised analysis of images.
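
As a rough illustration of the topic model described in the abstract, the sketch below fits pLSA by EM to a matrix of visual-word counts, with one row per image and one column per visual word. It is a minimal sketch and not the authors' implementation: the function name `plsa`, its parameters, and the toy count matrix are assumptions made for illustration, and the vector quantization of SIFT-like descriptors into visual words is assumed to have happened already.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on an (n_docs, n_words) count matrix.

    Returns P(w|z) with shape (n_topics, n_words) and
    P(z|d) with shape (n_docs, n_topics).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random, row-normalized starting points for both distributions.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z|d,w), shape (n_docs, n_topics, n_words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_w_z, p_z_d


if __name__ == "__main__":
    # Hypothetical toy data: 4 images, 4 visual words, two disjoint word groups.
    toy_counts = np.array(
        [[5, 4, 0, 0], [6, 3, 0, 0], [0, 0, 4, 5], [0, 0, 3, 6]], dtype=float
    )
    topic_words, image_topics = plsa(toy_counts, n_topics=2, n_iter=200)
    # The most probable topic per image acts as an unsupervised category label.
    print(image_topics.argmax(axis=1))
```

Under this reading, each row of `P(z|d)` gives the topic mixture of one image, so an image containing several object categories would spread its mass over several topics. The dense three-dimensional posterior array is only practical for small vocabularies and image sets.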

Keywords

Artificial intelligence, Computer science, Probabilistic latent semantic analysis, Pattern recognition (psychology), Vocabulary, Object (grammar), Set (abstract data type), Bag-of-words model in computer vision, Segmentation, Representation (politics), Bag-of-words model, Image segmentation, Class (philosophy), Image (mathematics), Scale-invariant feature transform, Probabilistic logic, Word (group theory), Visual Word, Natural language processing, Image retrieval, Mathematics

Publication Info

Year: 2005
Type: article
Pages: 370-377 Vol. 1
Citations: 980
Access: Closed

Cite This

APA Style

Josef Šivic, Bryan Russell, Alexei A. Efros, et al. (2005). Discovering objects and their location in images. 370-377 Vol. 1. https://doi.org/10.1109/iccv.2005.77

Identifiers

DOI: 10.1109/iccv.2005.77