Abstract

This paper presents a unified bag-of-visual-words (BoW) framework for dynamic scene recognition. The approach builds on primitive features that uniformly capture the spatial and temporal orientation structure of the imagery (e.g., video), extracted by applying a bank of spatiotemporally oriented filters. Various feature encoding techniques are investigated to abstract the primitives into an intermediate representation best suited to dynamic scene representation. Further, a novel approach to adaptive pooling of the encoded features is presented that captures the spatial layout of the scene while remaining robust to situations where camera motion and scene dynamics are confounded. The resulting overall approach has been evaluated on two standard, publicly available dynamic scene datasets. The results show that, in comparison to a representative set of alternatives, the proposed approach outperforms the previous state of the art in classification accuracy by 10%.
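The pipeline described above (oriented filtering, codeword encoding, pooling into a histogram) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic video, random 3x3x3 filters standing in for an oriented filter bank, plain k-means codebook, and average pooling are all assumptions made for the example.

```python
# Minimal, illustrative bag-of-visual-words (BoW) pipeline over
# spatiotemporal filter responses. All specifics (random filters,
# k-means codebook, hard assignment, average pooling) are assumptions
# for the sketch, not the paper's actual method.
import numpy as np

rng = np.random.default_rng(0)

def oriented_energy_features(video, n_orients=8):
    """Per-pixel features: squared responses to random 3x3x3
    spatiotemporal filters (stand-ins for an oriented filter bank)."""
    T, H, W = video.shape
    filters = rng.standard_normal((n_orients, 3, 3, 3))
    feats = np.empty((T - 2, H - 2, W - 2, n_orients))
    for i, f in enumerate(filters):
        # "valid" 3D correlation via explicit loop over filter offsets
        resp = np.zeros((T - 2, H - 2, W - 2))
        for dt in range(3):
            for dy in range(3):
                for dx in range(3):
                    resp += f[dt, dy, dx] * video[dt:dt + T - 2,
                                                  dy:dy + H - 2,
                                                  dx:dx + W - 2]
        feats[..., i] = resp ** 2  # oriented energy = squared response
    return feats.reshape(-1, n_orients)

def build_codebook(features, k=16, iters=10):
    """Plain k-means (Lloyd's algorithm) codebook over the features."""
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        d = ((features[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = features[labels == j].mean(0)
    return centers

def bow_histogram(features, centers):
    """Hard-assignment encoding followed by average pooling."""
    d = ((features[:, None, :] - centers[None]) ** 2).sum(-1)
    labels = d.argmin(1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

video = rng.standard_normal((8, 16, 16))   # toy T x H x W clip
feats = oriented_energy_features(video)
codebook = build_codebook(feats)
hist = bow_histogram(feats, codebook)
print(hist.shape)
```

The paper's adaptive pooling would replace the single global histogram here with per-region histograms whose layout adapts to confounded camera and scene motion; the sketch keeps one global average pool for brevity.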

Keywords

Computer science, Artificial intelligence, Computer vision, Pattern recognition, Feature encoding, Pooling, Representation, Dynamic scene recognition

Related Publications

Recognizing indoor scenes

We propose a scheme for indoor place identification based on the recognition of global scene views. Scene views are encoded using a holistic representation that provides low-res...

2009 · 2009 IEEE Conference on Computer Visi... · 1464 citations

Publication Info

Year: 2014
Type: article
Pages: 2681-2688
Citations: 61
Access: Closed

Citation Metrics

61 (OpenAlex)

Cite This

Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes (2014). Bags of Spacetime Energies for Dynamic Scene Recognition. 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2681-2688. https://doi.org/10.1109/cvpr.2014.343

Identifiers

DOI
10.1109/cvpr.2014.343