Abstract
In this paper we introduce a 3-dimensional (3D) SIFT descriptor for video or 3D imagery such as MRI data. We show how this new descriptor better represents the 3D nature of video data in the application of action recognition, and how it outperforms previously used description methods in an elegant and efficient manner. We use a bag-of-words approach to represent videos, and present a method to discover relationships between spatio-temporal words in order to better describe the video data.
Related Publications
Action recognition by dense trajectories
Feature trajectories have shown to be efficient for representing videos. Typically, they are extracted using the KLT tracker or matching SIFT descriptors between frames. Howeve...
Fisher Vector Faces in the Wild
Several recent papers on automatic face verification have significantly raised the performance bar by developing novel, specialised representations that outperform standard feat...
Discovering objects and their location in images
We seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic Laten...
A performance evaluation of local descriptors
In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector. Many different descriptor...
Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis
Previous work on action recognition has focused on adapting hand-designed local features, such as SIFT or HOG, from static images to the video domain. In this paper, we propose ...
Publication Info
- Year: 2007
- Type: article
- Pages: 357-360
- Citations: 1611
- Access: Closed
Identifiers
- DOI: 10.1145/1291233.1291311