Abstract

Activity recognition in video is dominated by low- and mid-level features; while demonstrably capable, these features by nature carry little semantic meaning. Inspired by the recent object bank approach to image representation, we present Action Bank, a new high-level representation of video. Action bank comprises many individual action detectors sampled broadly in semantic space as well as viewpoint space. Our representation is constructed to be semantically rich and, even when paired with simple linear SVM classifiers, is capable of highly discriminative performance. We have tested action bank on four major activity recognition benchmarks. In all cases, our performance is better than the state of the art: 98.2% on KTH (better by 3.3%), 95.0% on UCF Sports (better by 3.7%), 57.9% on UCF50 (baseline is 47.9%), and 26.9% on HMDB51 (baseline is 23.2%). Furthermore, when we analyze the classifiers, we find strong transfer of semantics from the constituent action detectors to the bank classifier.
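
As a rough illustration of the pipeline the abstract describes (responses from a bank of detectors concatenated into one feature vector, then a linear SVM), here is a minimal toy sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the 1-D integer "videos", the template-matching detectors built by the hypothetical `make_detector`, the max-pooling of each detector's responses, and the simple hinge-loss trainer standing in for an SVM solver.

```python
import random

def make_detector(template):
    """Hypothetical detector: sliding dot product of a template over a 1-D 'video'."""
    k = len(template)
    def detect(video):
        return [sum(t * v for t, v in zip(template, video[i:i + k]))
                for i in range(len(video) - k + 1)]
    return detect

def bank_features(video, detectors):
    """One feature per detector: its strongest response (max-pooling is an assumption)."""
    return [max(det(video)) for det in detectors]

def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01, seed=0):
    """Stochastic sub-gradient descent on the L2-regularised hinge loss; labels in {-1, +1}."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - lr * lam) for wj in w]      # regularisation shrinkage
            if margin < 1:                             # sample violates the margin
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    """Sign of the linear decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy bank with two detectors: one fires on a rise, the other on a fall.
detectors = [make_detector([-1, 1]), make_detector([1, -1])]
videos = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1],   # "rising" class (+1)
          [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]   # "falling" class (-1)
labels = [1, 1, 1, -1, -1, -1]

X = [bank_features(v, detectors) for v in videos]
w, b = train_linear_svm(X, labels)
print(predict(w, b, bank_features([0, 0, 0, 0, 1], detectors)))
```

In this toy setting each learned weight lines up with one detector, which gives a small-scale picture of how the semantics of individual detectors can carry over to the classifier built on top of the bank.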

Keywords

Discriminative model, Computer science, Artificial intelligence, Classifier (UML), Representation (politics), Semantics (computer science), Action recognition, Support vector machine, Pattern recognition (psychology), Object (grammar), Action (physics), Natural language processing

Publication Info

Year: 2012
Type: article
Citations: 711
Access: Closed

Citation Metrics

711 (OpenAlex)

Cite This

Siddharth Sadanand, Jason J. Corso (2012). Action bank: A high-level representation of activity in video. https://doi.org/10.1109/cvpr.2012.6247806

Identifiers

DOI: 10.1109/cvpr.2012.6247806