Abstract
Automatic recovery of 3D human pose from monocular image sequences is a challenging and important research topic with numerous applications. Although current methods are able to recover 3D pose for a single person in controlled environments, they are severely challenged by real-world scenarios, such as crowded street scenes. To address this problem, we propose a three-stage process building on a number of recent advances. The first stage obtains an initial estimate of the 2D articulation and viewpoint of the person from single frames. The second stage allows early data association across frames based on tracking-by-detection. These two stages successfully accumulate the available 2D image evidence into robust estimates of 2D limb positions over short image sequences (= tracklets). The third and final stage uses those tracklet-based estimates as robust image observations to reliably recover 3D pose. We demonstrate state-of-the-art performance on the HumanEva II benchmark, and also show the applicability of our approach to articulated 3D tracking in realistic street conditions.
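The abstract describes a three-stage pipeline: per-frame 2D pose and viewpoint estimation, tracking-by-detection association into short tracklets, and 3D pose recovery from the tracklet-based 2D observations. The sketch below is a minimal, hypothetical illustration of that control flow only; every function name, data structure, and placeholder return value is an assumption for illustration and is not the authors' implementation.

```python
# Hypothetical sketch of the three-stage process described in the abstract.
# Names, types, and the toy return values are illustrative assumptions,
# not the authors' actual code.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pose2D:
    limbs: List[Tuple[float, float]]  # 2D image positions of body parts
    viewpoint: float                  # estimated body orientation (radians)


@dataclass
class Tracklet:
    frame_ids: List[int]
    poses: List[Pose2D]  # 2D limb estimates accumulated over a short window


def estimate_2d_pose(frame) -> List[Pose2D]:
    """Stage 1: per-frame estimate of 2D articulation and viewpoint."""
    return [Pose2D(limbs=[(0.0, 0.0)], viewpoint=0.0)]  # placeholder detection


def build_tracklets(per_frame: List[List[Pose2D]]) -> List[Tracklet]:
    """Stage 2: tracking-by-detection associates detections across frames."""
    # Placeholder: link the first detection of every frame into one tracklet.
    return [Tracklet(frame_ids=list(range(len(per_frame))),
                     poses=[p[0] for p in per_frame if p])]


def recover_3d_pose(tracklet: Tracklet) -> List[List[float]]:
    """Stage 3: use tracklet-based 2D observations to recover 3D pose."""
    # Placeholder: one dummy 3D joint per accumulated 2D pose.
    return [[0.0, 0.0, 0.0] for _ in tracklet.poses]


def run_pipeline(frames) -> List[List[List[float]]]:
    per_frame = [estimate_2d_pose(f) for f in frames]  # stage 1
    tracklets = build_tracklets(per_frame)             # stage 2
    return [recover_3d_pose(t) for t in tracklets]     # stage 3


if __name__ == "__main__":
    print(run_pipeline(frames=[None] * 5))  # toy input: 5 empty frames
```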
Related Publications
Monocular Visual Scene Understanding: Understanding Multi-Object Traffic Scenes
Following recent advances in detection, context modeling, and tracking, scene understanding has been the focus of renewed interest in computer vision research. This paper presen...
End-to-End Recovery of Human Shape and Pose
We describe Human Mesh Recovery (HMR), an end-to-end framework for reconstructing a full 3D mesh of a human body from a single RGB image. In contrast to most current methods tha...
Publication Info
- Year: 2010
- Type: article
- Citations: 518
- Access: Closed
Identifiers
- DOI: 10.1109/cvpr.2010.5540156