Abstract
3D scene understanding is key to the success of applications such as autonomous driving and robot navigation. However, existing approaches either produce a mild level of understanding, e.g., segmentation, object detection, or are not accurate enough for these applications, e.g., 3D pop-ups. In this paper, we propose a principled generative model of 3D urban scenes that takes into account dependencies between static and dynamic features. We derive a reversible jump MCMC scheme that is able to infer the geometric (e.g., street orientation) and topological (e.g., number of intersecting streets) properties of the scene layout, as well as the semantic activities occurring in the scene, e.g., traffic situations at an intersection. Furthermore, we show that this global level of understanding provides the context necessary to disambiguate current state-of-the-art detectors. We demonstrate the effectiveness of our approach on a dataset composed of short stereo video sequences of 113 different scenes captured by a car driving around a mid-size city.
Publication Info
- Year: 2011
- Type: article
- Volume: 220
- Pages: 1945-1952
- Citations: 57
- Access: Closed
Identifiers
- DOI: 10.1109/cvpr.2011.5995641