A generative model for 3D urban scene understanding from movable platforms

Abstract

3D scene understanding is key for the success of applications such as autonomous driving and robot navigation. However, existing approaches either produce a mild level of understanding, e.g., segmentation, object detection, or are not accurate enough for these applications, e.g., 3D pop-ups. In this paper we propose a principled generative model of 3D urban scenes that takes into account dependencies between static and dynamic features. We derive a reversible jump MCMC scheme that is able to infer the geometric (e.g., street orientation) and topological (e.g., number of intersecting streets) properties of the scene layout, as well as the semantic activities occurring in the scene, e.g., traffic situations at an intersection. Furthermore, we show that this global level of understanding provides the context necessary to disambiguate current state-of-the-art detectors. We demonstrate the effectiveness of our approach on a dataset composed of short stereo video sequences of 113 different scenes captured by a car driving around a mid-size city.
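To make the idea of a reversible jump MCMC scheme concrete, the following is a minimal toy sketch, not the authors' model. It assumes a hypothetical setting in which observed vehicle headings are drawn from an equal-weight von Mises mixture with one component per street arm, and it jointly samples the number of arms (a topological property) and their orientations (a geometric property). The names `simulate_headings`, `log_lik`, `rjmcmc`, and the constants `KAPPA` and `K_MAX` are all illustrative assumptions.

```python
import math
import random

KAPPA = 20.0  # assumed concentration of headings around each street arm
K_MAX = 6     # assumed upper bound on the number of arms


def simulate_headings(arms, n_per_arm, seed=0):
    """Draw toy heading observations around each arm orientation (radians)."""
    random.seed(seed)
    return [random.vonmisesvariate(a, KAPPA) for a in arms for _ in range(n_per_arm)]


def log_lik(thetas, data):
    """Equal-weight von Mises mixture log-likelihood.

    Constants that depend neither on the orientations nor on their number
    are dropped, since only likelihood ratios are needed.
    """
    k = len(thetas)
    total = 0.0
    for y in data:
        s = sum(math.exp(KAPPA * (math.cos(y - t) - 1.0)) for t in thetas)
        total += math.log(s) - math.log(k)
    return total


def rjmcmc(data, n_iter=3000, seed=0):
    """Toy reversible jump sampler over (number of arms, arm orientations).

    Priors (assumed): k uniform on 1..K_MAX, orientations uniform on [0, 2*pi).
    Birth moves draw the new orientation from its prior, so the prior and
    proposal terms cancel and the acceptance ratio is the likelihood ratio.
    """
    rng = random.Random(seed)
    thetas = [rng.uniform(0.0, 2.0 * math.pi)]
    ll = log_lik(thetas, data)
    ks = []
    for _ in range(n_iter):
        u = rng.random()
        prop = None
        if u < 1.0 / 3.0:
            # Birth: insert a new arm at a random position (dimension + 1).
            if len(thetas) < K_MAX:
                prop = list(thetas)
                pos = rng.randrange(len(thetas) + 1)
                prop.insert(pos, rng.uniform(0.0, 2.0 * math.pi))
        elif u < 2.0 / 3.0:
            # Death: delete a randomly chosen arm (dimension - 1).
            if len(thetas) > 1:
                prop = list(thetas)
                del prop[rng.randrange(len(prop))]
        else:
            # Within-model move: symmetric random walk on one orientation.
            prop = list(thetas)
            i = rng.randrange(len(prop))
            prop[i] = (prop[i] + rng.gauss(0.0, 0.1)) % (2.0 * math.pi)
        if prop is not None:
            ll_prop = log_lik(prop, data)
            if rng.random() < math.exp(min(0.0, ll_prop - ll)):
                thetas, ll = prop, ll_prop
        ks.append(len(thetas))
    return ks, thetas
```

Calling `rjmcmc(simulate_headings([0.5, 2.6, 4.7], 60))` returns the trace of the sampled number of arms and the final orientations. Invalid birth or death proposals at the bounds are simply rejected, which keeps the move probabilities symmetric. The model in the paper is considerably richer than this sketch.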

Keywords

Computer science, Artificial intelligence, Computer vision, Intersection (aeronautics), Context (archaeology), Segmentation, Generative model, Drone, Key (lock), Object (grammar), Robot, Pedestrian, Generative grammar, Object detection, Geography, Cartography

Publication Info

Year: 2011
Type: article
Volume: 220
Pages: 1945-1952
Citations: 57
Access: Closed

Cite This

APA Style

Andreas Geiger, Martin Lauer, Raquel Urtasun (2011). A generative model for 3D urban scene understanding from movable platforms. Vol. 220, pp. 1945-1952. https://doi.org/10.1109/cvpr.2011.5995641

Identifiers

DOI: 10.1109/cvpr.2011.5995641