Abstract

Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes comprises a large, diverse set of stereo video sequences recorded in the streets of 50 different cities. 5 000 of these images have high-quality pixel-level annotations; 20 000 additional images have coarse annotations to enable methods that leverage large volumes of weakly labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.
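
The abstract's split between 5 000 finely annotated and 20 000 coarsely annotated images maps directly onto how the data is typically consumed in practice. As a minimal illustrative sketch (not part of the paper), the snippet below loads both subsets via torchvision's Cityscapes loader; it assumes the dataset archives have already been downloaded from cityscapes-dataset.com into a local ./cityscapes directory.

```python
# Minimal sketch, assuming the Cityscapes archives are unpacked under
# ./cityscapes (download requires registration at cityscapes-dataset.com).
from torchvision.datasets import Cityscapes

# The 5 000 images with high-quality pixel-level annotations ("fine" mode).
fine_train = Cityscapes(
    root="./cityscapes",
    split="train",
    mode="fine",
    target_type="semantic",  # per-pixel class-ID label mask
)

# The 20 000 additional coarsely annotated images ("coarse" mode),
# intended for methods that leverage weakly labeled data.
coarse_train = Cityscapes(
    root="./cityscapes",
    split="train_extra",
    mode="coarse",
    target_type="semantic",
)

image, mask = fine_train[0]  # PIL images: RGB frame and its label mask
```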

Keywords

Computer science, Leverage (statistics), Artificial intelligence, Suite, Benchmark (surveying), Context (archaeology), Scale (ratio), Set (abstract data type), Pixel, Semantics (computer science), Machine learning, Computer vision, Cartography, Geography

Publication Info

Year: 2016
Type: article
Pages: 3213–3223
Citations: 11,212 (OpenAlex)
Access: Closed

Cite This

Marius Cordts, Mohamed Omran, Sebastian Ramos et al. (2016). The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3213–3223. https://doi.org/10.1109/cvpr.2016.350

Identifiers

DOI: 10.1109/cvpr.2016.350