Abstract

Semantic segmentation requires both rich spatial information and a sizeable receptive field. However, modern approaches usually compromise spatial resolution to achieve real-time inference speed, which leads to poor performance. In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). We first design a Spatial Path with a small stride to preserve spatial information and generate high-resolution features. Meanwhile, a Context Path with a fast downsampling strategy is employed to obtain a sufficient receptive field. On top of the two paths, we introduce a new Feature Fusion Module to combine features efficiently. The proposed architecture strikes the right balance between speed and segmentation performance on the Cityscapes, CamVid, and COCO-Stuff datasets. Specifically, for a \(2048 \times 1024\) input, we achieve 68.4% mean IoU on the Cityscapes test set at 105 FPS on one NVIDIA Titan XP card, which is significantly faster than existing methods with comparable performance.
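
To make the figures in the abstract concrete, the following is a minimal sketch of the arithmetic behind them: the average per-frame time implied by 105 FPS, and the feature-map sizes the two paths would produce for a 2048 x 1024 input. The abstract does not state the strides, so the values 8 (Spatial Path) and 32 (Context Path) are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def feature_map_size(width: int, height: int, stride: int) -> tuple:
    """Spatial size (width, height) of a feature map after an overall stride."""
    return width // stride, height // stride


# Figures quoted in the abstract.
WIDTH, HEIGHT, FPS = 2048, 1024, 105

# Assumed overall strides (NOT stated in the abstract): a small stride for the
# Spatial Path and a larger one for the fast-downsampling Context Path.
SPATIAL_STRIDE, CONTEXT_STRIDE = 8, 32

budget = frame_budget_ms(FPS)
spatial = feature_map_size(WIDTH, HEIGHT, SPATIAL_STRIDE)
context = feature_map_size(WIDTH, HEIGHT, CONTEXT_STRIDE)

print(f"per-frame budget: {budget:.2f} ms")
print(f"Spatial Path features: {spatial[0]} x {spatial[1]}")
print(f"Context Path features: {context[0]} x {context[1]}")
```

Under these assumed strides, the Spatial Path keeps 16 times as many spatial positions as the Context Path, which illustrates the trade-off the abstract describes: one branch retains resolution while the other trades it for receptive field.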

Keywords

Computer science, Segmentation, Upsampling, Artificial intelligence, Inference, Computer vision, Image segmentation, Image resolution, Context (archaeology), Pattern recognition (psychology), Image (mathematics)

Publication Info

Year
2018
Type
book-chapter
Pages
334-349
Citations
2572
Access
Closed

Citation Metrics

OpenAlex
2572
Influential
354

Cite This

Changqian Yu, Jingbo Wang, Chao Peng et al. (2018). BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation. Lecture Notes in Computer Science, 334-349. https://doi.org/10.1007/978-3-030-01261-8_20

Identifiers

DOI
10.1007/978-3-030-01261-8_20
PMID
41376882
PMCID
PMC12686288
arXiv
1808.00897
