Abstract

In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling. We use a multiscale convolutional network that is able to adapt easily to each task using only small modifications, regressing from the input image to the output map directly. Our method progressively refines predictions using a sequence of scales, and captures many image details without any superpixels or low-level segmentation. We achieve state-of-the-art performance on benchmarks for all three tasks.
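
To make the coarse-to-fine idea from the abstract concrete, below is a minimal, hypothetical PyTorch sketch of a two-scale network in that spirit: a coarse branch predicts a low-resolution map with global context, and a finer branch refines it using the input image together with the upsampled coarse prediction. This is not the paper's exact three-scale architecture; all layer sizes, channel counts, and names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoarseToFineNet(nn.Module):
    """Illustrative two-scale coarse-to-fine regressor (not the paper's exact model)."""

    def __init__(self, out_channels=1):
        super().__init__()
        # Scale 1: downsamples aggressively to capture global scene context.
        self.coarse = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=4, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_channels, 3, padding=1),
        )
        # Scale 2: works near input resolution and sees image + coarse prediction.
        self.fine = nn.Sequential(
            nn.Conv2d(3 + out_channels, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_channels, 3, padding=1),
        )

    def forward(self, x):
        coarse = self.coarse(x)                                # low-res global prediction
        coarse_up = F.interpolate(coarse, size=x.shape[-2:],
                                  mode="bilinear", align_corners=False)
        refined = self.fine(torch.cat([x, coarse_up], dim=1))  # detail-level refinement
        return coarse_up + refined


if __name__ == "__main__":
    net = CoarseToFineNet(out_channels=1)      # e.g. a single channel for depth
    pred = net(torch.randn(2, 3, 240, 320))    # batch of RGB images
    print(pred.shape)                          # torch.Size([2, 1, 240, 320])
```

Swapping the output head (number of channels and loss) is what adapts the same backbone to depth, normals, or per-pixel class scores.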

Keywords

Computer science, Artificial intelligence, Segmentation, Pattern recognition (psychology), Task (project management), Image (mathematics), Scale (ratio), Convolutional neural network, Sequence (biology), Architecture, Image segmentation, Semantics (computer science), Computer vision

Publication Info

Year: 2015
Type: Article
Citations: 2834
Access: Closed

Citation Metrics

Citations: 2834 (OpenAlex)

Cite This

David Eigen, Rob Fergus (2015). Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture. 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2015.304

Identifiers

DOI
10.1109/iccv.2015.304