Abstract

Over the last few years deep learning methods have emerged as one of the most prominent approaches for video analysis. However, so far their most successful applications have been in the area of video classification and detection, i.e., problems involving the prediction of a single class label or a handful of output variables per video. Furthermore, while deep networks are commonly recognized as the best models to use in these domains, there is a widespread perception that in order to yield successful results they often require time-consuming architecture search, manual tweaking of parameters and computationally intensive preprocessing or post-processing methods. In this paper we challenge these views by presenting a deep 3D convolutional architecture trained end to end to perform voxel-level prediction, i.e., to output a variable at every voxel of the video. Most importantly, we show that the same exact architecture can be used to achieve competitive results on three widely different voxel-prediction tasks: video semantic segmentation, optical flow estimation, and video coloring. The three networks learned on these problems are trained from raw video without any form of preprocessing and their outputs do not require post-processing to achieve outstanding performance. Thus, they offer an efficient alternative to traditional and much more computationally expensive methods in these video domains.
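The core idea described in the abstract — a fully convolutional 3D network that maps a raw video clip to a dense output with one prediction per voxel — can be sketched in a few lines. The sketch below is illustrative only and is written in PyTorch as an assumption; the paper's V2V model is built on a C3D-style backbone with its own layer configuration, and all channel widths, layer counts, and names here (e.g. `Voxel2Voxel`) are hypothetical.

```python
# Minimal sketch of a voxel-to-voxel 3D fully convolutional network.
# Illustrative only: layer counts and channel widths are hypothetical,
# not the exact C3D-based V2V configuration from the paper.
import torch
import torch.nn as nn

class Voxel2Voxel(nn.Module):
    def __init__(self, in_channels=3, out_channels=21):
        super().__init__()
        # 3D convolutional encoder: downsamples space-time by 2x, twice.
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # 3D deconvolutional decoder: upsamples back to the input
        # resolution so one value per output channel is predicted
        # at every voxel of the clip.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, clip):
        # clip: (batch, channels, frames, height, width), raw video
        return self.decoder(self.encoder(clip))

if __name__ == "__main__":
    model = Voxel2Voxel(in_channels=3, out_channels=21)  # e.g. 21 class scores
    clip = torch.randn(1, 3, 16, 112, 112)               # 16-frame RGB clip
    out = model(clip)
    print(out.shape)  # torch.Size([1, 21, 16, 112, 112]): one prediction per voxel
```

Under this framing, retargeting the same architecture to the three tasks in the paper amounts to changing only the output channels and the per-voxel loss: class scores with a voxel-wise cross-entropy for semantic segmentation, two regression channels for optical flow, and color channels for video coloring.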

Keywords

Computer science, Artificial intelligence, Preprocessor, Voxel, Optical flow, Deep learning, Segmentation, Tweaking, Convolutional neural network, Machine learning, Pattern recognition (psychology), Computer vision, Image (mathematics)

Publication Info

Year: 2016
Type: article
Citations: 106 (OpenAlex)
Access: Closed

Cite This

Du Tran, Lubomir Bourdev, Rob Fergus et al. (2016). Deep End2End Voxel2Voxel Prediction. In 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). https://doi.org/10.1109/cvprw.2016.57

Identifiers

DOI
10.1109/cvprw.2016.57