Abstract
We propose GeoNet, a jointly unsupervised learning framework for monocular depth, optical flow and egomotion estimation from videos. The three components are coupled by the nature of 3D scene geometry, jointly learned by our framework in an end-to-end manner. Specifically, geometric relationships are extracted over the predictions of individual modules and then combined as an image reconstruction loss, reasoning about static and dynamic scene parts separately. Furthermore, we propose an adaptive geometric consistency loss to increase robustness towards outliers and non-Lambertian regions, which resolves occlusions and texture ambiguities effectively. Experimentation on the KITTI driving dataset reveals that our scheme achieves state-of-the-art results in all of the three tasks, performing better than previously unsupervised methods and comparably with supervised ones.
Keywords
Affiliated Institutions
Related Publications
Deep convolutional neural fields for depth estimation from a single image
We consider the problem of depth estimation from a sin- gle monocular image in this work. It is a challenging task as no reliable depth cues are available, e.g., stereo corre- s...
End-to-End Instance Segmentation with Recurrent Attention
While convolutional neural networks have gained impressive success recently in solving structured prediction problems such as semantic segmentation, it remains a challenge to di...
DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation
This paper considers the task of articulated human pose estimation of multiple people in real world images. We propose an approach that jointly solves the tasks of detection and...
Deep Ordinal Regression Network for Monocular Depth Estimation
Monocular depth estimation, which plays a crucial role in understanding 3D scene geometry, is an ill-posed problem. Recent methods have gained significant improvement by explori...
Monocular Visual Scene Understanding: Understanding Multi-Object Traffic Scenes
Following recent advances in detection, context modeling, and tracking, scene understanding has been the focus of renewed interest in computer vision research. This paper presen...
Publication Info
- Year
- 2018
- Type
- article
- Citations
- 1234
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1109/cvpr.2018.00212