Abstract

In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. We propose an approach where a single convolutional neural network plays a dual role: It is simultaneously a dense feature descriptor and a feature detector. By postponing the detection to a later stage, the obtained keypoints are more stable than their traditional counterparts based on early detection of low-level structures. We show that this model can be trained using pixel correspondences extracted from readily available large-scale SfM reconstructions, without any further annotations. The proposed method obtains state-of-the-art performance on both the difficult Aachen Day-Night localization dataset and the InLoc indoor localization benchmark, as well as competitive performance on other benchmarks for image matching and 3D reconstruction.
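
The core mechanism described above, reading keypoint detections off the same dense feature map that supplies the descriptors, can be illustrated in a few lines. The sketch below is a minimal PyTorch rendering of the soft detection score the paper describes: a per-channel local spatial softmax (alpha) combined with a channel-dominance ratio (beta), maximized over channels and normalized over the image. The function name, the 3x3 neighbourhood, and the random input in the usage lines are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def soft_detection_scores(feature_map):
    """D2-Net-style soft detection from a dense feature map.

    feature_map: (B, C, H, W) activations of a CNN layer, e.g. a late VGG conv layer.
    Returns a (B, H, W) score map; high values mark locations that are
    simultaneously spatial maxima and channel maxima ("late" detection).
    """
    d = F.relu(feature_map)

    # alpha: local spatial softmax per channel over a 3x3 neighbourhood.
    # Subtracting the per-channel max keeps exp() stable; it cancels in the ratio.
    exp_d = torch.exp(d - d.amax(dim=(2, 3), keepdim=True))
    neighbourhood_sum = 9 * F.avg_pool2d(
        F.pad(exp_d, (1, 1, 1, 1), mode='reflect'), kernel_size=3, stride=1)
    alpha = exp_d / neighbourhood_sum

    # beta: channel-dominance ratio, how close each channel's response is to
    # the strongest channel response at that pixel.
    beta = d / (d.amax(dim=1, keepdim=True) + 1e-8)

    # Keep the best channel per location, then normalise scores over the image.
    score = (alpha * beta).amax(dim=1)
    return score / score.sum(dim=(1, 2), keepdim=True)

# Hypothetical usage with random activations standing in for a real backbone:
feats = torch.randn(1, 512, 64, 64)
scores = soft_detection_scores(feats)  # shape (1, 64, 64), sums to 1 per image
```

In the paper, a soft score of this kind weights the training loss so that detection and description are optimized jointly, while at test time keypoints are taken as hard local maxima of the feature map.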

Keywords

Artificial intelligence, Computer science, Benchmark, Convolutional neural network, Pattern recognition, Feature, Matching, Pixel, Feature extraction, Detector, Image, Computer vision, Mathematics, Engineering

Publication Info

Year: 2019
Type: article
Pages: 8084-8093
Citations: 1068
Access: Closed

Citation Metrics

1068 citations (source: OpenAlex)

Cite This

Mihai Dusmanu, Ignacio Rocco, Tomáš Pajdla et al. (2019). D2-Net: A Trainable CNN for Joint Description and Detection of Local Features. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8084-8093. https://doi.org/10.1109/cvpr.2019.00828

Identifiers

DOI: 10.1109/cvpr.2019.00828