Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/124477
Type: Conference paper
Title: Learning deeply supervised good features to match for dense monocular reconstruction
Author: Weerasekera, C.S.
Garg, R.
Latif, Y.
Reid, I.
Citation: Lecture Notes in Artificial Intelligence, 2019 / Jawahar, C.V., Li, H., Mori, G., Schindler, K. (ed./s), vol.11365, pp.609-624
Publisher: Springer
Publisher Place: Switzerland
Issue Date: 2019
Series/Report no.: Lecture Notes in Computer Science ; 11365
ISBN: 9783030208721
ISSN: 0302-9743
1611-3349
Conference Name: Asian Conference on Computer Vision (ACCV) (2 Dec 2018 - 6 Dec 2018 : Perth, Australia)
Editor: Jawahar, C.V.
Li, H.
Mori, G.
Schindler, K.
Statement of Responsibility: Chamara Saroj Weerasekera, Ravi Garg, Yasir Latif, and Ian Reid
Abstract: Visual SLAM (Simultaneous Localization and Mapping) methods typically rely on handcrafted visual features or raw RGB values for establishing correspondences between images. These features, while suitable for sparse mapping, often lead to ambiguous matches in texture-less regions when performing dense reconstruction, due to the aperture problem. In this work, we explore the use of learned features for the matching task in dense monocular reconstruction. We propose a novel convolutional neural network (CNN) architecture, along with a deeply supervised feature learning scheme, for pixel-wise regression of visual descriptors from an image that are best suited for dense monocular SLAM. In particular, our learning scheme minimizes a multi-view matching cost-volume loss with respect to the regressed features at multiple stages within the network, explicitly learning contextual features suited to dense matching along epipolar lines between images captured by a moving monocular camera. We integrate the learned features from our model for depth estimation inside a real-time dense monocular SLAM framework, where the photometric error is replaced by our learned descriptor error. Our extensive evaluation on several challenging indoor datasets demonstrates greatly improved accuracy in the dense reconstructions of well-celebrated dense SLAM systems such as DTAM, without compromising their real-time performance.
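
Note on the matching idea described in the abstract: the method replaces DTAM-style photometric matching with a feature-metric, multi-view cost volume built from learned per-pixel descriptors. Below is a minimal, illustrative plane-sweep sketch of that idea in PyTorch. The function names, the fronto-parallel depth hypotheses, and the L1 feature-difference cost are assumptions made purely for illustration; this is not the authors' architecture or loss.

import torch
import torch.nn.functional as F

def warp_to_reference(feat_src, depth, K, K_inv, R, t):
    # Warp source-view features into the reference view for one depth hypothesis.
    # feat_src: (B, C, H, W) learned descriptors from the source image
    # depth:    scalar depth hypothesis (fronto-parallel plane in the reference frame)
    # K, K_inv: (3, 3) camera intrinsics and their inverse
    # R, t:     (3, 3), (3, 1) relative pose mapping reference-frame points to the source frame
    B, C, H, W = feat_src.shape
    device = feat_src.device

    # Pixel grid of the reference image in homogeneous coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)  # (3, H*W)

    # Back-project at the hypothesised depth, transform to the source frame, re-project.
    pts_ref = (K_inv @ pix) * depth          # (3, H*W)
    pts_src = R @ pts_ref + t                # (3, H*W)
    proj = K @ pts_src
    uv = proj[:2] / proj[2:].clamp(min=1e-6) # (2, H*W) source-image pixel coordinates

    # Normalise to [-1, 1] for grid_sample and bilinearly sample the source features.
    u = 2.0 * uv[0] / (W - 1) - 1.0
    v = 2.0 * uv[1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(1, H, W, 2).expand(B, H, W, 2)
    return F.grid_sample(feat_src, grid, align_corners=True)

def feature_cost_volume(feat_ref, feat_src, depths, K, K_inv, R, t):
    # Per-pixel matching cost (mean absolute descriptor difference) for each depth hypothesis.
    costs = []
    for d in depths:
        warped = warp_to_reference(feat_src, d, K, K_inv, R, t)
        costs.append((feat_ref - warped).abs().mean(dim=1))  # (B, H, W)
    return torch.stack(costs, dim=1)                         # (B, D, H, W)

In a scheme of this kind, such a cost volume can be evaluated on the descriptors regressed at several intermediate network stages (the deep supervision mentioned above), and a soft-argmin over the depth axis, e.g. (F.softmax(-beta * cost, dim=1) * depths.view(1, -1, 1, 1)).sum(dim=1), gives a differentiable per-pixel depth estimate.
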
Keywords: Mapping; Visual learning; 3D reconstruction; SLAM
Rights: © Springer Nature Switzerland AG 2019
DOI: 10.1007/978-3-030-20873-8_39
Grant ID: http://purl.org/au-research/grants/arc/FL130100102
http://purl.org/au-research/grants/arc/CE140100016
Published version: http://dx.doi.org/10.1007/978-3-030-20873-8_39
Appears in Collections: Aurora harvest 8
Computer Science publications

Files in This Item:
There are no files associated with this item.