Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/132889
Type: Thesis
Title: Object-centric Mapping
Author: Li, Kejie
Issue Date: 2021
School/Discipline: School of Computer Science
Abstract: This thesis focuses on building an object-centric 3D map from an RGB image sequence, in which the basic elements are object instances. This is fundamentally different from conventional visual Simultaneous Localisation and Mapping (SLAM), which describes the geometry of an environment using geometric entities such as 3D points, voxels, or surfels. Representing an environment at the level of objects captures both its semantic and geometric information; it is more natural, more compact, and closer to how human beings perceive their surroundings. Specifically, we investigate methods in which well-studied geometry and deep learning can be combined to achieve object-centric mapping in general scenes. We first build upon recent advances in deep learning for single-view object reconstruction, the task of recovering full 3D object shape from a single RGB image. An open question when using deep networks for this task is how to generate object shapes efficiently. To this end, we propose a novel multi-view representation that generates dense point clouds efficiently. Although this purely deep learning paradigm shows impressive results on synthetic data, the lack of large amounts of annotated real images leads to a domain gap when performing inference on real images. We therefore introduce a new single-view object reconstruction method that combines well-studied geometry with a learned shape prior. This approach optimises an object’s shape and pose at inference time using both 2D image cues, such as the object silhouette, and constraints from the learned shape prior. Although we only address single-view object reconstruction in this work, the online refinement makes it straightforward to incorporate more observations. Building on our work on single-view object reconstruction, we introduce our first object-centric mapping system, FroDO (From Detections to Objects). It takes an RGB image sequence as input and infers object location, pose, and shape in a coarse-to-fine manner: starting from a set of 2D object detections, each object is progressively refined through a 3D bounding box and a sparse point cloud to a dense mesh. Although FroDO shows promising results on general and cluttered indoor scenes, it is neither an online system nor capable of handling object motion. To address these limitations, we subsequently present MO-LTR (Multiple Object Localisation, Tracking, and Reconstruction). It combines a monocular object detector for object pose and scale prediction, a shape embedding network for shape modelling, and an Interacting Multiple Model (IMM) filter for tracking. Although each component is relatively incremental on its own, the combined system achieves dynamic object-centric mapping in both indoor and outdoor scenes.
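The inference-time optimisation described in the abstract couples a learned shape prior with 2D silhouette cues. The following PyTorch snippet is a minimal sketch of that idea rather than the thesis’s actual implementation: a toy point-cloud decoder stands in for the learned shape prior, the decoded points are projected through an assumed pinhole camera and soft-rasterised into a silhouette, and the latent shape code and object translation are optimised against a target mask. The decoder architecture, intrinsics, splatting scheme, and loss weights are all illustrative assumptions.

    # Sketch of inference-time shape/pose optimisation against a silhouette.
    # The decoder, intrinsics, and soft rasteriser are illustrative stand-ins,
    # not the thesis's actual networks.
    import torch

    H, W, N, DIM_Z = 64, 64, 256, 32
    fx = fy = 60.0                         # assumed pinhole intrinsics
    cx, cy = W / 2.0, H / 2.0

    # Toy shape prior: latent code -> N 3D points in the object frame.
    decoder = torch.nn.Sequential(
        torch.nn.Linear(DIM_Z, 128), torch.nn.ReLU(),
        torch.nn.Linear(128, N * 3),
    )

    def render_silhouette(points_cam, sigma=1.5):
        """Soft-rasterise camera-frame points into an HxW occupancy map by
        splatting an isotropic Gaussian at each projected pixel location."""
        z = points_cam[:, 2].clamp(min=0.1)
        u = fx * points_cam[:, 0] / z + cx          # (N,) pixel columns
        v = fy * points_cam[:, 1] / z + cy          # (N,) pixel rows
        ys = torch.arange(H, dtype=torch.float32).view(H, 1, 1)
        xs = torch.arange(W, dtype=torch.float32).view(1, W, 1)
        d2 = (ys - v.view(1, 1, N)) ** 2 + (xs - u.view(1, 1, N)) ** 2
        # Probability that a pixel is covered by at least one splat.
        return 1.0 - torch.prod(1.0 - torch.exp(-d2 / (2 * sigma ** 2)), dim=-1)

    # Observed mask (a synthetic blob here; in practice a detected silhouette).
    target = torch.zeros(H, W)
    target[20:44, 24:40] = 1.0

    # Optimise the latent code and object translation; rotation is omitted
    # for brevity but would be handled the same way.
    z = torch.zeros(DIM_Z, requires_grad=True)
    t = torch.tensor([0.0, 0.0, 5.0], requires_grad=True)
    opt = torch.optim.Adam([z, t], lr=0.05)

    for step in range(200):
        opt.zero_grad()
        pts = decoder(z).view(N, 3) + t             # object -> camera frame
        sil = render_silhouette(pts)
        loss = torch.nn.functional.binary_cross_entropy(
            sil.clamp(1e-6, 1 - 1e-6), target
        ) + 1e-3 * z.norm() ** 2                    # prior keeps z near the mean
        loss.backward()
        opt.step()

Because the silhouette loss and the prior term are both differentiable in the latent code and pose, further observations of the same object can be folded in simply by summing their losses, which is what makes the online refinement mentioned in the abstract straightforward.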
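MO-LTR’s tracking component is an Interacting Multiple Model (IMM) filter, which runs several Kalman filters under different motion hypotheses and mixes them by their measurement likelihoods, letting one tracker handle both stationary and moving objects. Below is a compact NumPy sketch of a single IMM cycle for a 3D object track with a stationary model and a constant-velocity model; the 6D state layout, noise levels, and two-model configuration are assumptions for illustration, not MO-LTR’s exact setup.

    # Compact IMM cycle: a stationary model and a constant-velocity model share
    # a 6D state [x, y, z, vx, vy, vz] and are mixed by measurement likelihood.
    # All noise levels here are illustrative.
    import numpy as np

    dt = 0.1
    F_cv = np.eye(6); F_cv[:3, 3:] = dt * np.eye(3)   # constant velocity
    F_st = np.eye(6); F_st[3:, 3:] = 0.0              # stationary: velocity -> 0
    F = [F_st, F_cv]
    H = np.hstack([np.eye(3), np.zeros((3, 3))])      # only position is measured
    R = 0.05 * np.eye(3)
    Q = [1e-6 * np.eye(6), 1e-2 * np.eye(6)]          # process noise per model
    PI = np.array([[0.95, 0.05],                      # model transition matrix
                   [0.05, 0.95]])

    def imm_step(xs, Ps, mu, z):
        """One IMM cycle. xs: per-model states, Ps: per-model covariances,
        mu: model probabilities, z: measured 3D position."""
        # 1. Mixing: re-initialise each filter from a weighted blend of models.
        c = PI.T @ mu                                 # predicted model probs
        x0, P0 = [], []
        for j in range(2):
            w = PI[:, j] * mu / c[j]
            xj = sum(w[i] * xs[i] for i in range(2))
            Pj = sum(w[i] * (Ps[i] + np.outer(xs[i] - xj, xs[i] - xj))
                     for i in range(2))
            x0.append(xj); P0.append(Pj)
        # 2. Per-model Kalman predict + update, recording each likelihood.
        like = np.zeros(2)
        for j in range(2):
            xp = F[j] @ x0[j]
            Pp = F[j] @ P0[j] @ F[j].T + Q[j]
            y = z - H @ xp                            # innovation
            S = H @ Pp @ H.T + R
            K = Pp @ H.T @ np.linalg.inv(S)
            xs[j] = xp + K @ y
            Ps[j] = (np.eye(6) - K @ H) @ Pp
            like[j] = (np.exp(-0.5 * y @ np.linalg.solve(S, y))
                       / np.sqrt(np.linalg.det(2 * np.pi * S)))
        # 3. Update model probabilities and fuse the per-model estimates.
        mu = c * like / (c @ like)
        x_fused = sum(mu[j] * xs[j] for j in range(2))
        return xs, Ps, mu, x_fused

    # Example: detections drifting along x should favour the moving model.
    xs = [np.zeros(6), np.zeros(6)]
    Ps = [np.eye(6), np.eye(6)]
    mu = np.array([0.5, 0.5])
    for k in range(20):
        z = np.array([0.2 * dt * k, 0.0, 0.0])
        xs, Ps, mu, x = imm_step(xs, Ps, mu, z)
    print("model probabilities (stationary, moving):", mu)

The appeal of the IMM design in this setting is that the model probabilities adapt per object and per frame, so a parked car and a driving car can be tracked by the same machinery without hand-tuned switching rules.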
Advisor: Reid, Ian
Chin, Tat-Jun
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2021
Keywords: Semantic SLAM
object localisation
object reconstruction
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
Appears in Collections: Research Theses

Files in This Item:
File: Li2021_PhD.pdf
Description: Thesis
Size: 20.43 MB
Format: Adobe PDF

