Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/129333
Type: Thesis
Title: Semantic Image Segmentation and Other Dense Per-Pixel Tasks: Practical Approaches
Author: Nekrasov, Vladimir
Issue Date: 2020
School/Discipline: School of Computer Science
Abstract: Computer vision-based and deep learning-driven applications and devices are now a part of our everyday life: from modern smartphones with an ever-increasing number of cameras and other sensors to autonomous vehicles such as driverless cars and self-piloting drones. Even though a large portion of the algorithms behind those systems has been known for decades, the computational power and the abundance of labelled data were lacking until recently. Now, following the principle of Occam's razor, we should start re-thinking those algorithms and strive towards their further simplification, both to improve our own understanding and to expand the realm of their practical applications. With those goals in mind, in this work we will concentrate on a particular type of computer vision task that predicts a certain quantity of interest for each pixel in the input image – these are so-called dense per-pixel tasks. This choice is not accidental: while a large body of work has concentrated on per-image tasks such as image classification, with levels of performance reaching nearly 100%, dense per-pixel tasks bring a different set of challenges that traditionally require more computational resources and more complicated approaches. Throughout this thesis, our focus will be on reducing these computational requirements and instead presenting simple approaches to building practical vision systems that can be used in a variety of settings – e.g. indoors or outdoors, on low-resolution or high-resolution images, solving a single task or multiple tasks at once, running on modern GPU cards or on embedded devices such as the Jetson TX. In the first part of the manuscript we will adapt an existing powerful but slow semantic segmentation network into a faster yet competitive one through a manual re-design and analysis of its building blocks. With this approach, we will achieve a nearly 3× decrease in both the number of parameters and the runtime of the network while maintaining equally high accuracy. In the second part we will then alter this compact network to solve multiple dense per-pixel tasks at once, still in real time. We will also demonstrate the value of predicting multiple quantities simultaneously, as an example creating a 3D semantic reconstruction of the scene. In the third part, we will move away from manual design and instead rely on reinforcement learning to automatically traverse the search space of compact semantic segmentation architectures. While the majority of architecture search methods are computationally extremely expensive even for image classification, we will present a solution that requires only 2 generic GPU cards. Finally, in the last part we will extend our automatic architecture search solution to discover tiny but still competitive networks with fewer than 300K parameters, taking only 1.5MB of disk space.
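The relation between the parameter count and the on-disk footprint quoted at the end of the abstract can be sanity-checked with simple arithmetic. The sketch below is not from the thesis; it assumes uncompressed float32 storage (4 bytes per parameter) and uses a hypothetical helper name, model_size_mb, purely for illustration.

    # Back-of-the-envelope check (not from the thesis): how much disk space do
    # N parameters take at a given numeric precision? Assumes a plain,
    # uncompressed weight dump and ignores checkpoint metadata.

    def model_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
        """Approximate on-disk size in megabytes (1 MB = 10**6 bytes)."""
        return num_params * bytes_per_param / 1e6

    if __name__ == "__main__":
        # ~300K float32 parameters correspond to roughly 1.2 MB of raw weights,
        # so the 1.5 MB figure in the abstract leaves room for extras such as
        # batch-norm statistics and serialization overhead (our assumption).
        print(f"{model_size_mb(300_000):.2f} MB")     # -> 1.20 MB
        # Storing the same weights at float16 would roughly halve the footprint.
        print(f"{model_size_mb(300_000, 2):.2f} MB")  # -> 0.60 MB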
Advisor: Reid, Ian
Shen, Chunhua
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2020
Keywords: Semantic segmentation
deep learning
real-time inference
neural architecture search
multi-task learning
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
Appears in Collections: Research Theses

Files in This Item:
File                  Description  Size      Format
Nekrasov2020_PhD.pdf  Thesis       26.49 MB  Adobe PDF

