Context Learning and Weakly Supervised Learning for Semantic Segmentation

Shen, Tong

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/120354

Type:	Thesis
Title:	Context Learning and Weakly Supervised Learning for Semantic Segmentation
Author:	Shen, Tong
Issue Date:	2018
School/Discipline:	School of Computer Science
Abstract:	This thesis focuses on semantic segmentation, one of the fundamental problems in computer vision, whose task is to predict a semantic label for each pixel of an image. Although semantic segmentation models have improved greatly thanks to the representational power of deep learning techniques, open questions remain. In this thesis, we discuss two problems in semantic segmentation: scene consistency and weakly supervised segmentation. In the first part of the thesis, we discuss scene consistency. The issue arises because trained models sometimes produce noisy and implausible predictions that are not semantically consistent with the scene or context. By explicitly considering scene consistency both locally and globally, we can narrow down the possible categories for each pixel and generate the desired prediction more easily. We address this issue by introducing a dense multi-label module. In general, multi-label classification refers to the task of assigning multiple labels to a given image. We extend the idea to different levels of the image and assign multiple labels to different regions of the image. Dense multi-label acts as a constraint that encourages scene consistency locally and globally. For dense prediction problems such as semantic segmentation, training a model requires densely annotated data as ground truth, which demands a great deal of time-consuming human annotation effort. It is therefore worth investigating semi- or weakly supervised methods that require much less supervision. In particular, weakly supervised segmentation refers to training the model using only image-level labels, while semi-supervised segmentation refers to training with partially annotated data or a small portion of fully annotated data. In the thesis, two weakly supervised methods are proposed that require only image-level labels.
The two methods share similar motivations. First, since pixel-level masks are missing in this setting, both methods are designed to estimate the missing ground truth and then use the estimates as pseudo ground truth for training. Second, both use data retrieved from the internet as auxiliary data, because web data are cheap to obtain and abundant. Despite these similarities, the two methods are designed from different perspectives. The motivation for the first method is that, given a group of images crawled from the internet that belong to the same semantic category, co-segmentation is a good choice for extracting their masks, which gives us almost free pixel-wise training samples. Those internet images, along with the extracted masks, are used to train a mask generator that helps us estimate the pseudo ground truth for the training images. The second method is designed as a bi-directional framework between the target domain and the web domain. The term “bi-directional” means that knowledge learnt from the target domain can be transferred to the web domain, and knowledge encoded in the web domain can be transferred back to the target domain. This interaction between the two domains is the key to boosting the performance of webly supervised segmentation.
Advisor:	Shen, Chunhua
Dissertation Note:	Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2018
Keywords:	weakly supervised learning; semantic segmentation
Provenance:	This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
Appears in Collections:	Research Theses

Files in This Item:

File	Description	Size	Format
Shen2018_PhD.pdf		32.3 MB	Adobe PDF


Adelaide Research & Scholarship