Fully Convolutional Instance-level Visual Recognition

Tian, Zhi

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/132533

Type:	Thesis
Title:	Fully Convolutional Instance-level Visual Recognition
Author:	Tian, Zhi
Issue Date:	2021
School/Discipline:	School of Computer Science
Abstract:	Instance-level recognition tasks such as object detection and instance segmentation are fundamental problems in computer vision that underpin many downstream applications. In this thesis, we propose a series of new methods that solve these problems with simple and effective fully convolutional networks. First, we propose a fully convolutional one-stage object detector (FCOS) that solves object detection in a per-pixel prediction fashion. Unlike previous detectors, FCOS completely avoids the complicated computation related to anchor boxes, such as calculating intersection over union (IoU) scores during training. More importantly, it also avoids all hyper-parameters related to anchor boxes. We demonstrate a much simpler and more flexible detection framework that achieves improved detection accuracy. Second, we propose the first direct end-to-end multi-person pose estimation framework, termed DirectPose. The proposed framework directly predicts instance-aware keypoints for all instances from a raw input image, eliminating the heuristic grouping of bottom-up methods and the box detection and RoI operations of top-down ones. Third, we propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation). Top-performing instance segmentation methods such as Mask R-CNN rely on RoI operations (typically RoIPool or RoIAlign) to obtain the final instance masks. In contrast, we propose to solve instance segmentation with dynamic instance-aware networks, conditioned on instances. For the first time, we demonstrate a simpler instance segmentation method that achieves improved performance in both accuracy and inference speed. Finally, we present a high-performance method that achieves mask-level instance segmentation with only box annotations for training. Our core idea is to redesign the mask-learning loss in CondInst, with no modification to the network itself. The new loss functions can supervise mask training without relying on mask annotations. Our experimental results on COCO and Pascal VOC indicate that our method dramatically narrows the performance gap between weakly and fully supervised instance segmentation. Code is publicly available at https://github.com/aim-uofa/AdelaiDet.
Advisor:	Shen, Chunhua; Wu, Qi
Dissertation Note:	Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2021
Keywords:	Convolutional neural networks; fully convolutional networks; object detection; instance segmentation; human pose estimation; weakly supervised instance segmentation
Provenance:	This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
Appears in Collections:	Research Theses

Files in This Item:

File	Description	Size	Format
Tian2021_PhD.pdf		39.84 MB	Adobe PDF

Adelaide Research & Scholarship