Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/139871
Type: Journal article
Title: Generalized framework for image and video object segmentation using affinity learning and message passing GNNS
Author: Muthu, S.
Tennakoon, R.
Rathnayake, T.
Hoseinnezhad, R.
Suter, D.
Bab-Hadiashar, A.
Citation: Computer Vision and Image Understanding, 2023; 236:103812-1-103812-12
Publisher: Elsevier BV
Issue Date: 2023
ISSN: 1077-3142
1090-235X
Statement of Responsibility: Sundaram Muthu, Ruwan Tennakoon, Tharindu Rathnayake, Reza Hoseinnezhad, David Suter, Alireza Bab-Hadiashar
Abstract: Despite a significant amount of work reported in the computer vision literature, segmenting images or videos based on multiple cues, such as objectness, texture and motion, is still a challenge. This is particularly true when the number of objects to be segmented is not known or there are objects that are not classified in the training data (unknown objects). A possible remedy to this problem is to utilize graph-based clustering techniques such as Correlation Clustering. It is known that using long-range affinities (Lifted Multicut) makes correlation clustering more accurate than using only adjacent affinities (Multicut). However, the former is computationally expensive and hard to use. In this paper, we introduce a new framework to perform image/motion segmentation using an affinity learning module and a Message Passing Graph Neural Network (MPGNN). The affinity learning module uses a permutation-invariant affinity representation to overcome the multi-object problem. The paper shows, both theoretically and empirically, that the proposed MPGNN aggregates higher-order information and thereby converts the Lifted Multicut Problem (LMP) to a Multicut Problem (MP), which is easier and faster to solve. Importantly, the proposed method can be generalized to deal with different clustering problems with the same MPGNN architecture. For instance, our method produces competitive results for single image segmentation (on the BSDS dataset) as well as unsupervised video object segmentation (on the DAVIS17 dataset), by only changing the feature extraction part. In addition, using an ablation study on the proposed MPGNN architecture, we show that the way we update the parameterized affinities directly contributes to the accuracy of the results.
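Illustrative note: the distinction the abstract draws between adjacent affinities (multicut) and long-range affinities (lifted multicut) can be sketched with a toy correlation clustering cost. The sketch below is not the paper's formulation or code. The function name `multicut_cost`, the four-node chain, the edge weights, and the sign convention (a positive affinity favours keeping two nodes in the same segment, a negative one favours separating them) are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]


def multicut_cost(affinities: Dict[Edge, float], labels: Dict[int, int]) -> float:
    """Sum the affinities of every edge whose endpoints get different labels.

    Lower is better: cutting an attractive (positive) edge is penalised,
    cutting a repulsive (negative) edge is rewarded.
    """
    return sum(w for (u, v), w in affinities.items() if labels[u] != labels[v])


# Toy chain 0-1-2-3 with adjacent edges only (the multicut setting).
adjacent = {(0, 1): 2.0, (1, 2): 0.5, (2, 3): 2.0}

# Same graph plus one hypothetical long-range edge (the lifted setting).
lifted = {**adjacent, (0, 3): -3.0}

one_segment = {0: 0, 1: 0, 2: 0, 3: 0}
two_segments = {0: 0, 1: 0, 2: 1, 3: 1}

# With adjacent edges only, keeping everything together is cheaper.
print(multicut_cost(adjacent, one_segment), multicut_cost(adjacent, two_segments))

# With the long-range repulsive edge, splitting becomes the cheaper option.
print(multicut_cost(lifted, one_segment), multicut_cost(lifted, two_segments))
```

In this toy case the single long-range edge changes which partition has the lower cost, which is the kind of extra evidence lifted affinities are meant to supply. Actually solving either problem (searching over all partitions) is the expensive part and is not attempted here.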
Keywords: Unsupervised video object segmentation; Image segmentation; Lifted multi-cuts; Graph neural networks; Affinity learning
Rights: © 2023 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
DOI: 10.1016/j.cviu.2023.103812
Grant ID: http://purl.org/au-research/grants/arc/LP160100662
Published version: http://dx.doi.org/10.1016/j.cviu.2023.103812
Appears in Collections: Computer Science publications

Files in This Item:
File | Description | Size | Format
hdl_139871.pdf | Published version | 1.86 MB | Adobe PDF

