Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/131666
Type: Conference paper
Title: Modular graph attention network for complex visual relational reasoning
Author: Zheng, Y.
Wen, Z.
Tan, M.
Zeng, R.
Chen, Q.
Wang, Y.
Wu, Q.
Citation: Lecture Notes in Artificial Intelligence, 2021, vol.12627, pp.137-153
Publisher: Springer
Publisher Place: Cham, Switzerland
Issue Date: 2021
Series/Report no.: Lecture Notes in Computer Science; 12627
ISBN: 9783030695439
ISSN: 0302-9743
1611-3349
Conference Name: Asian Conference on Computer Vision (ACCV) (30 Nov 2020 - 4 Dec 2020 : virtual online)
Statement of Responsibility: Yihan Zheng, Zhiquan Wen, Mingkui Tan, Runhao Zeng, Qi Chen, Yaowei Wang, Qi Wu
Abstract: Visual Relational Reasoning is crucial for many vision-and-language based tasks, such as Visual Question Answering and Vision Language Navigation. In this paper, we consider reasoning on the complex referring expression comprehension (c-REF) task, which seeks to localise the target objects in an image guided by complex queries. Such queries often contain complex logic and thus impose two key challenges for reasoning: (i) It can be very difficult to comprehend the query since it often refers to multiple objects and describes complex relationships among them. (ii) It is non-trivial to reason among multiple objects guided by the query and localise the target correctly. To address these challenges, we propose a novel Modular Graph Attention Network (MGA-Net). Specifically, to comprehend the long queries, we devise a language attention network to decompose them into four types: basic attributes, absolute location, visual relationship and relative locations, which mimics the human language understanding mechanism. Moreover, to capture the complex logic in a query, we construct a relational graph to represent the visual objects and their relationships, and propose a multi-step reasoning method to progressively understand the complex logic. Extensive experiments on CLEVR-Ref+, GQA and CLEVR-CoGenT datasets demonstrate the superior reasoning performance of our MGA-Net.
Rights: © Springer Nature Switzerland AG 2021
DOI: 10.1007/978-3-030-69544-6_9
Published version: https://link.springer.com/book/10.1007/978-3-030-69544-6
Appears in Collections: Aurora harvest 8
Computer Science publications