Please use this identifier to cite or link to this item:
https://hdl.handle.net/2440/131666
Type: | Conference paper |
Title: | Modular graph attention network for complex visual relational reasoning |
Author: | Zheng, Y.; Wen, Z.; Tan, M.; Zeng, R.; Chen, Q.; Wang, Y.; Wu, Q. |
Citation: | Lecture Notes in Computer Science, 2021, vol.12627, pp.137-153 |
Publisher: | Springer |
Publisher Place: | Cham, Switzerland |
Issue Date: | 2021 |
Series/Report no.: | Lecture Notes in Computer Science; 12627 |
ISBN: | 9783030695439 |
ISSN: | 0302-9743; 1611-3349 |
Conference Name: | Asian Conference on Computer Vision (ACCV) (30 Nov 2020 - 4 Dec 2020 : virtual online) |
Statement of Responsibility: | Yihan Zheng, Zhiquan Wen, Mingkui Tan, Runhao Zeng, Qi Chen, Yaowei Wang, Qi Wu |
Abstract: | Visual relational reasoning is crucial for many vision-and-language tasks, such as Visual Question Answering and Vision-and-Language Navigation. In this paper, we consider reasoning on the complex referring expression comprehension (c-REF) task, which seeks to localise the target objects in an image guided by complex queries. Such queries often contain complex logic and thus pose two key challenges for reasoning: (i) the query can be very difficult to comprehend, since it often refers to multiple objects and describes complex relationships among them; (ii) it is non-trivial to reason among multiple objects under the guidance of the query and localise the target correctly. To address these challenges, we propose a novel Modular Graph Attention Network (MGA-Net). Specifically, to comprehend long queries, we devise a language attention network that decomposes them into four types (basic attributes, absolute location, visual relationship and relative location), which mimics the human language understanding mechanism. Moreover, to capture the complex logic in a query, we construct a relational graph to represent the visual objects and their relationships, and propose a multi-step reasoning method to progressively understand that logic. Extensive experiments on the CLEVR-Ref+, GQA and CLEVR-CoGenT datasets demonstrate the superior reasoning performance of MGA-Net. |
Rights: | © Springer Nature Switzerland AG 2021 |
DOI: | 10.1007/978-3-030-69544-6_9 |
Published version: | https://link.springer.com/book/10.1007/978-3-030-69544-6 |
Appears in Collections: | Aurora harvest 8 Computer Science publications |
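The abstract names two mechanisms: a language attention network that splits a query into four module types, and multi-step reasoning over a relational graph of objects. As a rough illustration only, the toy Python sketch below shows one common way such components can be built; it is not the MGA-Net implementation. Everything in it is an assumption: the function names (`decompose_query`, `edge_feature`, `reason`), the per-module softmax over query words, the hand-set module keys and query vectors (which a trained network would learn), the two-dimensional left/right edge feature and the sigmoid gate that moves attention along edges.

```python
import math

# Hypothetical module names mirroring the four query types in the abstract.
MODULES = ("attribute", "absolute_location", "relationship", "relative_location")


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decompose_query(word_vecs, module_keys):
    """Toy language attention: for each module, a softmax over the query's
    words, each word scored against that module's (hypothetical) key vector."""
    return {m: softmax([dot(module_keys[m], w) for w in word_vecs]) for m in MODULES}


def edge_feature(x_i, x_j):
    """Hypothetical 2-d relation feature of object i w.r.t. object j:
    [i is left of j, i is right of j], from horizontal positions."""
    return [1.0, 0.0] if x_i < x_j else [0.0, 1.0]


def reason(node_feats, positions, attr_query, rel_queries):
    """Toy multi-step reasoning over a fully connected relational graph.

    Step 0 scores each object against attr_query. Each relation query then
    runs one step: object j passes its score to object i, gated by how well
    the relation of i w.r.t. j matches the query; scores are renormalised.
    """
    n = len(node_feats)
    scores = softmax([dot(attr_query, v) for v in node_feats])
    for rel_query in rel_queries:
        new = []
        for i in range(n):
            total = 0.0
            for j in range(n):
                if i != j:
                    gate = sigmoid(dot(rel_query, edge_feature(positions[i], positions[j])))
                    total += scores[j] * gate
            new.append(total)
        z = sum(new)
        scores = [s / z for s in new]
    return scores


# Query "left of red cube" with hypothetical word vectors [colour, shape, spatial].
words = ["left", "of", "red", "cube"]
word_vecs = [[0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 1, 0]]
keys = {"attribute": [4, 4, 0], "absolute_location": [0, 0, 1],
        "relationship": [0, 0, 2], "relative_location": [0, 0, 4]}
attn = decompose_query(word_vecs, keys)
print(words[max(range(4), key=attn["relative_location"].__getitem__)])  # left

# Objects: 0 = red cube (x=1), 1 = blue sphere (x=0), 2 = green cylinder (x=2),
# with hypothetical one-hot colour features [red, blue, green].
nodes = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
xs = [1.0, 0.0, 2.0]
scores = reason(nodes, xs, attr_query=[5, 0, 0], rel_queries=[[5, -5]])
print(max(range(3), key=scores.__getitem__))  # 1, the blue sphere left of the red cube
```

Here a single reasoning step moves attention from the red cube to the object on its left. Passing several relation queries in `rel_queries` chains several hops, one per relation; in a learned model these vectors would come from the decomposed query phrases rather than being set by hand.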
Files in This Item:
There are no files associated with this item.