Visual question answering as reading comprehension

Li, H.; Wang, P.; Shen, C.; Van Den Hengel, A.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/127237

Type:	Conference paper
Title:	Visual question answering as reading comprehension
Author:	Li, H.; Wang, P.; Shen, C.; Van Den Hengel, A.
Citation:	Proceedings / CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019, vol.2019-June, pp.6312-6321
Publisher:	IEEE
Publisher Place:	online
Issue Date:	2019
Series/Report no.:	IEEE Conference on Computer Vision and Pattern Recognition
ISBN:	9781728132938
ISSN:	1063-6919
Conference Name:	IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (15 Jun 2019 - 20 Jun 2019 : Long Beach, USA)
Statement of Responsibility:	Hui Li, Peng Wang, Chunhua Shen, Anton van den Hengel
Abstract:	Visual question answering (VQA) demands simultaneous comprehension of both the visual content of an image and a natural language question. In some cases, the reasoning requires common sense or general knowledge, which usually appears in the form of text. Current methods jointly embed the visual information and the textual features into the same space. However, modeling the complex interactions between the two modalities is not easy. Rather than struggling with multimodal feature fusion, in this paper we propose to unify all the input information in natural language, so as to convert VQA into a machine reading comprehension problem. With this transformation, our method can not only tackle VQA datasets that focus on observation-based questions, but can also be naturally extended to handle knowledge-based VQA, which requires exploring a large-scale external knowledge base. It is a step towards being able to exploit large volumes of text and natural language processing techniques to address the VQA problem. Two types of models are proposed, to deal with open-ended VQA and multiple-choice VQA respectively. We evaluate our models on three VQA benchmarks. Performance comparable with the state of the art demonstrates the effectiveness of the proposed method.
Rights:	©2019 IEEE
DOI:	10.1109/CVPR.2019.00648
Published version:	http://dx.doi.org/10.1109/cvpr.2019.00648
Appears in Collections:	Aurora harvest 3; Australian Institute for Machine Learning publications

Files in This Item:

File	Description	Size	Format
hdl_127237.pdf	Submitted version	5.37 MB	Adobe PDF

Adelaide Research & Scholarship