On the general value of evidence, and bilingual scene-text visual question answering

Wang, X.; Liu, Y.; Shen, C.; Ng, C.C.; Luo, C.; Jin, L.; Chan, C.S.; Van Den Hengel, A.; Wang, L.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/129208

Full metadata record

DC Field	Value	Language
dc.contributor.author	Wang, X.	-
dc.contributor.author	Liu, Y.	-
dc.contributor.author	Shen, C.	-
dc.contributor.author	Ng, C.C.	-
dc.contributor.author	Luo, C.	-
dc.contributor.author	Jin, L.	-
dc.contributor.author	Chan, C.S.	-
dc.contributor.author	Van Den Hengel, A.	-
dc.contributor.author	Wang, L.	-
dc.date.issued	2020	-
dc.identifier.citation	Proceedings / CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020, pp. 10123-10132	-
dc.identifier.isbn	9781728171692	-
dc.identifier.issn	1063-6919	-
dc.identifier.issn	2575-7075	-
dc.identifier.uri	http://hdl.handle.net/2440/129208	-
dc.description.abstract	Visual Question Answering (VQA) methods have made incredible progress, but suffer from a failure to generalize. This is visible in the fact that they are vulnerable to learning coincidental correlations in the data rather than deeper relations between image content and ideas expressed in language. We present a dataset that takes a step towards addressing this problem in that it contains questions expressed in two languages, and an evaluation process that co-opts a well understood image-based metric to reflect the method’s ability to reason. Measuring reasoning directly encourages generalization by penalizing answers that are coincidentally correct. The dataset reflects the scene-text version of the VQA problem, and the reasoning evaluation can be seen as a text-based version of a referring expression challenge. Experiments and analyses are provided that show the value of the dataset. The dataset is available at www.est-vqa.org	-
dc.description.statementofresponsibility	Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo, Lianwen Jin, Chee Seng Chan, Anton van den Hengel, Liangwei Wang	-
dc.language.iso	en	-
dc.publisher	IEEE	-
dc.relation.ispartofseries	IEEE Conference on Computer Vision and Pattern Recognition	-
dc.rights	©2020 IEEE	-
dc.source.uri	https://ieeexplore.ieee.org/xpl/conhome/9142308/proceeding	-
dc.title	On the general value of evidence, and bilingual scene-text visual question answering	-
dc.type	Conference paper	-
dc.contributor.conference	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (14 Jun 2020 - 19 Jun 2020 : Virtual online)	-
dc.identifier.doi	10.1109/CVPR42600.2020.01014	-
dc.publisher.place	online	-
pubs.publication-status	Published	-
dc.identifier.orcid	Wang, X. [0000-0001-9082-094X]	-
dc.identifier.orcid	Van Den Hengel, A. [0000-0003-3027-8364]	-
Appears in Collections:	Aurora harvest 4; Australian Institute for Machine Learning publications

Files in This Item:

There are no files associated with this item.

Adelaide Research & Scholarship