Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/128999
Type: Conference paper
Title: V-PROM: A benchmark for visual reasoning using visual progressive matrices
Author: Teney, D.
Wang, P.
Cao, J.
Liu, L.
Shen, C.
Van Den Hengel, A.
Citation: Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence, 2020, vol.34, iss.2, pp.12071-12078
Publisher: Association for the Advancement of Artificial Intelligence
Publisher Place: Palo Alto, CA
Issue Date: 2020
Series/Report no.: AAAI Conference on Artificial Intelligence
ISBN: 9781577358237
ISSN: 2159-5399
2374-3468
Conference Name: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI) (7 Feb 2020 - 12 Feb 2020 : New York, USA)
Statement of Responsibility: Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel
Abstract: Advances in machine learning have generated increasing enthusiasm for tasks that require high-level reasoning on top of perceptual capabilities, particularly over visual data. Such tasks include, for example, image captioning, visual question answering, and visual navigation. Their evaluation is, however, hindered by task-specific confounding factors and dataset biases. In parallel, the existing benchmarks for abstract reasoning are limited to synthetic stimuli (e.g. images of simple shapes) and do not capture the challenges of real-world data. We propose a new large-scale benchmark to evaluate abstract reasoning over real visual data. The test involves visual questions that require operations fundamental to many high-level vision tasks, such as comparisons of counts and logical operations on complex visual properties. The benchmark measures a method’s ability to infer high-level relationships and to generalise them over image-based concepts. We provide multiple training/test splits that require controlled levels of generalization. We evaluate a range of deep learning architectures, and find that existing models, including those popular for vision-and-language tasks, are unable to solve seemingly simple instances. Models using relational networks fare better but leave substantial room for improvement.
Description: AAAI-20 Technical Tracks 7 / AAAI Technical Track: Vision
Rights: Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
DOI: 10.1609/aaai.v34i07.6885
Published version: https://www.aaai.org/Library/AAAI/aaai20contents.php
Appears in Collections: Aurora harvest 8
Australian Institute for Machine Learning publications
