Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/132209
Full metadata record
DC Field | Value | Language
dc.contributor.author | Liao, Z. | -
dc.contributor.author | Wu, Q. | -
dc.contributor.author | Shen, C. | -
dc.contributor.author | Van Den Hengel, A. | -
dc.contributor.author | Verjans, J. | -
dc.contributor.editor | Cappellato, L. | -
dc.contributor.editor | Eickhoff, C. | -
dc.contributor.editor | Ferro, N. | -
dc.contributor.editor | Névéol, A. | -
dc.date.issued | 2020 | -
dc.identifier.citation | CEUR Workshop Proceedings, 2020 / Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (ed./s), vol.2696, pp.1-14 | -
dc.identifier.issn | 1613-0073 | -
dc.identifier.uri | https://hdl.handle.net/2440/132209 | -
dc.description | Session - ImageCLEF: Multimedia Retrieval in Medicine, Lifelogging, and Internet. | -
dc.description.abstract | In this paper, we describe our contribution to the 2020 ImageCLEF Medical Domain Visual Question Answering (VQA-Med) challenge. Our submissions placed first on the VQA challenge leaderboard and first on the associated Visual Question Generation (VQG) challenge leaderboard. Our VQA approach was developed using a knowledge inference methodology called Skeleton-based Sentence Mapping (SSM). Using all the questions and answers, we derived a set of classifiable tasks and inferred the corresponding labels. As a result, we were able to transform the VQA task into a multi-task image classification problem, which allowed us to focus on the image modelling aspect. We further propose a class-wise and task-wise normalization that facilitates optimization of multiple tasks in a single network. This enabled us to apply a multi-scale and multi-architecture ensemble strategy for robust prediction. Lastly, we positioned the VQG task as a transfer learning problem using the models trained on the VQA task. The VQG task was also solved using classification. | -
dc.description.statementofresponsibility | Zhibin Liao, Qi Wu, Chunhua Shen, Anton van den Hengel, and Johan Verjans | -
dc.language.iso | en | -
dc.publisher | CEUR-WS | -
dc.relation.ispartofseries | CEUR Workshop Proceedings; 2696 | -
dc.rights | Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). | -
dc.source.uri | http://ceur-ws.org/Vol-2696 | -
dc.subject | Visual Question Answering; Visual Question Generation; Knowledge Inference; Deep Neural Networks; Skeleton-based Sentence Mapping; Class-wise and Task-wise Normalization | -
dc.title | AIML at VQA-Med 2020: Knowledge inference via a skeleton-based sentence mapping approach for medical domain visual question answering | -
dc.type | Conference paper | -
dc.contributor.conference | International Conference of the CLEF Initiative (CLEF) (22 Sep 2020 - 25 Sep 2020 : virtual online) | -
dc.publisher.place | online | -
pubs.publication-status | Published | -
dc.identifier.orcid | Liao, Z. [0000-0001-9965-4511] | -
dc.identifier.orcid | Wu, Q. [0000-0003-3631-256X] | -
dc.identifier.orcid | Van Den Hengel, A. [0000-0003-3027-8364] | -
dc.identifier.orcid | Verjans, J. [0000-0002-8336-6774] | -
Appears in Collections: Australian Institute for Machine Learning publications

Files in This Item:
File | Description | Size | Format
hdl_132209.pdf | Published version | 501.4 kB | Adobe PDF

