Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/129989
Full metadata record
dc.contributor.author: Duan, X.
dc.contributor.author: Wu, Q.
dc.contributor.author: Gan, C.
dc.contributor.author: Zhang, Y.
dc.contributor.author: Huang, W.
dc.contributor.author: Van Den Hengel, A.
dc.contributor.author: Zhu, W.
dc.date.issued: 2019
dc.identifier.citation: Proceedings of the 27th ACM International Conference on Multimedia (ACM Multimedia 2019), MM '19, 2019, pp. 1543-1551
dc.identifier.isbn: 1450368891
dc.identifier.isbn: 9781450368896
dc.identifier.uri: http://hdl.handle.net/2440/129989
dc.description.abstract: Humans have a surprising capacity to induce general rules that describe the specific actions portrayed in a video sequence. The rules learned through this kind of process allow us to achieve similar goals to those shown in the video, but in more general circumstances. Enabling an agent to achieve the same capacity represents a significant challenge. In this paper, we propose a Watch-Reason-Code (WRC) model to synthesise programs that describe the process carried out in a set of video sequences. The ‘watch’ stage is simply a video encoder that encodes videos into multiple feature vectors. The ‘reason’ stage takes as input the features from multiple diverse videos and generates a compact feature representation via a novel deviation-pooling method. The ‘code’ stage is a multi-round decoder whose first round generates a draft program layout with potentially useful statements and perceptions; further rounds then take these outputs and generate a fully structured, compilable and executable program. We evaluate the effectiveness of our model in two video-to-program synthesis environments, Karel and ViZDoom, showing that we achieve state-of-the-art results under a variety of settings.
dc.description.statementofresponsibility: Xuguang Duan, Qi Wu, Chuang Gan, Yiwei Zhang, Wenbing Huang, Anton Van Den Hengel, Wenwu Zhu
dc.language.iso: en
dc.publisher: Association for Computing Machinery
dc.rights: © 2019 Association for Computing Machinery.
dc.source.uri: https://dl.acm.org/doi/proceedings/10.1145/3343031
dc.subject: video understanding; video embedding; video to program translation
dc.title: Watch, reason and code: Learning to represent videos using program
dc.type: Conference paper
dc.contributor.conference: 27th ACM International Conference on Multimedia (ACM Multimedia) (21 Oct 2019 - 25 Oct 2019 : Nice, France)
dc.identifier.doi: 10.1145/3343031.3351094
dc.publisher.place: online
dc.relation.grant: http://purl.org/au-research/grants/arc/DE190100539
pubs.publication-status: Published
dc.identifier.orcid: Wu, Q. [0000-0003-3631-256X]
dc.identifier.orcid: Van Den Hengel, A. [0000-0003-3027-8364]
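
The abstract above outlines a three-stage architecture but, as a metadata record, gives no implementation detail. Below is a minimal, illustrative Python/PyTorch sketch of how such a watch-reason-code pipeline could be wired together. Every class name, dimension, and the simplified deviation-pooling and refinement formulations here are assumptions made for illustration only; the authors' actual model is described in the paper at the DOI above.

# Hypothetical sketch of a Watch-Reason-Code (WRC) style pipeline.
# Module names, dimensions, and the pooling/refinement details are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class Watch(nn.Module):
    """Encode each video (a sequence of frame features) into one vector."""
    def __init__(self, frame_dim=512, hidden_dim=256):
        super().__init__()
        self.rnn = nn.GRU(frame_dim, hidden_dim, batch_first=True)

    def forward(self, videos):                  # (num_videos, T, frame_dim)
        feats, _ = self.rnn(videos)             # (num_videos, T, hidden_dim)
        return feats[:, -1, :]                  # last state per video

class Reason(nn.Module):
    """Pool features from multiple diverse videos into one compact
    representation. Deviation pooling is approximated here by the mean
    plus the average absolute deviation across videos."""
    def forward(self, feats):                   # (num_videos, hidden_dim)
        mean = feats.mean(dim=0)
        dev = (feats - mean).abs().mean(dim=0)
        return torch.cat([mean, dev], dim=-1)   # (2 * hidden_dim,)

class Code(nn.Module):
    """Multi-round decoder: round 1 drafts a token layout, later rounds
    refine it into a full program token sequence."""
    def __init__(self, ctx_dim=512, vocab_size=50, max_len=20, rounds=2):
        super().__init__()
        self.rounds, self.max_len, self.vocab_size = rounds, max_len, vocab_size
        self.draft = nn.Linear(ctx_dim, vocab_size * max_len)
        self.refine = nn.Linear(ctx_dim + vocab_size * max_len,
                                vocab_size * max_len)

    def forward(self, ctx):                     # (ctx_dim,)
        logits = self.draft(ctx)                # round 1: draft layout
        for _ in range(self.rounds - 1):        # further rounds: refine
            logits = self.refine(torch.cat([ctx, logits], dim=-1))
        return logits.view(self.max_len, self.vocab_size).argmax(-1)

# Usage: synthesise program tokens from 5 demonstration videos, 8 frames each.
videos = torch.randn(5, 8, 512)
ctx = Reason()(Watch()(videos))
program_tokens = Code()(ctx)
print(program_tokens.shape)                     # torch.Size([20])

Note that the paper's decoder conditions later rounds on the draft layout's statements and perceptions before emitting a structured, executable program; the single linear refinement above merely stands in for that multi-round structure.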
Appears in Collections: Aurora harvest 4
Computer Science publications

Files in This Item:
hdl_129989.pdf (Accepted version, 1.71 MB, Adobe PDF)

