Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/127247
Type: Conference paper
Title: Variational Bayesian dropout with a hierarchical prior
Author: Liu, Y.
Dong, W.
Zhang, L.
Gong, D.
Shi, Q.
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2019, vol.2019-June, pp.7117-7126
Publisher: IEEE
Publisher Place: online
Issue Date: 2019
Series/Report no.: IEEE Conference on Computer Vision and Pattern Recognition
ISBN: 9781728132938
ISSN: 1063-6919
Conference Name: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (15 Jun 2019 - 20 Jun 2019 : Long Beach, USA)
Statement of Responsibility: Yuhang Liu, Wenyong Dong, Lei Zhang, Dong Gong, and Qinfeng Shi
Abstract: Variational dropout (VD) is a generalization of Gaussian dropout that infers the posterior over network weights under a log-uniform prior, learning the weights and the dropout rates simultaneously. The log-uniform prior not only explains the regularization capacity of Gaussian dropout in network training, but also underpins the inference of this posterior. However, the log-uniform prior is improper (its integral is infinite), which makes the posterior inference ill-posed and thus limits the regularization performance of VD. To address this problem, we present a new generalization of Gaussian dropout, termed variational Bayesian dropout (VBD), which instead exploits a hierarchical prior on the network weights and infers a new joint posterior. Specifically, we implement the hierarchical prior as a zero-mean Gaussian distribution whose variance is sampled from a uniform hyper-prior. We then incorporate this prior into inferring the joint posterior over the network weights and the variance of the hierarchical prior, so that both network training and dropout-rate estimation can be cast as a joint optimization problem. More importantly, the hierarchical prior is a proper prior, which makes the posterior inference well-posed. In addition, we show that the proposed VBD can be seamlessly applied to network compression. Experiments on classification and network compression demonstrate the superior performance of the proposed VBD in regularizing network training.
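The following is a minimal, illustrative sketch of the idea the abstract describes: a linear layer whose weights carry a Gaussian posterior and a hierarchical Gaussian prior, trained with a closed-form KL penalty. It is not the authors' released implementation; the layer name VBDLinear is hypothetical, and the paper's joint posterior over the weights and the prior variance is only approximated here by a learnable point estimate of that variance.

```python
# Illustrative sketch only -- NOT the authors' implementation of VBD.
# Posterior: w ~ N(mu, sigma^2) per weight; hierarchical prior: w ~ N(0, s2),
# with s2 itself learned (a point-estimate stand-in for its posterior).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VBDLinear(nn.Module):  # hypothetical name
    def __init__(self, in_features, out_features):
        super().__init__()
        self.mu = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))
        self.log_s2 = nn.Parameter(torch.zeros(1))  # prior variance s2 (learned)

    def forward(self, x):
        # Local reparameterization: sample pre-activations rather than weights,
        # which gives lower-variance gradients than sampling each weight.
        act_mu = F.linear(x, self.mu)
        act_var = F.linear(x.pow(2), self.log_sigma2.exp())
        return act_mu + act_var.clamp(min=1e-12).sqrt() * torch.randn_like(act_mu)

    def kl(self):
        # Closed-form KL( N(mu, sigma^2) || N(0, s2) ). A proper Gaussian prior
        # keeps this KL finite and the inference well-posed, unlike the improper
        # log-uniform prior of classic variational dropout.
        s2, sigma2 = self.log_s2.exp(), self.log_sigma2.exp()
        return 0.5 * ((sigma2 + self.mu.pow(2)) / s2
                      - 1.0 + self.log_s2 - self.log_sigma2).sum()
```

In training, one would minimize the task loss plus the summed kl() terms of all such layers. After training, weights whose effective dropout rate alpha = sigma^2 / mu^2 is large contribute little signal and can be pruned, which is how variational-dropout-style methods connect to the network compression experiments mentioned in the abstract.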
Rights: ©2019 IEEE
DOI: 10.1109/CVPR.2019.00729
Grant ID: http://purl.org/au-research/grants/arc/DP140102270
http://purl.org/au-research/grants/arc/DP160100703
Published version: http://dx.doi.org/10.1109/cvpr.2019.00729
Appears in Collections: Aurora harvest 4
Computer Science publications

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.