%0 Journal Article %T Parallel Dither and Dropout for Regularising Deep Neural Networks %A Andrew J. R. Simpson %J Computer Science %D 2015 %I arXiv %X Effective regularisation during training can mean the difference between success and failure for deep neural networks. Recently, dither has been suggested as an alternative to dropout for regularisation during batch-averaged stochastic gradient descent (SGD). In this article, we show that these methods fail without batch averaging, and we introduce a new, parallel regularisation method that may be used without batch averaging. Our results for parallel-regularised non-batch SGD are substantially better than what is possible with batch SGD. Furthermore, our results demonstrate that dither and dropout are complementary. %U http://arxiv.org/abs/1508.07130v1