Journal of Image and Graphics, 2014

Deep learning and its new progress in object and behavior recognition

DOI: 10.11834/jig.20140202

Zheng Yin, Chen Quanqi, Zhang Yujin

Keywords: deep learning | object recognition | behavior recognition | computer vision


Abstract:

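Objective: Deep learning is a new research area within machine learning. Building deep networks with deep learning methods to extract features is a research direction currently attracting attention in object and behavior recognition. To encourage more computer vision researchers to explore and discuss deep learning, and to advance research on object and behavior recognition, this paper surveys deep learning and its recent progress in object and behavior recognition. Methods: The paper first introduces the basic state of research in deep learning, together with its main concepts and principles; it then introduces some recent advances in applying deep learning to object and behavior recognition. Results: The paper describes the relationship between deep learning and neural networks, the advantages and disadvantages of deep learning, and the main problems that deep learning theory currently needs to solve. Conclusion: The paper should be helpful to researchers who intend to apply deep learning to object and behavior recognition.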
