[1] Salakhutdinov Ruslan, Hinton Geoffrey. Replicated softmax: an undirected topic model[J]. Advances in Neural Information Processing Systems, 2009, 22: 1607-1614.
[2] Bengio Yoshua, Ducharme Réjean, Vincent Pascal, et al. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[3] Collobert Ronan, Weston Jason, Bottou Léon, et al. Natural language processing (almost) from scratch[J]. Journal of Machine Learning Research, 2011, 12: 2493-2537.
[4] Mnih Andriy, Hinton Geoffrey. Three new graphical models for statistical language modelling[C]∥International Conference on Machine Learning, Corvallis, Oregon, USA, 2007: 641-648.
[5] Mnih Andriy, Hinton Geoffrey. A scalable hierarchical distributed language model[C]∥Conference on Neural Information Processing Systems, Vancouver, Canada, 2008: 1081-1088.
[6] Srivastava N, Salakhutdinov R R, Hinton G E. Modeling documents with a deep Boltzmann machine[C]∥The Conference on Uncertainty in Artificial Intelligence, Bellevue, Washington, USA, 2013: 1309-1318.
[7] Mikolov T, Karafiát M, Burget L, et al. Recurrent neural network based language model[C]∥Interspeech, Makuhari, Japan, 2010: 1045-1048.
[8] Hinton G. A practical guide to training restricted Boltzmann machines[J]. Momentum, 2010, 9(1): 926.
[9] Bengio Y. Learning deep architectures for AI[J]. Foundations and Trends in Machine Learning, 2009, 2(1): 1-127.
[10] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507.
[11] Blei David M, Ng Andrew, Jordan Michael. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[12] Blei David M, Griffiths Thomas L, Jordan Michael I. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies[J]. Journal of the ACM, 2010, 57(2): 7-38.
[13] Mimno D, McCallum A. Topic models conditioned on arbitrary features with Dirichlet-multinomial regression[C]∥Conference on Uncertainty in Artificial Intelligence, 2008: 411-418.
[14] Huang Eric H, Socher Richard, Manning Christopher D, et al. Improving word representations via global context and multiple word prototypes[C]∥Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Stroudsburg, PA, USA, 2012: 873-882.
[15] Xue N. Chinese word segmentation as character tagging[J]. Computational Linguistics and Chinese Language Processing, 2003, 8(1): 29-48.
[16] Tang B, Wang X, Wang X. Chinese word segmentation based on large margin methods[J]. International Journal of Asian Language Processing, 2009, 19(2): 55-68.
[17] Zhao H, Kit C. Integrating unsupervised and supervised word segmentation: the role of goodness measures[J]. Information Sciences, 2011, 181(1): 163-183.
[18] Bengio Y, Lamblin P, Popovici D, et al. Greedy layer-wise training of deep networks[J]. Advances in Neural Information Processing Systems, 2007, 19: 153-160.
[19] Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks[C]∥IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013: 6645-6649.
[20] Tieleman T. Training restricted Boltzmann machines using approximations to the likelihood gradient[C]∥Proceedings of the 25th International Conference on Machine Learning, ACM, 2008: 1064-1071.
[21] Turian J, Ratinov L, Bengio Y. Word representations: a simple and general method for semi-supervised learning[C]∥Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Stroudsburg, PA, USA, 2010: 384-394.