In this paper, we present a quantitative, highly structured cortex-simulated model, which can be described simply as a feedforward, hierarchical simulation of the ventral stream of the visual cortex using a biologically plausible, computationally convenient spiking neural network system. The motivation comes directly from recent pioneering work on detailed functional decomposition of the feedforward pathway of the ventral stream of the visual cortex and from developments in artificial spiking neural networks (SNNs). By combining the logical structure of the cortical hierarchy with the computing power of the spiking neuron model, we obtain a practical framework. As a proof of principle, we demonstrate our system on several facial expression recognition tasks. The proposed cortex-like feedforward hierarchy is capable of dealing with complicated pattern recognition problems. This suggests that, by combining cognitive models with modern neurocomputational approaches, the neurosystematic approach to the study of cortex-like mechanisms has the potential to extend our knowledge of the brain mechanisms underlying cognitive analysis and to advance theoretical models of how we recognize faces or, more specifically, perceive other people’s facial expressions in a rich, dynamic, and complex environment, providing a new starting point for improved models of visual cortex-like mechanisms.
1. Introduction
Understanding how rapid exposure to visual stimuli (faces, objects) affects categorical decisions by cortical neuron networks is essential for understanding the relationship between implicit neural information encoding and explicit behavior analysis.
Quantitative psychophysical and physiological evidence supports the theory that visual information processing in the cortex can be modeled as a hierarchy of increasingly sophisticated, sparsely coded representations along the visual pathway [1], and that encoding with pulses, as a basic means of information transfer, is optimal in terms of information transmission. Such a spiking hierarchy should be able to decorrelate the incoming visual signals and remove redundant information while preserving invariance, in an effort to maximize the information gain [2]. Therefore, characterizing and modeling the functions along the hierarchy, from early or intermediate stages such as the lateral geniculate nucleus (LGN) or primary visual cortex (V1), is a necessary step toward the systematic study of higher-level, more comprehensive tasks such as object recognition. However, the
References
[1]
S. B. Laughlin and T. J. Sejnowski, “Communication in neuronal networks,” Science, vol. 301, no. 5641, pp. 1870–1874, 2003.
[2]
H. Barlow, “Possible principles underlying the transformation of sensory messages,” Sensory Communication, pp. 217–234, 1961.
[3]
K. Fukushima and S. Miyake, Neocognitron, a Self-Organizing Neural Network Model for a Mechanism of Visual Pattern Recognition, Lecture Notes in Biomathematics, Springer, 1982.
[4]
Y. LeCun and Y. Bengio, Convolutional Networks for Images, Speech, and Time-Series. The Handbook of Brain Theory and Neural Networks, MIT Press, 1995.
[5]
Y. LeCun and Y. Bengio, Pattern Recognition and Neural Networks. The Handbook of Brain Theory and Neural Networks, MIT Press, 1995.
[6]
S. Ullman and S. Soloviev, “Computation of pattern invariance in brain-like structures,” Neural Networks, vol. 12, no. 7-8, pp. 1021–1036, 1999.
[7]
S. Ullman, M. Vidal-Naquet, and E. Sali, “Visual features of intermediate complexity and their use in classification,” Nature Neuroscience, vol. 5, no. 7, pp. 682–687, 2002.
[8]
H. Wersing and E. Körner, “Learning optimized features for hierarchical models of invariant object recognition,” Neural Computation, vol. 15, no. 7, pp. 1559–1588, 2003.
[9]
M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,” Nature Neuroscience, vol. 2, no. 11, pp. 1019–1025, 1999.
[10]
T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, “Robust object recognition with cortex-like mechanisms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 3, pp. 411–426, 2007.
[11]
T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman, and T. Poggio, Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex, AI Memo 2005-036/CBCL Memo 259, MIT Press, Cambridge, Mass, USA.
[12]
A. L. Hodgkin and A. F. Huxley, “A quantitative description of membrane current and its application to conduction and excitation in nerve,” The Journal of Physiology, vol. 117, no. 4, pp. 500–544, 1952.
[13]
W. Gerstner and W. M. Kistler, Spiking Neuron Models, Cambridge University Press, 2002.
[14]
E. M. Izhikevich, “Simple model of spiking neurons,” IEEE Transactions on Neural Networks, vol. 14, no. 6, pp. 1569–1572, 2003.
[15]
S. Thorpe, D. Fize, and C. Marlot, “Speed of processing in the human visual system,” Nature, vol. 381, no. 6582, pp. 520–522, 1996.
[16]
A. Delorme, J. Gautrais, R. Van Rullen, and S. Thorpe, “SpikeNET: a simulator for modeling large networks of integrate and fire neurons,” Neurocomputing, vol. 26-27, pp. 989–996, 1999.
[17]
S. G. Wysoski, L. Benuskova, and N. Kasabov, “Fast and adaptive network of spiking neurons for multi-view visual pattern recognition,” Neurocomputing, vol. 71, no. 13-15, pp. 2563–2575, 2008.
[18]
S. G. Wysoski, L. Benuskova, and N. Kasabov, “Evolving spiking neural networks for audiovisual information processing,” Neural Networks, vol. 23, no. 7, pp. 819–835, 2010.
[19]
R. J. Dolan, “Neuroscience and psychology: emotion, cognition, and behavior,” Science, vol. 298, no. 5596, pp. 1191–1194, 2002.
[20]
M. N. Dailey, G. W. Cottrell, C. Padgett, and R. Adolphs, “Empath: a neural network that categorizes facial expressions,” Journal of Cognitive Neuroscience, vol. 14, no. 8, pp. 1158–1173, 2002.
[21]
M. N. Dailey, C. Joyce, M. J. Lyons et al., “Evidence and a computational explanation of cultural differences in facial expression recognition,” Emotion, vol. 10, no. 6, pp. 874–893, 2010.
[22]
S.-Y. Fu, G.-S. Yang, and Z.-G. Hou, “Spiking neural networks based cortex like mechanism: a case study for facial expression recognition,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '11), pp. 1637–1642, 2011.
[23]
L. Zhaoping, “Theoretical understanding of the early visual processes by data compression and data selection,” Network: Computation in Neural Systems, vol. 17, no. 4, pp. 301–334, 2006.
[24]
T. Serre, Learning a dictionary of shape-components in visual cortex: comparison with neurons, humans and machines [Ph.D. thesis], MIT Press, 2006.
[25]
A. Hyvärinen, P. O. Hoyer, and M. Inki, “Topographic independent component analysis,” Neural Computation, vol. 13, no. 7, pp. 1527–1558, 2001.
[26]
B. A. Olshausen and D. J. Field, “Sparse coding with an overcomplete basis set: a strategy employed by V1?” Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.
[27]
W. E. Vinje and J. L. Gallant, “Sparse coding and decorrelation in primary visual cortex during natural vision,” Science, vol. 287, no. 5456, pp. 1273–1276, 2000.
[28]
T. Poggio and T. Serre, Models of Visual Cortex, Scholarpedia, 2011.
[29]
R. VanRullen and S. J. Thorpe, “Surfing a spike wave down the ventral stream,” Vision Research, vol. 42, no. 23, pp. 2593–2615, 2002.
[30]
C. G. Gross, Brain Vision and Memory: Tales in the History of Neuroscience, MIT Press, 1998.
[31]
W. Zheng, X. Zhou, C. Zou, and L. Zhao, “Facial expression recognition using kernel canonical correlation analysis (KCCA),” IEEE Transactions on Neural Networks, vol. 17, no. 1, pp. 233–238, 2006.
[32]
A. J. Bell and T. J. Sejnowski, “The 'independent components' of natural scenes are edge filters,” Vision Research, vol. 37, no. 23, pp. 3327–3338, 1997.
[33]
JAFFE dataset, http://www.kasrl.org/jaffe.html.
[34]
F. Y. Shih, C. F. Chuang, and P. S. P. Wang, “Performance comparisons of facial expression recognition in JAFFE database,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 22, no. 3, pp. 445–459, 2008.
[35]
H. B. Deng, L. W. Jin, L. X. Zhen, and J. C. Huang, “A new facial expression recognition method based on local Gabor filter bank and PCA plus LDA,” International Journal of Information Technology, vol. 11, no. 11, pp. 86–96, 2005.
[36]
S. Y. Fu, G. S. Yang, and Z. G. Hou, “Multiple kernel learning with ICA: local discriminative image descriptors for recognition,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), July 2010.
[37]
Y. Tian, T. Kanade, and J. Cohn, “Evaluation of Gabor wavelet based facial action unit recognition in image sequences of increasing complexity,” in Proceedings of the International Conference on Multi-Modal Interface, 2002.
[38]
F. Cheng, J. Yu, and H. Xiong, “Facial expression recognition in JAFFE dataset based on Gaussian process classification,” IEEE Transactions on Neural Networks, vol. 21, no. 10, pp. 1685–1690, 2010.
[39]
N. T. Alves, J. A. Aznar-Casanova, and S. S. Fukusima, “Patterns of brain asymmetry in the perception of positive and negative facial expressions,” Laterality, vol. 14, no. 3, pp. 256–272, 2009.
R. E. Jack, C. Blais, C. Scheepers, P. G. Schyns, and R. Caldara, “Cultural confusions show that facial expressions are not universal,” Current Biology, vol. 19, no. 18, pp. 1543–1548, 2009.
[42]
R. E. Jack, R. Caldara, and P. G. Schyns, “Internal representations reveal cultural diversity in expectations of facial expressions of emotion,” Journal of Experimental Psychology, vol. 141, no. 1, pp. 19–25, 2012.
[43]
B. A. Olshausen and D. J. Field, “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,” Nature, vol. 381, no. 6583, pp. 607–609, 1996.
[44]
T. O. Sharpee, H. Sugihara, A. V. Kurgansky, S. P. Rebrik, M. P. Stryker, and K. D. Miller, “Adaptive filtering enhances information transmission in visual cortex,” Nature, vol. 439, no. 7079, pp. 936–942, 2006.
[45]
C. E. Connor, “A new viewpoint on faces,” Science, vol. 330, no. 6005, pp. 764–765, 2010.
[46]
W. A. Freiwald and D. Y. Tsao, “Functional compartmentalization and viewpoint generalization within the macaque face-processing system,” Science, vol. 330, no. 6005, pp. 845–851, 2010.
[47]
W. A. Freiwald, D. Y. Tsao, and M. S. Livingstone, “A face feature space in the macaque temporal lobe,” Nature Neuroscience, vol. 12, no. 9, pp. 1187–1196, 2009.
[48]
N. Kasabov, “To spike or not to spike: a probabilistic spiking neuron model,” Neural Networks, vol. 23, no. 1, pp. 16–19, 2010.
[49]
A. V. M. Herz, T. Gollisch, C. K. Machens, and D. Jaeger, “Modeling single-neuron dynamics and computations: a balance of detail and abstraction,” Science, vol. 314, no. 5796, pp. 80–85, 2006.
[50]
W. L. Braje, D. Kersten, M. J. Tarr, and N. F. Troje, “Illumination effects in face recognition,” Psychobiology, vol. 26, no. 4, pp. 371–380, 1998.
[51]
Y. Yamane, E. T. Carlson, K. C. Bowman, Z. Wang, and C. E. Connor, “A neural code for three-dimensional object shape in macaque inferotemporal cortex,” Nature Neuroscience, vol. 11, no. 11, pp. 1352–1360, 2008.
[52]
Z. U. Rahman, D. J. Jobson, and G. A. Woodell, “Multi-scale retinex for color image enhancement,” in Proceedings of the 1996 IEEE International Conference on Image Processing (ICIP '96), pp. 1003–1006, September 1996.
[53]
T. Sim, S. Baker, and M. Bsat, “The CMU pose, illumination, and expression database,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1615–1618, 2003.