About ten years ago, HMAX was proposed as a simple and biologically feasible model for object recognition, based on how the visual cortex processes information. However, the model does not incorporate sparse firing, a hallmark of neurons at all stages of the visual pathway. The current paper presents an improved model, called sparse HMAX, which integrates sparse firing. This model is able to learn higher-level features of objects from unlabeled training images. Unlike most other deep learning models, which explicitly address the global structure of images in every layer, sparse HMAX addresses local to global structure gradually along the hierarchy by applying patch-based learning to the output of the previous layer. As a consequence, the learning method can be standard sparse coding (SSC) or independent component analysis (ICA), two techniques deeply rooted in neuroscience. What makes SSC and ICA applicable at higher levels is the introduction of linear higher-order statistical regularities by max pooling. After training, high-level units display sparse, invariant selectivity for particular individuals or for image categories, much as observed in the human inferior temporal cortex (ITC) and medial temporal lobe (MTL). Finally, on an image classification benchmark, sparse HMAX outperforms the original HMAX by a large margin, suggesting its great potential for computer vision.
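To make the recipe concrete, the following is a minimal illustrative sketch of one S-to-C stage of this kind of pipeline, using NumPy and scikit-learn. It learns patch-level features with ICA (standard sparse coding would slot in the same way), computes filter responses over all image positions (S layer), and then applies local max pooling (C layer). The specific sizes here (16 filters, 8x8 patches, 4x4 pooling, random placeholder images) are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.RandomState(0)
images = rng.rand(20, 64, 64)              # stand-in for unlabeled training images

# Learn a patch-level dictionary with ICA on randomly sampled patches.
patch_size = (8, 8)
patches = np.vstack([
    extract_patches_2d(img, patch_size, max_patches=200, random_state=rng)
        .reshape(-1, patch_size[0] * patch_size[1])
    for img in images
])
patch_mean = patches.mean(axis=0)
ica = FastICA(n_components=16, random_state=0)
ica.fit(patches - patch_mean)
filters = ica.components_                  # (16, 64) learned basis functions

def s_layer(img):
    """S layer: linear responses of every patch to each learned filter."""
    h, w = img.shape
    ph, pw = patch_size
    out = np.empty((len(filters), h - ph + 1, w - pw + 1))
    for i in range(h - ph + 1):
        for j in range(w - pw + 1):
            p = img[i:i + ph, j:j + pw].ravel() - patch_mean
            out[:, i, j] = filters @ p     # sparse by virtue of the ICA training
    return out

def c_layer(s, pool=4):
    """C layer: local max pooling, giving tolerance to small translations."""
    k, h, w = s.shape
    h2, w2 = h // pool, w // pool
    s = s[:, :h2 * pool, :w2 * pool].reshape(k, h2, pool, w2, pool)
    return s.max(axis=(2, 4))

c1 = c_layer(s_layer(images[0]))
print(c1.shape)                            # (16, 14, 14) pooled feature maps
```

The next stage would repeat the same recipe on the output of this one: sample patches from the pooled maps (now vectors of C1 responses rather than raw pixels), run SSC or ICA on them, and max-pool again, so that receptive fields grow from local to global along the hierarchy.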