Active Object Recognition with a Space-Variant Retina

DOI: 10.1155/2013/138057


Abstract:

When independent component analysis (ICA) is applied to color natural images, the representation it learns has spatiochromatic properties similar to the responses of neurons in primary visual cortex. Existing models of ICA have only been applied to pixel patches, which does not take into account the space-variant nature of human vision. To address this, we use the space-variant log-polar transformation to acquire samples from color natural images, and we then apply ICA to the acquired samples. We analyze the spatiochromatic properties of the learned ICA filters. Qualitatively, the model matches the receptive field properties of neurons in primary visual cortex, including the same opponent-color structure and a higher density of receptive fields in the foveal region than in the periphery. We also adopt the “self-taught learning” paradigm from machine learning to assess the model’s efficacy at active object and face classification, and the model is competitive with the best approaches in computer vision.

1. Introduction

In humans and other simian primates, central foveal vision has an exceedingly high spatial resolution (acuity) compared to the periphery. This space-variant scheme enables a large field of view while keeping visual processing efficient. The human retina contains about six million cone photoreceptors but sends only about one million axons to the brain [1]. By employing a space-variant representation, the retina greatly reduces the dimensionality of the visual input, with eye movements allowing fine details to be resolved when necessary. The retina’s space-variant representation is reflected in early visual cortex’s retinotopic map. About half of primary visual cortex (V1) is devoted solely to processing the central 15 degrees of visual angle [2, 3]. This enormous overrepresentation of the fovea in V1 is known as cortical magnification [4].

Neurons in V1 have localized, orientation-sensitive receptive fields (RFs). V1-like RFs can be learned algorithmically using independent component analysis (ICA) [5–8]. ICA finds a linear transformation that makes the outputs as statistically independent as possible [5], and when ICA is applied to achromatic natural image patches, it produces basis functions with properties similar to those of neurons in V1. Moreover, when ICA is applied to color image patches, it produces RFs with V1-like opponent-color characteristics, with the majority of the RFs exhibiting dark-light, blue-yellow, or red-green opponency [6–8]. Filters learned from unlabeled images can in turn serve as generic features for recognition tasks, the premise of self-taught learning [9].
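
To make the space-variant sampling concrete, the following is a minimal Python sketch of log-polar sampling; the ring and wedge counts, the minimum radius, and the nearest-neighbor lookup are illustrative assumptions, not the parameters used in the paper.

import numpy as np

def log_polar_sample(img, n_rings=32, n_wedges=64, rho_min=2.0):
    """Sample an image on a log-polar grid centered on the fixation point.

    Ring radii grow exponentially with eccentricity, so samples are dense
    near the center (the "fovea") and sparse in the periphery.
    """
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rho_max = min(cy, cx)
    # Exponentially spaced radii: r_k = rho_min * (rho_max/rho_min)**(k/(n_rings-1))
    radii = rho_min * (rho_max / rho_min) ** (np.arange(n_rings) / (n_rings - 1))
    angles = 2.0 * np.pi * np.arange(n_wedges) / n_wedges
    rows = radii[:, None] * np.sin(angles)[None, :] + cy
    cols = radii[:, None] * np.cos(angles)[None, :] + cx
    # Nearest-neighbor lookup keeps the sketch dependency-free; a fuller retina
    # model would pool over overlapping receptive fields, as in foveated
    # sensors [16].
    rows = np.clip(np.rint(rows).astype(int), 0, h - 1)
    cols = np.clip(np.rint(cols).astype(int), 0, w - 1)
    return img[rows, cols]  # shape (n_rings, n_wedges, 3) for an RGB input

Because ring spacing grows exponentially, sample density rises sharply toward the center (roughly as the inverse square of eccentricity), mirroring cortical magnification.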
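
Given a matrix of flattened log-polar samples, the filters themselves can be learned with an off-the-shelf ICA implementation. The sketch below uses scikit-learn’s FastICA with placeholder dimensions; the component count and preprocessing are assumptions rather than the paper’s exact training setup.

import numpy as np
from sklearn.decomposition import FastICA

def learn_ica_filters(samples, n_components=100, seed=0):
    """Learn ICA filters from flattened color log-polar samples.

    samples: (n_samples, n_features) array, one row per flattened sample of
    length n_rings * n_wedges * 3 (three color channels).
    """
    X = samples - samples.mean(axis=0)  # center each feature dimension
    ica = FastICA(n_components=n_components, whiten="unit-variance",
                  random_state=seed, max_iter=1000)
    ica.fit(X)
    # Each row of components_ is one learned filter; reshaping a row back to
    # (n_rings, n_wedges, 3) exposes its spatiochromatic RF structure.
    return ica.components_

For example, filters = learn_ica_filters(np.vstack([log_polar_sample(im).reshape(-1) for im in images])) yields a filter bank whose rows can be inspected for the dark-light, blue-yellow, and red-green opponency described above.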

References

[1]  C. A. Curcio and K. A. Allen, “Topography of ganglion cells in human retina,” Journal of Comparative Neurology, vol. 300, no. 1, pp. 5–25, 1990.
[2]  R. F. Dougherty, V. M. Koch, A. A. Brewer, B. Fischer, J. Modersitzki, and B. A. Wandell, “Visual field representations and locations of visual areas v1/2/3 in human visual cortex,” Journal of Vision, vol. 3, no. 10, pp. 586–598, 2003.
[3]  S. A. Engel, G. H. Glover, and B. A. Wandell, “Retinotopic organization in human visual cortex and the spatial precision of functional MRI,” Cerebral Cortex, vol. 7, no. 2, pp. 181–192, 1997.
[4]  P. M. Daniel and D. Whitteridge, “The representation of the visual field on the cerebral cortex in monkeys,” The Journal of Physiology, vol. 159, pp. 203–221, 1961.
[5]  A. J. Bell and T. J. Sejnowski, “The ‘independent components’ of natural scenes are edge filters,” Vision Research, vol. 37, no. 23, pp. 3327–3338, 1997.
[6]  M. S. Caywood, B. Willmore, and D. J. Tolhurst, “Independent components of color natural scenes resemble V1 neurons in their spatial and color tuning,” Journal of Neurophysiology, vol. 91, no. 6, pp. 2859–2873, 2004.
[7]  T. W. Lee, T. Wachtler, and T. J. Sejnowski, “Color opponency is an efficient representation of spectral properties in natural scenes,” Vision Research, vol. 42, no. 17, pp. 2095–2103, 2002.
[8]  T. Wachtler, E. Doi, T. W. Lee, and T. J. Sejnowski, “Cone selectivity derived from the responses of the retinal cone mosaic to natural scenes,” Journal of Vision, vol. 7, no. 8, article 6, 2007.
[9]  R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, “Self-taught learning: transfer learning from unlabeled data,” in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 759–766, June 2007.
[10]  C. Kanan and G. Cottrell, “Robust classification of objects, faces, and flowers using natural image statistics,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2472–2479, June 2010.
[11]  Q. V. Le, M. A. Ranzato, R. Monga et al., “Building high-level features using large scale unsupervised learning,” in Proceedings of the International Conference on Machine Learning (ICML '12), pp. 81–88, 2012.
[12]  H. Shan and G. W. Cottrell, “Looking around the backyard helps to recognize faces and digits,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[13]  M. D. Fairchild, Color Appearance Models, Wiley Interscience, 2nd edition, 2005.
[14]  C. Kanan, A. Flores, and G. W. Cottrell, “Color constancy algorithms for object and face recognition,” in Advances in Visual Computing, vol. 6453 of Lecture Notes in Computer Science, no. 1, pp. 199–210, 2010.
[15]  C. Kanan and G. W. Cottrell, “Color-to-grayscale: does the method matter in image recognition?” PLoS ONE, vol. 7, no. 1, Article ID e29740, 2012.
[16]  M. Bolduc and M. D. Levine, “A real-time foveated sensor with overlapping receptive fields,” Real-Time Imaging, vol. 3, no. 3, pp. 195–212, 1997.
[17]  M. Bolduc and M. D. Levine, “A review of biologically motivated space-variant data reduction models for robotic vision,” Computer Vision and Image Understanding, vol. 69, no. 2, pp. 170–184, 1998.
[18]  E. L. Schwartz, “Spatial mapping in the primate sensory projection: analytic structure and relevance to perception,” Biological Cybernetics, vol. 25, no. 4, pp. 181–194, 1977.
[19]  M. Chessa, S. P. Sabatini, F. Solari, and F. Tatti, “A quantitative comparison of speed and reliability for log-polar mapping techniques,” in Computer Vision Systems, vol. 6962 of Lecture Notes in Computer Science, pp. 41–50, 2011.
[20]  R. H. Masland, “The fundamental plan of the retina,” Nature Neuroscience, vol. 4, no. 9, pp. 877–886, 2001.
[21]  A. Olmos and F. A. A. Kingdom, “A biologically inspired algorithm for the recovery of shading and reflectance images,” Perception, vol. 33, no. 12, pp. 1463–1473, 2004.
[22]  Z. Koldovsky, P. Tichavsky, and E. Oja, “Efficient variant of algorithm FastICA for independent component analysis attaining the Cramér-Rao lower bound,” IEEE Transactions on Neural Networks, vol. 17, no. 5, pp. 1265–1277, 2006.
[23]  J. P. Jones and L. A. Palmer, “An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex,” Journal of Neurophysiology, vol. 58, no. 6, pp. 1233–1258, 1987.
[24]  M. Grundland and N. A. Dodgson, “Decolorize: fast, contrast enhancing, color to grayscale conversion,” Pattern Recognition, vol. 40, no. 11, pp. 2891–2896, 2007.
[25]  R. Gattass, C. G. Gross, and J. H. Sandell, “Visual topography of V2 in the Macaque,” Journal of Comparative Neurology, vol. 201, no. 4, pp. 519–539, 1981.
[26]  C. Kanan, “Recognizing sights, smells, and sounds with gnostic fields,” PLoS ONE, vol. 8, no. 1, Article ID e54088, 2013.
[27]  J. Konorski, Integrative Activity of the Brain, University of Chicago Press, Chicago, Ill, USA, 1967.
[28]  M. Kouh and T. Poggio, “A canonical neural circuit for cortical nonlinear operations,” Neural Computation, vol. 20, no. 6, pp. 1427–1451, 2008.
[29]  I. S. Dhillon and D. S. Modha, “Concept decompositions for large sparse text data using clustering,” Machine Learning, vol. 42, no. 1-2, pp. 143–175, 2001.
[30]  R. E. Fan, K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin, “LIBLINEAR: a library for large linear classification,” Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[31]  K. Crammer and Y. Singer, “On the algorithmic implementation of multiclass kernel-based vector machines,” Journal of Machine Learning Research, vol. 2, pp. 265–292, 2001.
[32]  A. M. Martinez and R. Benavente, “The AR face database,” Tech. Rep. 24, CVC, 1998.
[33]  G. Griffin, A. D. Holub, and P. Perona, “The Caltech-256 object category dataset,” Tech. Rep. CNS-TR-2007-001, Caltech, Pasadena, Calif, USA, 2007.
[34]  Y. Liang, C. Li, W. Gong, and Y. Pan, “Uncorrelated linear discriminant analysis based on weighted pairwise Fisher criterion,” Pattern Recognition, vol. 40, no. 12, pp. 3606–3615, 2007.
[35]  N. Pinto, D. D. Cox, and J. J. DiCarlo, “Why is real-world visual object recognition hard?” PLoS Computational Biology, vol. 4, no. 1, article e27, 2008.
[36]  P. Gehler and S. Nowozin, “On feature combination for multiclass object classification,” in Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV '09), pp. 221–228, IEEE Computer Society, Los Alamitos, Calif, USA, 2009.
[37]  A. Bergamo and L. Torresani, “Meta-class features for large-scale object categorization on a budget,” in Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR '12), 2012.
[38]  B. T. Vincent, R. J. Baddeley, T. Troscianko, and I. D. Gilchrist, “Is the early visual system optimised to be energy efficient?” Network: Computation in Neural Systems, vol. 16, no. 2-3, pp. 175–190, 2005.
[39]  V. Javier Traver and A. Bernardino, “A review of log-polar imaging for visual perception in robotics,” Robotics and Autonomous Systems, vol. 58, no. 4, pp. 378–398, 2010.
[40]  M. Varma and D. Ray, “Learning the discriminative power-invariance trade-off,” in Proceedings of the 2007 IEEE 11th International Conference on Computer Vision (ICCV '07), October 2007.
[41]  R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, Mass, USA, 1998.
[42]  H. Larochelle and G. Hinton, “Learning to combine foveal glimpses with a third-order Boltzmann machine,” in Proceedings of the 24th Annual Conference on Neural Information Processing Systems 2010 (NIPS '10), December 2010.
