This paper introduces a multimodal approach for reranking image retrieval results based on relevance feedback. We consider the problem of reordering the ranked list of images returned by an image retrieval system so that images relevant to the query are moved to the first positions of the list. We propose a Markov random field (MRF) model that classifies the images in the initial retrieval-result list as relevant or irrelevant; the output of the MRF is used to generate a new ranked list of images. The MRF takes into account (1) the rank information provided by the initial retrieval system, (2) similarities among images in the list, and (3) relevance feedback information. Hence, the problem of image reranking is reduced to that of minimizing an energy function that represents a trade-off between image relevance and interimage similarity. The proposed MRF is multimodal, as it can take advantage of both the visual and the textual information with which images are described. We report experimental results on the IAPR TC-12 collection using visual and textual features to represent images. The results show that our method improves the ranking provided by the base retrieval system. Moreover, the multimodal MRF outperforms the unimodal (i.e., either text-based or image-based) MRFs that we have developed in previous work, and it also outperforms baseline multimodal methods that combine information from unimodal MRFs.

1. Introduction

Images are the main source of information available after text; this is due to the availability of inexpensive image acquisition devices (e.g., photographic cameras and cell phones) and data storage devices (large-volume hard drives), which have given rise to millions of digital images stored in databases around the world. However, stored information is useless if we cannot access the specific data we are interested in. Thus, the development of effective methods for the organization and exploration of image collections is a crucial task [1–3].

In a standard image retrieval scenario, one has available a collection of images, and users want to access images stored in that collection; the images may be annotated (i.e., associated with a textual description). Images are represented by features extracted from them. Users formulate queries (which express their information needs) using either sample images, a textual description, or a combination of both. Queries are likewise represented by features extracted from them, and the retrieval process reduces to comparing the query representation with the representations of the images in the collection.
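To make the energy-minimization view of reranking concrete, the following Python sketch illustrates the general idea under simple, illustrative assumptions: each image in the retrieved list gets a binary relevance label, a unary term is built from the initial rank and the user's relevance feedback, a pairwise term penalizes label disagreement between similar images, and iterated conditional modes (ICM) [59] is used as the minimizer. The specific potentials, the weights alpha and beta, and the choice of ICM are assumptions made for illustration only; they are not the model defined in this paper.

```python
import numpy as np

def rerank_icm(ranks, sim, feedback, alpha=1.0, beta=0.5, n_iters=10):
    """Toy MRF-style reranking: assign each image a binary relevance label by
    local energy minimization (ICM) and reorder the list accordingly.

    ranks    : initial positions (0 = top) given by the base retrieval system
    sim      : n x n matrix of inter-image similarities (visual, textual, or fused)
    feedback : +1 for images marked relevant by the user, -1 for irrelevant, 0 otherwise
    """
    ranks = np.asarray(ranks, dtype=float)
    sim = np.asarray(sim, dtype=float)
    feedback = np.asarray(feedback, dtype=float)
    n = len(ranks)

    # Unary energies: a good initial rank and positive feedback make the
    # "relevant" label (1) cheap; the "irrelevant" label (0) mirrors it.
    u_rel = ranks / n - feedback
    u_irr = -u_rel
    labels = (u_rel < u_irr).astype(int)  # initialize from unary terms only

    for _ in range(n_iters):
        changed = False
        for i in range(n):
            others = np.arange(n) != i
            # Pairwise energies: penalize disagreeing with similar images.
            e_rel = alpha * u_rel[i] + beta * np.sum(sim[i, others] * (labels[others] != 1))
            e_irr = alpha * u_irr[i] + beta * np.sum(sim[i, others] * (labels[others] != 0))
            new_label = 1 if e_rel <= e_irr else 0
            if new_label != labels[i]:
                labels[i], changed = new_label, True
        if not changed:
            break

    # New list: images labeled relevant first, ties broken by the initial rank.
    return sorted(range(n), key=lambda i: (-labels[i], ranks[i]))
```

In the multimodal setting discussed in this paper, the similarity matrix would combine visual and textual similarities rather than rely on a single modality.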
References
[1] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-based image retrieval at the end of the early years,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349–1380, 2000.
[2] A. Goodrum, “Image information retrieval: an overview of current research,” Informing Science, vol. 3, no. 2, pp. 63–66, 2000.
[3] R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image retrieval: ideas, influences, and trends of the new age,” ACM Computing Surveys, vol. 40, no. 2, article 5, 2008.
[4] Y. Liu, D. Zhang, G. Lu, and W. Ma, “A survey of content-based image retrieval with high-level semantics,” Pattern Recognition, vol. 40, no. 1, pp. 262–282, 2007.
[5] M. S. Lew, N. Sebe, C. Djeraba, and R. Jain, “Content-based multimedia information retrieval: state of the art and challenges,” ACM Transactions on Multimedia Computing, Communications and Applications, vol. 2, no. 1, pp. 1–19, 2006.
[6] Y. Rui, T. Huang, and S. Chang, “Image retrieval: current techniques, promising directions and open issues,” Journal of Visual Communication and Image Representation, vol. 10, no. 4, pp. 39–62, 1999.
[7] P. Clough, M. Grubinger, T. Deselaers, A. Hanbury, and H. Müller, “Overview of the ImageCLEF 2006 photographic retrieval and object annotation tasks,” in Proceedings of the 7th Workshop of the Cross-Language Evaluation Forum (CLEF '07), vol. 4730 of Lecture Notes in Computer Science, pp. 579–594, Springer, 2007.
[8] P. K. Atrey, M. A. Hossain, A. El Saddik, and M. S. Kankanhalli, “Multimodal fusion for multimedia analysis: a survey,” Multimedia Systems, vol. 16, no. 6, pp. 345–379, 2010.
[9] M. Broilo and F. G. B. De Natale, “A stochastic approach to image retrieval using relevance feedback and particle swarm optimization,” IEEE Transactions on Multimedia, vol. 12, no. 4, pp. 267–277, 2010.
[10] Y. Rui, T. Huang, M. Ortega, and S. Mehrotra, “Relevance feedback: a power tool for interactive content-based image retrieval,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 5, pp. 644–655, 1998.
[11] X. Zhou and T. Huang, “Relevance feedback in image retrieval: a comprehensive review,” Multimedia Systems, vol. 8, pp. 536–544, 2003.
[12] T. Deselaers, T. Gass, P. Dreuw, and H. Ney, “Jointly optimising relevance and diversity in image retrieval,” in Proceedings of the ACM International Conference on Image and Video Retrieval (CIVR '09), pp. 296–303, ACM Press, July 2009, paper 39.
[13] H. J. Escalante, C. Hernandez, E. Sucar, and M. Montes, “Late fusion of heterogeneous methods for multimedia image retrieval,” in Proceedings of the ACM Multimedia Information Retrieval Conference, pp. 172–179, ACM Press, Vancouver, Canada, 2008.
[14] A. Juárez, M. Montes, L. Villaseñor, D. Pinto, and M. Pérez, “Selecting the N-top retrieval result lists for an effective data fusion,” in Proceedings of the 11th International Conference on Intelligent Text Processing and Computational Linguistics, vol. 6008 of Lecture Notes in Computer Science, pp. 580–589, Springer, 2010.
[15] Y. Chang, W. Lin, and H.-H. Chen, “Combining text and image queries at ImageCLEF 2005,” in Working Notes of the CLEF Workshop, Vienna, Austria, 2005.
[16] A. Marakakis, N. Galatsanos, A. Likas, and A. Stafylopatis, “Application of relevance feedback in content-based image retrieval using Gaussian mixture models,” in Proceedings of the 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '08), pp. 141–148, November 2008.
[17] X. Tian, L. Yang, J. Wang, Y. Yang, X. Wu, and X. S. Hua, “Bayesian video search reranking,” in Proceedings of the 16th ACM International Conference on Multimedia, pp. 131–140, ACM Press, Vancouver, Canada, 2008.
[18] Y. Jing and S. Baluja, “PageRank for product image search,” in Proceedings of the 17th International World Wide Web Conference (WWW '08), pp. 307–315, ACM Press, Beijing, China, 2008.
[19] T. Yao, T. Mei, and C. W. Ngo, “Co-reranking by mutual reinforcement for image search,” in Proceedings of the ACM International Conference on Image and Video Retrieval, pp. 34–41, ACM Press, Xi'an, China, 2010.
[20] J. Cui, F. Wen, and X. Tang, “Real time Google and Live image search re-ranking,” in Proceedings of the ACM Multimedia Information Retrieval Conference, pp. 729–732, ACM Press, Vancouver, Canada, 2008.
[21] W. Lin, R. Jin, and A. Hauptmann, “Web image retrieval re-ranking with relevance model,” in Proceedings of the IEEE International Conference on Web Intelligence, p. 242, 2003.
[22] H. Müller, P. Clough, T. Deselaers, and B. Caputo, ImageCLEF: Experimental Evaluation in Visual Information Retrieval, Springer Series on Information Retrieval, Springer, 2010.
[23] M. Grubinger, Analysis and evaluation of visual information systems performance [Ph.D. thesis], School of Computer Science and Mathematics, Faculty of Health, Engineering and Science, Victoria University, Melbourne, Australia, 2007.
[24] P. Clough, M. Grubinger, T. Deselaers, A. Hanbury, and H. Müller, “Overview of the ImageCLEF 2007 photographic retrieval task,” in Proceedings of the 8th Workshop of the Cross-Language Evaluation Forum (CLEF '08), vol. 5152 of Lecture Notes in Computer Science, pp. 433–444, Springer, 2008.
[25] T. Arni, M. Sanderson, P. Clough, and M. Grubinger, “Overview of the ImageCLEF 2008 photographic retrieval task,” in Evaluating Systems for Multilingual and Multimodal Information Access, vol. 5706 of Lecture Notes in Computer Science, pp. 500–511, Springer, 2009.
[26] R. O. Chávez, M. Montes, and E. Sucar, “Using a Markov random field for image re-ranking based on visual and textual features,” Computación y Sistemas, vol. 14, no. 4, pp. 393–404, 2011.
[27] R. O. Chávez, M. Montes, and E. Sucar, “Image re-ranking based on relevance feedback combining internal and external similarities,” in Proceedings of the 23rd International FLAIRS Conference, pp. 140–141, Daytona Beach, Fla, USA, 2010.
[28] I. J. Cox, M. L. Miller, T. P. Minka, T. V. Papathomas, and P. N. Yianilos, “The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experiments,” IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 20–37, 2000.
[29] C. Zhang, J. Y. Chai, and R. Jin, “User term feedback in interactive text-based image retrieval,” in Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 51–58, ACM Press, Salvador, Brazil, 2005.
[30] Z. H. Zhou, K. E. J. Chen, and H. B. Dai, “Enhancing relevance feedback in image retrieval using unlabeled data,” ACM Transactions on Information Systems, vol. 24, no. 2, pp. 219–244, 2006.
[31] S. Tong and E. Chang, “Support vector machine active learning for image retrieval,” in Proceedings of the 9th ACM International Conference on Multimedia, pp. 107–118, ACM Press, Ottawa, Canada, 2001.
[32] T. Deselaers, R. Paredes, E. Vidal, and H. Ney, “Learning weighted distances for relevance feedback in image retrieval,” in Proceedings of the 19th International Conference on Pattern Recognition (ICPR '08), pp. 1–4, Tampa, Fla, USA, December 2008.
[33] R. Yan, A. G. Hauptmann, and R. Jin, “Negative pseudo-relevance feedback in content-based video retrieval,” in Proceedings of the 11th ACM International Conference on Multimedia, pp. 343–346, ACM Press, Berkeley, Calif, USA, 2003.
[34] R. Yan, A. G. Hauptmann, and R. Jin, “Multimedia search with pseudo-relevance feedback,” in Proceedings of the International Conference on Image and Video Retrieval, ACM Press, Urbana, Ill, USA, 2003.
[35] H. Ma, J. Zhu, M. R. Lyu, and I. King, “Bridging the semantic gap between image contents and tags,” IEEE Transactions on Multimedia, vol. 12, no. 5, pp. 462–473, 2010.
[36] H. Tong, J. He, M. Li, W. Y. Ma, H. J. Zhang, and C. Zhang, “Manifold-ranking-based keyword propagation for image retrieval,” EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 079412, 2006.
[37] J. Ah-Pine, M. Bressan, S. Clinchant, G. Csurka, Y. Hoppenot, and J. M. Renders, “Crossing textual and visual content in different application scenarios,” Multimedia Tools and Applications, vol. 42, no. 1, pp. 31–56, 2009.
[38] K. Porkaew and K. Chakrabarti, “Query refinement for multimedia similarity retrieval in MARS,” in Proceedings of the 7th ACM International Conference on Multimedia, pp. 235–238, ACM Press, 1999.
[39] K. Porkaew, M. Ortega, and S. Mehrotra, “Query reformulation for content based multimedia retrieval in MARS,” in Proceedings of the 6th International Conference on Multimedia Computing and Systems (IEEE ICMCS '99), pp. 747–751, June 1999.
[40] G. Giacinto and F. Roli, “Nearest-prototype relevance feedback for content-based image retrieval,” in Proceedings of the 17th International Conference on Pattern Recognition, vol. 2, pp. 989–992, Washington, DC, USA, 2004.
[41] G. Giacinto and F. Roli, “Instance-based relevance feedback for image retrieval,” in Advances in Neural Information Processing Systems, vol. 17, pp. 489–496, MIT Press, 2005.
[42] G. Giacinto and F. Roli, “Instance-based relevance feedback in image retrieval using dissimilarity spaces,” in Case-Based Reasoning for Signals and Images, pp. 419–430, Springer, 2007.
[43] P. H. Gosselin and M. Cord, “Active learning techniques for user interactive systems: application to image retrieval,” in Proceedings of the Workshop on Machine Learning Techniques for Processing Multimedia Content, Bonn, Germany, 2005.
[44] L. Setia, J. Ick, and H. Burkhardt, “SVM-based relevance feedback in image retrieval using invariant feature histograms,” in Proceedings of the IAPR Workshop on Machine Vision Applications, Tsukuba Science City, Japan, 2005.
[45] Y. Chen, X. Zhou, and T. Huang, “One-class SVM for learning in image retrieval,” in Proceedings of the International Conference on Image Processing, pp. 34–37, Thessaloniki, Greece, 2001.
[46] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, “An efficient boosting algorithm for combining preferences,” Journal of Machine Learning Research, vol. 4, no. 6, pp. 933–969, 2004.
[47] V. Lavrenko and W. B. Croft, “Relevance-based language models,” in Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 120–127, ACM Press, 2001.
[48] G. Winkler, Image Analysis, Random Fields and Markov Chain Monte Carlo Methods, Springer Series on Applications of Mathematics, Springer, 2006.
[49] S. Z. Li, Markov Random Field Modeling in Image Analysis, Springer, 2nd edition, 2001.
[50] S. Z. Li, “Markov random field models in computer vision,” in Proceedings of the European Conference on Computer Vision, vol. 801 of Lecture Notes in Computer Science, pp. 361–370, Springer, Stockholm, Sweden, 1994.
[51] K. Held, E. Kops, B. Krause, W. Wells III, R. Kikinis, and H. Mueller, “Markov random field segmentation of brain MR images,” IEEE Transactions on Medical Imaging, vol. 16, no. 6, pp. 878–886, 1997.
[52] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” in Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, pp. 564–584, 1987.
[53] P. Carbonetto, N. de Freitas, and K. Barnard, “A statistical model for general context object recognition,” in Proceedings of the 8th European Conference on Computer Vision, vol. 3021 of Lecture Notes in Computer Science, pp. 350–362, Springer, Prague, Czech Republic, 2004.
[54] C. Hernandez and L. E. Sucar, “Markov random fields and spatial information to improve automatic image annotation,” in Proceedings of the Pacific-Rim Symposium on Image and Video Technology, vol. 4872 of Lecture Notes in Computer Science, pp. 879–892, Springer, Santiago, Chile, 2007.
[55] H. J. Escalante, M. Montes, and L. E. Sucar, “Word co-occurrence and Markov random fields for improving automatic image annotation,” in Proceedings of the 18th British Machine Vision Conference, vol. 2, pp. 600–609, Warwick, UK, 2007.
[56] D. Metzler and B. Croft, “A Markov random field model for term dependencies,” in Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 472–479, ACM Press, 2005.
[57] D. Metzler and W. B. Croft, “Latent concept expansion using Markov random fields,” in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '07), pp. 311–318, ACM Press, July 2007.
[58] M. Lease, “An improved Markov random field model for supporting verbose queries,” in Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '09), pp. 476–483, ACM Press, July 2009.
[59] J. Besag, “On the statistical analysis of dirty pictures,” Journal of the Royal Statistical Society B, vol. 48, pp. 259–302, 1986.
[60] S. Kirkpatrick, C. Gelatt, and M. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671–680, 1983.
[61] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
[62] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[63] E. A. Fox and J. A. Shaw, “Combination of multiple searches,” in Proceedings of the 3rd Text REtrieval Conference (TREC-3), NIST Special Publication, 1994.
[64] H. J. Escalante, J. A. Gonzalez, C. Hernandez et al., “Annotation-based expansion and late fusion of mixed methods for multimedia image retrieval,” in Evaluating Systems for Multilingual and Multimodal Information Access, vol. 5706 of Lecture Notes in Computer Science, pp. 669–676, Springer, 2009.
[65] I. Mani, Automatic Summarization (Natural Language Processing), John Benjamins Publishing Co., 2001.
[66] M. D. Smucker, J. Allan, and B. Carterette, “Agreement among statistical significance tests for information retrieval evaluation at varying sample sizes,” in Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '09), pp. 630–631, ACM Press, July 2009.
[67] G. K. Kanji, 100 Statistical Tests, Sage, London, UK, 1993.
[68] H. J. Escalante, C. A. Hernández, J. A. Gonzalez et al., “The segmented and annotated IAPR TC-12 benchmark,” Computer Vision and Image Understanding, vol. 114, no. 4, pp. 419–428, 2010.
[69] C. G. M. Snoek, M. Worring, and A. W. M. Smeulders, “Early versus late fusion in semantic video analysis,” in Proceedings of the 13th Annual ACM International Conference on Multimedia (MULTIMEDIA '05), pp. 399–402, ACM Press, New York, NY, USA, 2005.