This paper describes an approach to exploiting the implicit user feedback gathered during interactive video retrieval tasks. We propose a framework in which the video is first indexed according to temporal, textual, and visual features, and implicit user feedback is then analyzed using a graph-based methodology. The generated graph encodes the semantic relations between video segments based on past user interaction and is subsequently used to generate recommendations. Moreover, we combine the visual features and the implicit feedback information by training a support vector machine classifier with examples generated from the aforementioned graph, in order to optimize query-by-visual-example search. The proposed framework is evaluated through real-user experiments. The results demonstrate a significant improvement in precision and recall after the exploitation of implicit user feedback, while an improved ranking is obtained for most of the evaluated queries by visual example.

1. Introduction

In recent years, the rapid development of digital technologies has led to growing storage and processing capabilities of computers, as well as to the establishment of fast and advanced communication networks. Taking into account also the low cost of image and video capturing devices and the deep penetration of the Internet in today’s communities, large quantities of audiovisual content have become available and accessible worldwide. The availability of such content and the increasing user need to search multimedia collections create a demand for advanced multimedia search engines; video retrieval therefore remains one of the most challenging research tasks. Despite the recent significant advances in this area, further advancements in several fields of video retrieval are required to improve the performance of current video search engines. More specifically, major research breakthroughs are still needed in the areas of semantic and interactive search, possibly using multimodal analysis and retrieval algorithms, as well as relevance feedback.

State-of-the-art video retrieval systems incorporate and combine several advanced techniques, including text retrieval and visual content-based search, in order to support users in locating video clips that meet their demands. One of the main challenges faced by these approaches is to generate efficient representations and descriptions of the video source. The initial step in this direction is the segmentation and indexing of the video into smaller video