We present a user-based method that detects regions of interest within a video in order to provide video skims and video summaries. Previous research in video retrieval has focused on content-based techniques, such as pattern recognition algorithms that attempt to understand the low-level features of a video. We propose a pulse modeling method, which makes sense of a web video by analyzing users' Replay interactions with the video player. In particular, we have modeled the users' information seeking behavior as a time series and the semantic regions as discrete pulses of fixed width. We have then calculated the correlation coefficient between the pulses dynamically detected at the local maxima of the user activity signal and the reference pulse. We have found that users' Replay activity significantly matches the important segments in information-rich and visually complex videos, such as lectures, how-to videos, and documentaries. The proposed signal processing of user activity is complementary to previous work in content-based video retrieval and provides an additional user-based dimension for modeling the semantics of a social video on the web.

1. Introduction

The web has become a very popular medium for sharing and watching video content. Moreover, many organizations and academic institutions are making lecture videos and seminars available online. Previous work on video retrieval has investigated the content of the video and has contributed a standard set of procedures, tools, and data-sets for comparing the performance of video retrieval algorithms (e.g., TRECVID), but it has not considered the interactive behavior of the users as an integral part of the video retrieval process. In addition to watching and browsing video content on the web, people also perform other “social metadata” tasks, such as sharing videos, commenting on them, replying to other videos, or simply expressing a preference or rating.
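To make the pulse modeling idea summarized in the abstract more concrete, the following is a minimal Python sketch, not the implementation used in this work. It assumes that each Replay interaction can be reduced to a hypothetical (start, end) interval in seconds, that the user activity signal is the per-second count of overlapping replayed intervals, and that pulses are rectangular with an odd, fixed width. All function names and the sample data are illustrative assumptions.

```python
from math import sqrt

def activity_signal(replays, duration):
    """Per-second count of replayed intervals (assumed [start, end) in seconds)."""
    signal = [0] * duration
    for start, end in replays:
        for t in range(max(0, start), min(duration, end)):
            signal[t] += 1
    return signal

def local_maxima(signal):
    """Return the middle index of every peak or flat plateau that is
    higher than the samples immediately before and after it."""
    peaks, i, n = [], 1, len(signal)
    while i < n - 1:
        if signal[i] > signal[i - 1]:
            j = i
            while j + 1 < n and signal[j + 1] == signal[i]:
                j += 1                      # walk to the end of the plateau
            if j + 1 < n and signal[j + 1] < signal[i]:
                peaks.append((i + j) // 2)  # plateau is followed by a drop
            i = j + 1
        else:
            i += 1
    return peaks

def pulse(center, width, duration):
    """Rectangular pulse of fixed (odd) width centered on `center`."""
    half = width // 2
    return [1 if center - half <= t <= center + half else 0
            for t in range(duration)]

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

# Hypothetical data: a 60-second video with four logged replay intervals
# and one manually annotated segment of interest centered at second 22.
DURATION, WIDTH = 60, 11
replays = [(16, 27), (18, 25), (20, 23), (40, 43)]
reference = pulse(22, WIDTH, DURATION)

signal = activity_signal(replays, DURATION)
peaks = local_maxima(signal)
scores = {p: pearson(pulse(p, WIDTH, DURATION), reference) for p in peaks}
for p, r in scores.items():
    print(f"peak at {p}s, correlation with reference pulse: {r:.3f}")
```

In this sketch, a peak whose pulse overlaps the annotated segment yields a correlation close to 1, whereas a peak elsewhere in the video yields a value near or below 0. A real activity signal would likely need smoothing before peak detection, which is omitted here for brevity.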
User-based research has explored the association between commenting and microblogs, primarily tweets, or other text-based and explicitly user-generated content. Although there are various established information retrieval methods that collect and manipulate text, providing such text can be burdensome for users in the context of video watching. In many cases, the number of comments is low compared to the number of viewers of a video. There are a few research efforts to understand user-based video retrieval without the use of social metadata. In our research, we have developed a method that relies more on implicit user interactions for