Video Pulses: User-Based Modeling of Interesting Video Segments

DOI: 10.1155/2014/712589



We present a user-based method that detects regions of interest within a video in order to provide video skims and video summaries. Previous research in video retrieval has focused on content-based techniques, such as pattern recognition algorithms that attempt to understand the low-level features of a video. We propose a pulse modeling method, which makes sense of a web video by analyzing users' Replay interactions with the video player. In particular, we have modeled the users' information seeking behavior as a time series and the semantic regions as discrete pulses of fixed width. We have then calculated the correlation coefficient between the pulses detected dynamically at the local maxima of the user activity signal and the reference pulse. We have found that users' Replay activity significantly matches the important segments in information-rich and visually complex videos, such as lectures, how-to videos, and documentaries. The proposed signal processing of user activity is complementary to previous work in content-based video retrieval and provides an additional user-based dimension for modeling the semantics of a social video on the web.

1. Introduction

The web has become a very popular medium for sharing and watching video content [1]. Moreover, many organizations and academic institutions are making lecture videos and seminars available online. Previous work on video retrieval has investigated the content of the video and has contributed a standard set of procedures, tools, and data-sets for comparing the performance of video retrieval algorithms (e.g., TRECVID), but it has not considered the interactive behavior of the users as an integral part of the video retrieval process. In addition to watching and browsing video content on the web, people also perform other “social metadata” tasks, such as sharing videos, commenting on them, replying to other videos, or simply expressing their preference or rating.
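As an illustration of the pulse modeling summarized in the abstract, the following is a minimal Python sketch and not the authors' implementation. It assumes that Replay activity is available as a per-second count, that a local maximum is a sample higher than its left neighbour and not lower than its right one, and that the reference segments are given as center positions. The function names, the `min_height` threshold, and the sample data are all hypothetical.

```python
from math import sqrt

def local_maxima(signal, min_height=0.0):
    """Return indices that rise above the left neighbour, do not fall
    below the right neighbour, and reach at least min_height (assumed rule)."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
        and signal[i] >= min_height
    ]

def pulse_train(length, centers, width):
    """Build a 0/1 series with a pulse centered on each index in centers.
    An odd width gives exactly `width` samples per pulse."""
    half = width // 2
    pulses = [0] * length
    for c in centers:
        for t in range(max(0, c - half), min(length, c + half + 1)):
            pulses[t] = 1
    return pulses

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series.
    Returns 0.0 when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical Replay counts per second of video time.
replay_counts = [0, 1, 3, 7, 3, 1, 0, 0, 1, 2, 6, 2, 1, 0, 0]

# Pulses detected at the local maxima of the user activity signal.
peaks = local_maxima(replay_counts, min_height=5)
detected = pulse_train(len(replay_counts), peaks, width=3)

# Hypothetical reference pulse marking one known segment of interest.
reference = pulse_train(len(replay_counts), [3], width=3)

r = pearson(detected, reference)
```

A value of `r` close to 1 would indicate that the detected pulses coincide with the reference segments; the pulse width and the peak threshold are tuning choices in this sketch, not values taken from the paper.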
User-based research has explored the association between commenting and microblogs, primarily tweets, or other text-based and explicitly user-generated content. Although there are various established information retrieval methods that collect and manipulate text, such explicit input could be considered burdensome for users in the context of video watching. In many cases, comments are sparse relative to the number of viewers of a video. There are a few research efforts to understand user-based video retrieval without the use of social metadata. In our research, we have developed a method that utilizes more implicit user interactions for


[1]  M. Cha, H. Kwak, P. Rodriguez, Y. Ahnt, and S. Moon, “I tube, you tube, everybody tubes: analyzing the world's largest user generated content video system,” in Proceedings of the 7th ACM SIGCOMM Internet Measurement Conference (IMC '07), pp. 1–14, ACM, San Diego, Calif, USA, October 2007.
[2]  J. Yew and D. A. Shamma, “Know your data: understanding implicit usage versus explicit action in video content classification,” in 5th Multimedia on Mobile Devices 2011; and Multimedia Content Access: Algorithms and Systems, vol. 7881 of Proceedings of SPIE, San Francisco, Calif, USA, January 2011.
[3]  E. G. Toms, C. Dufour, J. Lewis, and R. Baecker, “Assessing tools for use with webcasts,” in Proceedings of the 5th ACM/IEEE Joint Conference on Digital Libraries, pp. 79–88, ACM Press, New York, NY, USA, June 2005.
[4]  B. T. Truong and S. Venkatesh, “Video abstraction: a systematic review and classification,” ACM Transactions on Multimedia Computing, Communications and Applications, vol. 3, no. 1, article 3, 2007.
[5]  Y. Takahashi, N. Nitta, and N. Babaguchi, “Video summarization for large sports video archives,” in Proceedings of the 13th Annual ACM International Conference on Multimedia, pp. 820–828, ACM, Singapore, 2005.
[6]  S. M. Drucker, A. Glatzer, S. de Mar, and C. Wong, “Smartskip: consumer level browsing and skipping of digital video content,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '02), pp. 219–226, Minneapolis, Minn, USA, April 2002.
[7]  F. C. Li, A. Gupta, E. Sanocki, L. W. He, and Y. Rui, “Browsing digital video,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '00), vol. 2, pp. 169–176, April 2000.
[8]  L. Chen, G. Chen, C. Xu, J. March, and S. Benford, “EmoPlayer: a media player for video clips with affective annotations,” Interacting with Computers, vol. 20, no. 1, pp. 17–28, 2008.
[9]  C. Crockford and H. Agius, “An empirical investigation into user navigation of digital video using the VCR-like control set,” International Journal of Human Computer Studies, vol. 64, no. 4, pp. 340–355, 2006.
[10]  J. Kim, H. Kim, and K. Park, “Towards optimal navigation through video content on interactive TV,” Interacting with Computers, vol. 18, no. 4, pp. 723–746, 2006.
[11]  A. G. Money and H. Agius, “Analysing user physiological responses for affective video summarisation,” Displays, vol. 30, no. 2, pp. 59–70, 2009.
[12]  R. Hjelsvold, S. Vdaygiri, and Y. Léauté, “Web-based personalization and management of interactive video,” in Proceedings of the 10th International Conference on World Wide Web (WWW '01), pp. 129–139, 2001.
[13]  R. Yan and A. G. Hauptmann, “A review of text and image retrieval approaches for broadcast news video,” Information Retrieval, vol. 10, no. 4-5, pp. 445–484, 2007.
[14]  C. G. M. Snoek and M. Worring, “Concept-based video retrieval,” Foundations and Trends in Information Retrieval, vol. 2, no. 4, pp. 215–322, 2008.
[15]  B. Yu, W. Y. Ma, K. Nahrstedt, and H. J. Zhang, “Video summarization based on user log enhanced link analysis,” in Proceedings of the 11th ACM International Conference on Multimedia (MULTIMEDIA '03), pp. 382–391, ACM Press, New York, NY, USA, November 2003.
[16]  T. Syeda-Mahmood and D. Ponceleon, “Learning video browsing behavior and its application in the generation of video previews,” in Proceedings of the 9th ACM International Conference on Multimedia (MULTIMEDIA '01), pp. 119–128, ACM Press, New York, NY, USA, October 2001.
[17]  R. Shaw and M. Davis, “Toward emergent representations for video,” in Proceedings of the 13th Annual ACM International Conference on Multimedia (MULTIMEDIA '05), pp. 431–434, ACM, New York, NY, USA, 2005.
[18]  C. Gkonela and K. Chorianopoulos, “VideoSkip: event detection in social web videos with an implicit user heuristic,” Multimedia Tools and Applications, 2012.
[19]  K. Chorianopoulos, “Collective intelligence within web video,” Human-Centric Computing and Information Sciences, vol. 3, article 10, 2013.
[20]  I. Groma, F. F. Csikor, and M. Zaiser, “Spatial correlations and higher-order gradient terms in a continuum description of dislocation dynamics,” Acta Materialia, vol. 51, no. 5, pp. 1271–1281, 2003.
[21]  E. Vanmarcke, Random Fields, Analysis and Synthesis, MIT Press, Cambridge, Mass, USA, 1983.
[22]  A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill Kogakusha, Tokyo, Japan, 9th edition, 1965.
[23]  M. Zaiser, M. C. Miguel, and I. Groma, “Statistical dynamics of dislocation systems: the influence of dislocation-dislocation correlations,” Physical Review B, vol. 64, no. 22, Article ID 224102, 9 pages, 2001.

