OALib Journal
ISSN: 2333-9721

Unsupervised Segmentation Methods of TV Contents

DOI: 10.1155/2010/539796


Abstract:

We present a generic algorithm that addresses various temporal segmentation problems for audiovisual content, such as speaker diarization, shot segmentation, or program segmentation. Based on a GLR approach involving the ΔBIC criterion, the algorithm requires the values of only a few parameters to produce segmentation results at a desired scale, and it operates on most of the typical low-level features used in the field of content-based indexing. Results obtained on various corpora are of the same quality as those obtained by dedicated, state-of-the-art methods.

1. Introduction

Nowadays, due to the explosive growth of digital video content (both online and offline, available through public or private databases and TV broadcasts), these data are becoming increasingly accessible. This wealth of information raises the problem of adapted access to video content, which carries heterogeneous information that can be interpreted at different granularity levels, thus leading to many profiles of requests. Under these conditions, automatic indexing of the structure, which provides direct access to the various components of a multimedia document, becomes a fundamental issue. For this purpose, a temporal segmentation of the audiovisual content is required as a preprocessing step. The results of this segmentation may be used directly for delinearization purposes, such as providing direct access to the content itself. They can also feed other analysis algorithms that aim at producing synoptic views of the content, or that exploit temporal redundancy inside homogeneous segments to speed up processing. Basically, temporal segmentation tools work on a low-level feature (or a small set of low-level features) extracted from the content over time. Commonly, these low-level features express meaningful properties that can be observed or computed directly from the signal, such as spectral/cepstral features for an audio signal or color histograms for an image.
They are expressed numerically and represented as vectors whose dimension depends on the number of features. Two kinds of segmentation strategies can then be applied. Some algorithms try to gather sets of successive values that are supposed to belong to the same homogeneous segment. Others focus on detecting transitions between segments. Such algorithms have been developed independently of one another for different temporal segmentation problems. Among the most frequently addressed is "audio turn" segmentation. An "audio turn" denotes a homogeneous audio segment related
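As an illustration of such a low-level visual feature, the sketch below computes a quantized RGB color histogram for a frame and returns it as a normalized feature vector. This is a minimal NumPy version; the bin count and uniform quantization are illustrative choices, not the paper's exact descriptor:

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Quantize an RGB frame (H x W x 3, uint8) into bins**3 color cells
    and return the normalized histogram as a 1-D feature vector."""
    h, w, _ = frame.shape
    # Per-pixel bin index along each of the three color channels.
    q = (frame // (256 // bins)).reshape(-1, 3).astype(int)
    # Flatten the 3-D bin index into a single cell index.
    cells = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    hist = np.bincount(cells, minlength=bins ** 3).astype(float)
    return hist / (h * w)  # normalize so the vector sums to 1
```

Comparing the histograms of consecutive frames (e.g., with an L1 distance) is the classic basis of shot-cut detection, and the same vectors can feed a generic segmentation algorithm such as the one described here.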
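The GLR/ΔBIC idea mentioned in the abstract can be sketched as follows: for a window of feature vectors, ΔBIC compares modelling the window with a single Gaussian against splitting it into two Gaussians at a candidate boundary, minus a model-size penalty; a positive score favours the split. The code below is a minimal sketch under standard assumptions (full-covariance Gaussians, a sliding window, a tunable penalty weight λ), not the authors' exact algorithm; the window size, margin, and λ values are illustrative:

```python
import numpy as np

def delta_bic(window, t, lam=1.0):
    """ΔBIC score for splitting `window` (n x d) at index t: positive when
    two Gaussian models fit better than one, after a BIC penalty."""
    n, d = window.shape
    # Log-determinant of the sample covariance of a segment.
    logdet = lambda x: np.linalg.slogdet(np.cov(x, rowvar=False))[1]
    penalty = 0.5 * lam * (d + d * (d + 1) / 2.0) * np.log(n)
    return (0.5 * n * logdet(window)
            - 0.5 * t * logdet(window[:t])
            - 0.5 * (n - t) * logdet(window[t:])
            - penalty)

def glr_segment(features, win=200, margin=20, lam=1.0):
    """Slide a window over the feature sequence and report a boundary
    wherever the best in-window ΔBIC score is positive."""
    boundaries, start = [], 0
    while start + win <= len(features):
        window = features[start:start + win]
        scores = [delta_bic(window, t, lam) for t in range(margin, win - margin)]
        best = int(np.argmax(scores))
        if scores[best] > 0:
            boundaries.append(start + margin + best)  # absolute boundary index
            start = boundaries[-1]                    # resume after the boundary
        else:
            start += win // 2                         # no change found: slide on
    return boundaries
```

The penalty weight λ acts as the scale parameter: larger values demand stronger statistical evidence before a boundary is declared, yielding coarser segmentations, which matches the paper's claim that only a few parameters control the segmentation scale.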


