OALib Journal
ISSN: 2333-9721
Multimodal Indexing of Multilingual News Video

DOI: 10.1155/2010/486487


Abstract:

The problems associated with automatic analysis of news telecasts are more severe in a country like India, where there are many national and regional language channels besides English. In this paper, we present a framework for multimodal analysis of multilingual news telecasts, which can be augmented with tools and techniques for specific news-analytics tasks. Further, we focus on a set of techniques for automatic indexing of news stories based on keywords of contemporary and domain interest, spotted in speech as well as in the visuals. English keywords are derived from RSS feeds and converted to their Indian-language equivalents for detection in speech and in ticker text. Restricting the keyword list to a manageable size results in a drastic improvement in indexing performance. We present illustrative examples and detailed experimental results to substantiate our claim.

1. Introduction

Analysis of public newscasts by domestic and foreign TV channels, for tracking news, national and international views, and public opinion, is of paramount importance to media analysts in several domains, such as journalism, brand monitoring, law enforcement, and internal security. Channels representing different countries, political groups, religious conglomerations, and business interests present different perspectives and viewpoints on the same event. Round-the-clock monitoring of hundreds of news channels requires unaffordable manpower. Moreover, the news stories of interest may be confined to a narrow slice of the total telecast time, and they are often repeated several times on the news channels. Thus, round-the-clock monitoring of the channels is not only a wasteful exercise but is also prone to error, because distractions while viewing extraneous telecasts lead to a loss of attention. This motivates a system that can automatically analyze, classify, cluster, and index the news stories of interest.
In this paper we present a set of visual and audio processing techniques that help us achieve this goal. While there has been significant research in multimodal analysis of news video for automated indexing and classification, commercial applications are yet to mature. Commercial products such as the BBN Broadcast Monitoring System (http://www.bbn.com/products_and_services/bbn_broadcast_monitoring_system/) and the Nexidia rich-media solution (http://www.nexidia.com/solutions/rich_media) offer speech-analytics-based solutions for news-video indexing and retrieval. However, none of these solutions can differentiate news programs from other TV programs.
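The core indexing step described above, matching words recovered from speech or noisy ticker-text OCR against a restricted keyword list, can be sketched as a fuzzy string match. The following is a minimal illustration, not the paper's actual implementation: the tokenization, the edit-distance criterion, and the relative-distance threshold of 0.25 are all assumptions chosen for the sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # Minimum of deletion, insertion, and substitution costs.
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def spot_keywords(tokens, keywords, max_rel_dist=0.25):
    """Return (token, keyword) pairs whose edit distance, relative to the
    keyword length, is within the threshold. Tolerates OCR/ASR errors such
    as 'eleclion' for 'election'."""
    hits = []
    for tok in tokens:
        for kw in keywords:
            d = levenshtein(tok.lower(), kw.lower())
            if d / max(len(kw), 1) <= max_rel_dist:
                hits.append((tok, kw))
    return hits


# Hypothetical noisy ticker-text tokens matched against a small keyword list.
ticker_tokens = ["eleclion", "resuIts", "announced"]
keyword_list = ["election", "results"]
print(spot_keywords(ticker_tokens, keyword_list))
```

A restricted keyword list keeps the inner loop small and, as the abstract notes, discarding out-of-interest vocabulary is what makes the indexing both fast and precise.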

