Algorithms  2012 

Incremental Clustering of News Reports

DOI: 10.3390/a5030364

Keywords: clustering, news, event detection, incremental clustering

Abstract:

When an event occurs in the real world, numerous news reports describing it start to appear on different news sites within a few minutes. This may result in a huge amount of information for users, and automated processes may be required to help manage it. In this paper, we describe a clustering system that groups news reports from disparate sources into event-centric clusters, i.e., clusters of news reports describing the same event. A user can identify any RSS feed as a source of news he/she would like to receive, and our system clusters the reports received from the separate RSS feeds as they arrive, without knowing the number of clusters in advance. The system was designed to function well in an online, incremental environment. In our evaluation, we found that the system performs very well at fine-grained clustering but rather poorly at coarser-grained clustering.
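
The abstract does not say which algorithm the system uses, so the following is only a generic illustration of the idea of clustering reports as they arrive without fixing the number of clusters beforehand. It is a minimal single-pass sketch that assumes bag-of-words term counts, cosine similarity, and a fixed similarity threshold. The names `vectorize`, `cosine`, `IncrementalClusterer`, and the threshold value 0.3 are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a report into a sparse term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Single-pass clusterer: the number of clusters is not fixed in advance."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # hypothetical value, chosen for illustration
        self.centroids = []         # summed term vectors, one per cluster
        self.members = []           # report texts, one list per cluster

    def add(self, text):
        """Assign one incoming report to a cluster and return the cluster index."""
        vec = vectorize(text)
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Similar enough: fold the report into the existing cluster.
            self.centroids[best].update(vec)
            self.members[best].append(text)
            return best
        # Otherwise the report starts a new cluster.
        self.centroids.append(Counter(vec))
        self.members.append([text])
        return len(self.centroids) - 1


clusterer = IncrementalClusterer()
labels = [
    clusterer.add("earthquake strikes coastal city"),
    clusterer.add("coastal city hit by earthquake"),
    clusterer.add("central bank raises interest rates"),
]
print(labels)  # [0, 0, 1]
```

In a sketch like this, the threshold controls granularity: a higher value yields more, tighter clusters, and a lower value merges reports into broader groups. That is one way to picture the fine-grained versus coarser-grained distinction the abstract draws, though the paper's own mechanism may differ.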

