oalib
Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Inferring Strategies for Sentence Ordering in Multidocument News Summarization  [PDF]
R. Barzilay,N. Elhadad
Computer Science, 2011, DOI: 10.1613/jair.991
Abstract: The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single document summarization can be determined from the ordering of sentences in the input article, this is not the case for multidocument summarization where summary sentences may be drawn from different input articles. In this paper, we propose a methodology for studying the properties of ordering information in the news genre and describe experiments done on a corpus of multiple acceptable orderings we developed for the task. Based on these experiments, we implemented a strategy for ordering information that combines constraints from chronological order of events and topical relatedness. Evaluation of our augmented algorithm shows a significant improvement of the ordering over two baseline strategies.
Evaluation of Multi Document Summarization Techniques
R. Nedunchelian,R. Muthucumarasamy,E. Saranathan
Research Journal of Applied Sciences, 2012, DOI: 10.3923/rjasci.2012.229.233
Abstract: Multi-document summarization is carried out using the MEAD extraction algorithm, a Naive Bayesian classifier, and a genetic algorithm. The generated summary contains the selected sentences from each document, output in the order in which they appear in the original document, so the order of the sentences in the summary may not be logical. To overcome this, a Timestamp concept is implemented, which gives the summary an ordered, coherent look. Instead of comparing every sentence from all documents, it is sufficient to summarize only the document that has been accessed by the largest number of readers (the frequent document). The Timestamp and Frequent document concepts are used to generate multi-document summaries with the MEAD extraction algorithm, the Naive Bayesian classifier, and the genetic algorithm, and the results are compared and evaluated.
TGSum: Build Tweet Guided Multi-Document Summarization Dataset  [PDF]
Ziqiang Cao,Chengyao Chen,Wenjie Li,Sujian Li,Furu Wei,Ming Zhou
Computer Science, 2015
Abstract: The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large scales of news-related multi-document summaries with reference to social media's reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to cluster documents into different topic sets. Also, a tweet with a hyper-link often highlights certain key points of the corresponding document. We synthesize a linked document cluster to form a reference summary which can cover most key points. To this aim, we adopt the ROUGE metrics to measure the coverage ratio, and develop an Integer Linear Programming solution to discover the sentence set reaching the upper bound of ROUGE. Since we allow summary sentences to be selected from both documents and high-quality tweets, the generated reference summaries could be abstractive. Both informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as extra training resource, the performance of the summarizer improves a lot on all the test sets. We release this dataset for further research.
Reader-Aware Multi-Document Summarization via Sparse Coding  [PDF]
Piji Li,Lidong Bing,Wai Lam,Hang Li,Yi Liao
Computer Science, 2015
Abstract: We propose a new MDS paradigm called reader-aware multi-document summarization (RA-MDS). Specifically, a set of reader comments associated with the news reports are also collected. The generated summaries from the reports for the event should be salient according to not only the reports but also the reader comments. To tackle this RA-MDS problem, we propose a sparse-coding-based method that is able to calculate the salience of the text units by jointly considering news reports and reader comments. Another reader-aware characteristic of our framework is to improve linguistic quality via entity rewriting. The rewriting consideration is jointly assessed together with other summarization requirements under a unified optimization model. To support the generation of compressive summaries via optimization, we explore a finer syntactic unit, namely, noun/verb phrase. In this work, we also generate a data set for conducting RA-MDS. Extensive experiments on this data set and some classical data sets demonstrate the effectiveness of our proposed approach.
Multi-document Biography Summarization  [PDF]
Liang Zhou,Miruna Ticrea,Eduard Hovy
Computer Science, 2005
Abstract: In this paper we describe a biography summarization system using sentence classification and ideas from information retrieval. Although the individual techniques are not new, assembling and applying them to generate multi-document biographies is new. Our system was evaluated in DUC2004. It is among the top performers in task 5 (short summaries focused by person questions).
An Optimization Model and DPSO-EDA for Document Summarization
Rasim M. Alguliev,Ramiz M. Aliguliyev,Chingiz A. Mehdiyev
International Journal of Information Technology and Computer Science, 2011
Abstract: We model document summarization as a nonlinear 0-1 programming problem in which the objective function is defined as the Heronian mean of the objective functions enforcing coverage and diversity. The proposed model was implemented on a multi-document summarization task. Experiments on the DUC2001 and DUC2002 datasets showed that the proposed model outperforms the other summarization methods.
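As a rough illustration of the objective named in this abstract, the Heronian mean of two non-negative values a and b is (a + sqrt(a*b) + b) / 3. The sketch below only shows how two scores could be combined this way; the abstract does not define the coverage and diversity functions, so `coverage` and `diversity` are placeholder inputs assumed for the example, and `combined_objective` is a hypothetical name.

```python
import math

def heronian_mean(a, b):
    """Heronian mean of two non-negative numbers: (a + sqrt(a*b) + b) / 3."""
    if a < 0 or b < 0:
        raise ValueError("Heronian mean is defined here for non-negative inputs")
    return (a + math.sqrt(a * b) + b) / 3.0

def combined_objective(coverage, diversity):
    """Combine placeholder coverage and diversity scores into a single value.

    Assumption: both scores are already computed for a candidate summary;
    how they are computed is not specified in the abstract.
    """
    return heronian_mean(coverage, diversity)
```

The Heronian mean lies between the geometric and arithmetic means of its inputs, so a candidate that scores zero on one criterion is penalized but not scored zero overall.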
Leveraging Word Embeddings for Spoken Document Summarization  [PDF]
Kuan-Yu Chen,Shih-Hung Liu,Hsin-Min Wang,Berlin Chen,Hsin-Hsi Chen
Computer Science, 2015
Abstract: Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express the most important theme of the document, has been an active area of research and experimentation. On the other hand, word embedding has emerged as a newly favorite research subject because of its excellent performance in many natural language processing (NLP)-related tasks. However, as far as we are aware, there are relatively few studies investigating its use in extractive text or speech summarization. A common thread of leveraging word embeddings in the summarization process is to represent the document (or sentence) by averaging the word embeddings of the words occurring in the document (or sentence). Then, intuitively, the cosine similarity measure can be employed to determine the relevance degree between a pair of representations. Beyond the continued efforts made to improve the representation of words, this paper focuses on building novel and efficient ranking models based on the general word embedding methods for extractive speech summarization. Experimental results demonstrate the effectiveness of our proposed methods, compared to existing state-of-the-art methods.
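The averaging-and-cosine baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration of that common approach only, not the ranking models the paper proposes; the toy embedding table `EMB` and all function names are assumptions made for the example.

```python
import math

# Toy 3-dimensional word embeddings (hypothetical values, for illustration only).
EMB = {
    "stock": [0.9, 0.1, 0.0],
    "market": [0.8, 0.2, 0.1],
    "fell": [0.6, 0.3, 0.0],
    "rain": [0.0, 0.1, 0.9],
    "today": [0.2, 0.5, 0.3],
}
DIM = 3

def average_embedding(words):
    """Represent a sentence or document by the mean of its known word vectors."""
    vectors = [EMB[w] for w in words if w in EMB]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_sentences(sentences):
    """Return sentence indices ordered by similarity to the whole document."""
    doc_vec = average_embedding([w for s in sentences for w in s])
    scores = [cosine(average_embedding(s), doc_vec) for s in sentences]
    return sorted(range(len(sentences)), key=lambda i: -scores[i])
```

An extractive summary would then take the top-ranked sentences up to a length budget; real systems would load pretrained embeddings instead of the toy table.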
An Analytical Framework for Multi-Document Summarization
J Jayabharathy,S Kanmani,Buvana
International Journal of Computer Science Issues, 2011
Abstract: The growth of information on the web has led to drastic growth in the field of information retrieval. Information retrieval is the process of searching for and extracting the required information from the web. The main purpose of an automated information retrieval system is to reduce the overload of document retrieval. Today's retrieval systems present vast amounts of information, which suffers from redundancy and irrelevance. There is a need to provide high-quality summaries that allow the user to quickly locate the desired, concise information as the number of documents available on users' desktops and the internet increases. This paper provides a survey that gives a comparative study of existing multi-document summarization techniques. The study gives an overall view of current research issues, recent methods for summarization, and data sets and metrics suitable for summarization. The framework also investigates the performance of the existing techniques.
Privacy-Preserving Multi-Document Summarization  [PDF]
Luís Marujo,José Portêlo,Wang Ling,David Martins de Matos,João P. Neto,Anatole Gershman,Jaime Carbonell,Isabel Trancoso,Bhiksha Raj
Computer Science, 2015
Abstract: State-of-the-art extractive multi-document summarization systems are usually designed without any concern about privacy issues, meaning that all documents are open to third parties. In this paper we propose a privacy-preserving approach to multi-document summarization. Our approach enables other parties to obtain summaries without learning anything else about the original documents' content. We use a hashing scheme known as Secure Binary Embeddings to convert document representations containing key phrases and bag-of-words into bit strings, allowing the computation of approximate distances instead of exact ones. Our experiments indicate that our system yields similar results to its non-private counterpart on standard multi-document evaluation datasets.
Document Summarization Using Positive Pointwise Mutual Information  [PDF]
Aji S,Ramachandra Kaimal
International Journal of Computer Science & Information Technology, 2012
Abstract: The degree of success in document summarization processes depends on the performance of the method used in identifying significant sentences in the documents. The collection of unique words characterizes the major signature of the document, and forms the basis for the Term-Sentence-Matrix (TSM). The Positive Pointwise Mutual Information, which works well for measuring semantic similarity in the Term-Sentence-Matrix, is used in our method to assign weights for each entry in the Term-Sentence-Matrix. The Sentence-Rank-Matrix generated from this weighted TSM is then used to extract a summary from the document. Our experiments show that such a method would outperform most of the existing methods in producing summaries from large documents.
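The PPMI weighting of a term-sentence matrix described in this abstract can be sketched as below, using the standard definition PPMI(t, s) = max(0, log2(p(t, s) / (p(t) p(s)))) with probabilities estimated from raw counts. The abstract does not say how the Sentence-Rank-Matrix is derived, so scoring a sentence by the sum of its PPMI weights in `sentence_scores` is an assumption for illustration, as are all function names.

```python
import math
from collections import Counter

def term_sentence_matrix(sentences):
    """Build raw counts: rows are unique terms (sorted), columns are sentences."""
    terms = sorted({w for s in sentences for w in s})
    counts = [Counter(s) for s in sentences]
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix

def ppmi(matrix):
    """Replace each count with its Positive Pointwise Mutual Information."""
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    weighted = []
    for i, row in enumerate(matrix):
        out = []
        for j, n in enumerate(row):
            if n == 0:
                out.append(0.0)  # unseen pairs get zero weight
            else:
                pmi = math.log2(n * total / (row_sums[i] * col_sums[j]))
                out.append(max(0.0, pmi))
        weighted.append(out)
    return weighted

def sentence_scores(weighted):
    """Assumed scoring rule: sum the PPMI weights in each sentence column."""
    return [sum(col) for col in zip(*weighted)]
```

Terms that occur in every sentence receive weights near zero under PPMI, so the scores favor sentences containing terms concentrated in them.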

