oalib
Joint and conditional estimation of tagging and parsing models  [PDF]
Mark Johnson
Computer Science , 2001,
Abstract: This paper compares two different ways of estimating statistical language models. Many statistical NLP tagging and parsing models are estimated by maximizing the (joint) likelihood of the fully-observed training data. However, since these applications only require the conditional probability distributions, these distributions can in principle be learnt by maximizing the conditional likelihood of the training data. Perhaps somewhat surprisingly, models estimated by maximizing the joint were superior to models estimated by maximizing the conditional, even though some of the latter models intuitively had access to "more information".
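For orientation, the two estimation criteria contrasted in this abstract are commonly written as below. This is a sketch in generic notation, not taken from the paper: x_i is assumed to denote an observed input (e.g. a word sequence), y_i its annotation (tags or a parse), and θ the model parameters.

```latex
\hat{\theta}_{\mathrm{joint}} = \arg\max_{\theta} \sum_{i=1}^{n} \log p_{\theta}(x_i, y_i),
\qquad
\hat{\theta}_{\mathrm{cond}} = \arg\max_{\theta} \sum_{i=1}^{n} \log p_{\theta}(y_i \mid x_i)
```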
Parsing DOM Tree Reversely and Extracting Web Main Page Information
逆序解析DOM树及网页正文信息提取
ZHANG Rui-xue, SONG Ming-qiu, GONG Yan-lei (张瑞雪, 宋明秋, 公衍磊)
计算机科学 , 2011,
Abstract: To extract the main content from an HTML Web page, one generally parses the HTML, visits the whole DOM tree, and extracts the data from the tree by distribution. However, this approach separates parsing from extraction and therefore limits speed. In fact, parsing the whole DOM tree is unnecessary. We propose an algorithm that parses the DOM tree in reverse order. Combining the theory of DOM similarity with the traditional method of parsing the DOM, we parse the DOM tree in both normal and reverse order, and at the same time fix the positions of the other targets and retrieve them. On the one hand, this method parses only part of the DOM tree, which reduces parsing time. On the other hand, it does not have to visit the whole tree to find the target information, which saves search time. Overall, the method substantially improves speed. At the end of the paper, we give a proof of the superiority of this method.
Fast Chinese syntactic parsing method based on conditional random fields  [PDF]
韩磊,罗森林,陈倩柔,潘丽敏
- , 2015, DOI: 10.15918/j.jbit1004-0579.201524.0414
Abstract: A fast method for phrase structure grammar analysis is proposed based on conditional random fields (CRF). The method trains several CRF classifiers to recognize the phrase nodes at different levels, and connects the recognized phrase nodes bottom-up to construct the syntactic tree. Using the Beijing Forest Studio Chinese tagged corpus, two experiments are designed to select the training parameters and verify the validity of the method. The results show that the method takes 78.98 ms and 4.63 ms to train and test a Chinese sentence of 17.9 words. The method is a new way to parse the phrase structure grammar of Chinese, and it offers good generalization ability and fast speed.
Named entity recognition using conditional random fields with non-local relational constraints  [PDF]
Flavio Massimiliano Cecchini,Elisabetta Fersini
Computer Science , 2013,
Abstract: We begin by introducing Natural Language Processing as a branch of Computer Science, then narrow our attention to its subfield of Information Extraction, and in particular to Named Entity Recognition, briefly discussing its main methodological approaches. An introduction to state-of-the-art Conditional Random Fields in the form of linear chains follows. Subsequently, the idea of constrained inference as a way to model long-distance relationships in a text is presented, based on an Integer Linear Programming representation of the problem. Adding such relationships to the problem as automatically inferred logical formulas, translatable into linear conditions, we propose to solve the resulting more complex problem with the aid of Lagrangian relaxation, of which some technical details are explained. Lastly, we give some experimental results.
Automatically Extracting Academic Papers from Web Pages Using Conditional Random Fields Model  [cached]
Wei Liu,Jianxun Zeng
Journal of Software , 2011, DOI: 10.4304/jsw.6.8.1409-1416
Abstract: A huge number of academic papers (including research reports) are being released on web pages. Extracting these papers in a structured way is important for many popular applications, such as science and technology information retrieval and digital libraries. However, few investigations have addressed academic paper extraction. This paper proposes a unified approach for automatically extracting academic papers from web pages based on a CRF model. In the proposed approach, academic paper extraction and semantic labeling are performed simultaneously by employing the Conditional Random Fields (CRF) model. Experimental results show that our approach can achieve significantly better extraction results.
Parts-of-Speech Tagger Errors Do Not Necessarily Degrade Accuracy in Extracting Information from Biomedical Text  [cached]
The Python Papers , 2008,
Abstract: Background: An ongoing assessment of the literature is difficult given the rapidly increasing volume of research publications and the limited number of effective information extraction tools that identify entity relationships from text. A recent study reported the development of Muscorian, a generic text processing tool for extracting protein-protein interactions from text that achieved performance comparable to biomedical-specific text processing tools. This result was unexpected, since potential errors from a series of text analysis processes are likely to adversely affect the outcome of the entire process. Most biomedical entity relationship extraction tools have used biomedical-specific parts-of-speech (POS) taggers, as errors in POS tagging are likely to affect subsequent semantic analysis of the text, such as shallow parsing. This study evaluates POS tagging accuracy and explores whether comparable performance is obtained when a generic POS tagger, MontyTagger, is used in place of MedPost, a tagger trained on biomedical text. Results: Our results demonstrated that MontyTagger, Muscorian's POS tagger, has a POS tagging accuracy of 83.1% when tested on biomedical text. Replacing MontyTagger with MedPost did not result in a significant improvement in entity relationship extraction from text: precision was 55.6% with MontyTagger versus 56.8% with MedPost on directional relationships, and 86.1% with MontyTagger compared to 81.8% with MedPost on nondirectional relationships. This is unexpected, as the potential for poor POS tagging by MontyTagger is likely to affect the outcome of the information extraction. An analysis of POS tagging errors demonstrated that 78.5% of tagging errors are compensated for by shallow parsing. Thus, despite 83.1% tagging accuracy, MontyTagger has a functional tagging accuracy of 94.6%.
Conclusions: POS tagging errors do not adversely affect the information extraction task if the errors are resolved in shallow parsing through alternative POS tag use.
Tabular Parsing  [PDF]
Mark-Jan Nederhof,Giorgio Satta
Computer Science , 2004,
Abstract: This is a tutorial on tabular parsing, on the basis of tabulation of nondeterministic push-down automata. Discussed are Earley's algorithm, the Cocke-Kasami-Younger algorithm, tabular LR parsing, the construction of parse trees, and further issues.
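As an illustration of one of the algorithms named in this abstract, here is a minimal sketch of a Cocke-Kasami-Younger recognizer. It is not taken from the tutorial: the function name `cky_recognize`, the dictionary-based grammar encoding, and the toy grammar are all assumptions made for this example, and the grammar is assumed to be in Chomsky normal form.

```python
from itertools import product


def cky_recognize(words, lexical, binary, start="S"):
    """Return True if `words` can be derived from `start`.

    lexical: dict mapping a terminal to the set of nonterminals A with A -> terminal
    binary:  dict mapping a pair (B, C) to the set of nonterminals A with A -> B C
    The grammar is assumed to be in Chomsky normal form.
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Spans of length 1 are filled from the lexical rules.
    for i, word in enumerate(words):
        table[i][i + 1] = set(lexical.get(word, ()))
    # Longer spans are built by combining two adjacent shorter spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((left, right), set())
    return start in table[0][n]


# Hypothetical toy grammar, for illustration only.
LEXICAL = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
```

With this toy grammar, `cky_recognize(["she", "eats", "fish"], LEXICAL, BINARY)` accepts the sentence, while `["eats", "she"]` is rejected because the only constituent covering the full span is a VP. The table stores every nonterminal for every span exactly once, which is the tabulation idea the abstract refers to.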
Parts-of-Speech Tagger Errors Do Not Necessarily Degrade Accuracy in Extracting Information from Biomedical Text  [PDF]
Maurice HT Ling,Christophe Lefevre,Kevin R. Nicholas
Computer Science , 2008,
Abstract: A recent study reported the development of Muscorian, a generic text processing tool for extracting protein-protein interactions from text that achieved performance comparable to biomedical-specific text processing tools. This result was unexpected, since potential errors from a series of text analysis processes are likely to adversely affect the outcome of the entire process. Most biomedical entity relationship extraction tools have used biomedical-specific parts-of-speech (POS) taggers, as errors in POS tagging are likely to affect subsequent semantic analysis of the text, such as shallow parsing. This study evaluates POS tagging accuracy and explores whether comparable performance is obtained when a generic POS tagger, MontyTagger, is used in place of MedPost, a tagger trained on biomedical text. Our results demonstrated that MontyTagger, Muscorian's POS tagger, has a POS tagging accuracy of 83.1% when tested on biomedical text. Replacing MontyTagger with MedPost did not result in a significant improvement in entity relationship extraction from text: precision was 55.6% with MontyTagger versus 56.8% with MedPost on directional relationships, and 86.1% with MontyTagger compared to 81.8% with MedPost on nondirectional relationships. This is unexpected, as the potential for poor POS tagging by MontyTagger is likely to affect the outcome of the information extraction. An analysis of POS tagging errors demonstrated that 78.5% of tagging errors are compensated for by shallow parsing. Thus, despite 83.1% tagging accuracy, MontyTagger has a functional tagging accuracy of 94.6%.
Parsing as Reduction  [PDF]
Daniel Fernández-González,André F. T. Martins
Computer Science , 2015,
Abstract: We reduce phrase-representation parsing to dependency parsing. Our reduction is grounded on a new intermediate representation, "head-ordered dependency trees", shown to be isomorphic to constituent trees. By encoding order information in the dependency labels, we show that any off-the-shelf, trainable dependency parser can be used to produce constituents. When this parser is non-projective, we can perform discontinuous parsing in a very natural manner. Despite the simplicity of our approach, experiments show that the resulting parsers are on par with strong baselines, such as the Berkeley parser for English and the best single system in the SPMRL-2014 shared task. Results are particularly striking for discontinuous parsing of German, where we surpass the current state of the art by a wide margin.
GPE-entity Recognition Based on Conditional Random Fields
基于条件随机场的英文地理行政实体识别
Zong Ping, Shi Shuicai, Wang Tao, Lv Xueqiang (宗萍, 施水才, 王涛, 吕学强)
现代图书情报技术 , 2009,
Abstract: This paper detects Geographical Political Entities (GPE) and their subtypes in the English corpus of the Automatic Content Extraction (ACE) evaluation, based on Conditional Random Fields (CRFs). A feature set is extracted from the ACE corpus, and the contributions of different feature sets to the detection of GPE entities are evaluated in the experiments. The results show that the feature set extracted in this paper achieves higher recall and accuracy.