oalib

Search Results: 1 - 10 of 144464 matches for " B. Grefenstette "
All listed articles are free for downloading (OA Articles)
Category-Theoretic Quantitative Compositional Distributional Models of Natural Language Semantics
Edward Grefenstette
Mathematics , 2013,
Abstract: This thesis is about the problem of compositionality in distributional semantics. Distributional semantics presupposes that the meanings of words are a function of their occurrences in textual contexts. It models words as distributions over these contexts and represents them as vectors in high dimensional spaces. The problem of compositionality for such models concerns itself with how to produce representations for larger units of text by composing the representations of smaller units of text. This thesis focuses on a particular approach to this compositionality problem, namely using the categorical framework developed by Coecke, Sadrzadeh, and Clark, which combines syntactic analysis formalisms with distributional semantic representations of meaning to produce syntactically motivated composition operations. This thesis shows how this approach can be theoretically extended and practically implemented to produce concrete compositional distributional models of natural language semantics. It furthermore demonstrates that such models can perform on par with, or better than, other competing approaches in the field of natural language processing. There are three principal contributions to computational linguistics in this thesis. The first is to extend the DisCoCat framework on the syntactic front and semantic front, incorporating a number of syntactic analysis formalisms and providing learning procedures allowing for the generation of concrete compositional distributional models. The second contribution is to evaluate the models developed from the procedures presented here, showing that they outperform other compositional distributional models present in the literature. The third contribution is to show how using category theory to solve linguistic problems forms a sound basis for research, illustrated by examples of work on this topic, that also suggest directions for future research.
Towards a Formal Distributional Semantics: Simulating Logical Calculi with Tensors
Edward Grefenstette
Mathematics , 2013,
Abstract: The development of compositional distributional models of semantics reconciling the empirical aspects of distributional semantics with the compositional aspects of formal semantics is a popular topic in the contemporary literature. This paper seeks to bring this reconciliation one step further by showing how the mathematical constructs commonly used in compositional distributional models, such as tensors and matrices, can be used to simulate different aspects of predicate logic. This paper discusses how the canonical isomorphism between tensors and multilinear maps can be exploited to simulate a full-blown quantifier-free predicate calculus using tensors. It provides tensor interpretations of the set of logical connectives required to model propositional calculi. It suggests a variant of these tensor calculi capable of modelling quantifiers, using few non-linear operations. It finally discusses the relation between these variants, and how this relation should constitute the subject of future work.
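A minimal sketch of the idea in this abstract, not the paper's own code: truth values are encoded as one-hot vectors, negation as a matrix, and binary connectives as order-3 tensors contracted with two truth vectors. The encoding, the helper names (`apply_matrix`, `binary_tensor`, `apply_tensor`), and the choice of plain Python lists are all assumptions made for illustration.

```python
# Truth values as one-hot vectors in a 2-dimensional space (assumed encoding).
TRUE = [1, 0]
FALSE = [0, 1]

# Negation as a 2x2 matrix that swaps the two basis vectors.
NOT = [[0, 1],
       [1, 0]]


def apply_matrix(m, v):
    """Matrix-vector product: a linear map applied to a truth vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def binary_tensor(truth_function):
    """Build an order-3 tensor t[i][j][k] from a Python truth function.

    Indices j and k select the two input truth values; index i selects
    the component of the output truth vector.
    """
    basis = [TRUE, FALSE]
    t = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
    for j, a in enumerate(basis):
        for k, b in enumerate(basis):
            out = TRUE if truth_function(a == TRUE, b == TRUE) else FALSE
            for i in range(2):
                t[i][j][k] = out[i]
    return t


def apply_tensor(t, u, v):
    """Contract an order-3 tensor with two vectors (a bilinear map)."""
    return [
        sum(t[i][j][k] * u[j] * v[k] for j in range(2) for k in range(2))
        for i in range(2)
    ]


AND = binary_tensor(lambda a, b: a and b)
OR = binary_tensor(lambda a, b: a or b)
```

Under these assumptions, `apply_tensor(AND, TRUE, FALSE)` yields the vector for false, and `apply_matrix(NOT, FALSE)` yields the vector for true. Quantifiers are not covered by this sketch.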
INRIASAC: Simple Hypernym Extraction Methods
Gregory Grefenstette
Computer Science , 2015,
Abstract: Given a set of terms from a given domain, how can we structure them into a taxonomy without manual intervention? This is Task 17 of SemEval 2015. Here we present our simple taxonomy structuring techniques which, despite their simplicity, ranked first in this 2015 benchmark. We use large quantities of text (English Wikipedia) and simple heuristics such as term overlap and document and sentence co-occurrence to produce hypernym lists. We describe these techniques and present an initial evaluation of results.
Observational Constraints of Stellar Collapse: Diagnostic Probes of Nature's Extreme Matter Experiment
C. L. Fryer,W. Even,B. W. Grefenstette,T.-W. Wong
Physics , 2014,
Abstract: Supernovae are Nature's high-energy, high-density laboratory experiments, reaching densities in excess of nuclear densities and temperatures above 10 MeV. Astronomers have built up a suite of diagnostics to study these supernovae. If we can utilize these diagnostics, and tie them together with a theoretical understanding of supernova physics, we can use these cosmic explosions to study the nature of matter at these extreme densities and temperatures. Capitalizing on these diagnostics will require understanding a wide range of additional physics. Here we review the diagnostics and the physics needed to use them to learn about the supernova engine and, ultimately, nuclear physics.
Experimenting with Transitive Verbs in a DisCoCat
Edward Grefenstette,Mehrnoosh Sadrzadeh
Mathematics , 2011,
Abstract: Formal and distributional semantic models offer complementary benefits in modeling meaning. The categorical compositional distributional (DisCoCat) model of meaning of Coecke et al. (arXiv:1003.4394v1 [cs.CL]) combines aspects of both to provide a general framework in which meanings of words, obtained distributionally, are composed using methods from the logical setting to form sentence meaning. Concrete consequences of this general abstract setting and applications to empirical data are under active study (Grefenstette et al., arXiv:1101.0309; Grefenstette and Sadrzadeh, arXiv:1106.4058v1 [cs.CL]). In this paper, we extend this study by examining transitive verbs, represented as matrices in a DisCoCat. We discuss three ways of constructing such matrices, and evaluate each method in a disambiguation task developed by Grefenstette and Sadrzadeh (arXiv:1106.4058v1 [cs.CL]).
Experimental Support for a Categorical Compositional Distributional Model of Meaning
Edward Grefenstette,Mehrnoosh Sadrzadeh
Mathematics , 2011,
Abstract: Modelling compositional meaning for sentences using empirical distributional methods has been a challenge for computational linguists. We implement the abstract categorical model of Coecke et al. (arXiv:1003.4394v1 [cs.CL]) using data from the BNC and evaluate it. The implementation is based on unsupervised learning of matrices for relational words and applying them to the vectors of their arguments. The evaluation is based on the word disambiguation task developed by Mitchell and Lapata (2008) for intransitive sentences, and on a similar new experiment designed for transitive sentences. Our model matches the results of its competitors in the first experiment, and betters them in the second. The general improvement in results with increase in syntactic complexity showcases the compositional power of our model.
A Compositional Distributional Semantics, Two Concrete Constructions, and some Experimental Evaluations
Mehrnoosh Sadrzadeh,Edward Grefenstette
Mathematics , 2011,
Abstract: We provide an overview of the hybrid compositional distributional model of meaning, developed in Coecke et al. (arXiv:1003.4394v1 [cs.CL]), which is based on the categorical methods also applied to the analysis of information flow in quantum protocols. The mathematical setting stipulates that the meaning of a sentence is a linear function of the tensor products of the meanings of its words. We provide concrete constructions for this definition and present techniques to build vector spaces for meaning vectors of words, as well as that of sentences. The applicability of these methods is demonstrated via a toy vector space as well as real data from the British National Corpus and two disambiguation experiments.
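The abstract states only that sentence meaning is a linear function of the tensor products of word meanings. As a hedged illustration of one way such a construction could look, the sketch below builds a verb matrix by summing outer products of the (subject, object) vectors observed with the verb, then combines it pointwise with the outer product of a new subject and object. This particular recipe, the toy two-dimensional vectors, and every function name are assumptions for illustration, not details confirmed by the abstract.

```python
def outer(u, v):
    """Outer (tensor) product of two vectors, as a nested list."""
    return [[a * b for b in v] for a in u]


def add(m1, m2):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def hadamard(m1, m2):
    """Element-wise product of two equally sized matrices."""
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def verb_matrix(pairs):
    """Sum the outer products of (subject, object) vectors seen with a verb."""
    dim_s, dim_o = len(pairs[0][0]), len(pairs[0][1])
    total = [[0.0] * dim_o for _ in range(dim_s)]
    for subj, obj in pairs:
        total = add(total, outer(subj, obj))
    return total


def sentence(verb, subj, obj):
    """Compose a transitive sentence: verb matrix times (subject x object)."""
    return hadamard(verb, outer(subj, obj))


# Toy, hand-made word vectors (hypothetical; real ones come from a corpus).
dog = [1.0, 0.0]
cat = [0.0, 1.0]

# "chases" observed once, with subject "dog" and object "cat".
chases = verb_matrix([(dog, cat)])
```

With these toy vectors, `sentence(chases, dog, cat)` is non-zero while `sentence(chases, cat, dog)` is the zero matrix, which shows how the composition is sensitive to argument order.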
Transforming Wikipedia into a Search Engine for Local Experts
Gregory Grefenstette,Karima Rafes
Computer Science , 2015,
Abstract: Finding experts for a given problem is recognized as a difficult task. Even when a taxonomy of subject expertise exists, and is associated with a group of experts, it can be hard to exploit by users who have not internalized the taxonomy. Here we present a method for both attaching experts to a domain ontology, and hiding this fact from the end user looking for an expert. By linking Wikipedia to this same pivot ontology, we describe how a user can browse Wikipedia, as they normally do to search for information, and use this browsing behavior to find experts. Experts are characterized by their textual productions (webpages, publications, reports), and these textual productions are attached to concepts in the pivot ontology. When the user finds the Wikipedia page characterizing their need, a list of experts is displayed. In this way we transform Wikipedia into a search engine for experts.
Corpus-based Method for Automatic Identification of Support Verbs for Nominalizations
Gregory Grefenstette,Simone Teufel
Computer Science , 1995,
Abstract: Nominalization is a highly productive phenomenon in most languages. The process of nominalization ejects a verb from its syntactic role into a nominal position. The original verb is often replaced by a semantically emptied support verb (e.g., "make a proposal"). The choice of a support verb for a given nominalization is unpredictable, causing a problem for language learners as well as for natural language processing systems. We present here a method of discovering support verbs from an untagged corpus via low-level syntactic processing and comparison of arguments attached to verbal forms and potential nominalized forms. The result of the process is a list of potential support verbs for the nominalized form of a given predicate.
Estimation of English and non-English Language Use on the WWW
Gregory Grefenstette,Julien Nioche
Computer Science , 2000,
Abstract: The World Wide Web has grown so big, in such an anarchic fashion, that it is difficult to describe. One of the evident intrinsic characteristics of the World Wide Web is its multilinguality. Here, we present a technique for estimating the size of a language-specific corpus given the frequency of commonly occurring words in the corpus. We apply this technique to estimating the number of words available through Web browsers for given languages. Comparing data from 1996 to data from 1999 and 2000, we calculate the growth of a number of European languages on the Web. As expected, non-English languages are growing at a faster pace than English, though the position of English is still dominant.
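A rough sketch of the estimation idea described in this abstract, under stated assumptions: if a common word has a known relative frequency in a reference corpus of the language, then dividing its observed count in the unknown corpus by that frequency gives one estimate of the unknown corpus size. Taking the median over several words is an assumption made here for robustness, not something the abstract specifies, and the function name and all numbers are hypothetical.

```python
from statistics import median


def estimate_corpus_size(observed_counts, reference_rel_freq):
    """Estimate the total word count of a corpus from a few common words.

    observed_counts:    word -> occurrences counted in the unknown corpus
    reference_rel_freq: word -> relative frequency in a reference corpus
    """
    estimates = [
        observed_counts[word] / reference_rel_freq[word]
        for word in observed_counts
        if reference_rel_freq.get(word, 0) > 0
    ]
    if not estimates:
        raise ValueError("no word has a non-zero reference frequency")
    # Median of the per-word estimates (an assumed aggregation choice).
    return median(estimates)


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
reference = {"the": 0.0625, "and": 0.03125}
observed = {"the": 6250, "and": 3125}
size = estimate_corpus_size(observed, reference)
```

Words missing from the reference table are skipped rather than guessed, so the estimate relies only on words with a known frequency.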
Copyright © 2008-2017 Open Access Library. All rights reserved.