oalib
Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Metadata for Name Disambiguation and Collocation
Jeffrey Beall
Future Internet, 2010, DOI: 10.3390/fi2010001
Abstract: Searching names of persons, families, and organizations is often difficult in online databases because different persons or organizations frequently share the same name and because a single person’s or organization’s name may appear in different forms in various online documents. Databases and search engines can use metadata as a tool to solve the problem of name ambiguity and name variation in online databases. This article describes the challenges names pose in information retrieval and some emerging name metadata databases that can help ameliorate the problems. Effective name disambiguation and collocation increase search precision and recall and can improve assessment of scholarly work.
Name Disambiguation Method Based on Attribute Match and Link Analysis
Yu-Feng Yao
Journal of Software Engineering and Applications (JSEA), 2012, DOI: 10.4236/jsea.2012.51005
Abstract: A name disambiguation method based on attribute matching and link analysis is proposed for the insurance field. Earlier name disambiguation methods, such as text clustering, have to process many useless words, so a new method is put forward. First, identical-attribute matching is applied and the identities that match successfully are merged. Second, link analysis is used to analyze the structure of the customer network. Finally, records that share the same cooperation information are merged. Experimental results show that the proposed method can perform name disambiguation successfully.
Author Name Disambiguation by Using Deep Neural Network
Hung Nghiep Tran, Tin Huynh, Tien Do
Computer Science, 2015, DOI: 10.1007/978-3-319-05476-6_13
Abstract: Author name ambiguity decreases the quality and reliability of information retrieved from digital libraries. Existing methods have tried to solve this problem by predefining a feature set based on expert knowledge for a specific dataset. In this paper, we propose a new approach that uses a deep neural network to learn features automatically from data. Additionally, we propose a general system architecture for author name disambiguation on any dataset. In this research, we evaluate the proposed method on a dataset containing Vietnamese author names. The results show that this method significantly outperforms other methods that use a predefined feature set. The proposed method achieves 99.31% accuracy. The prediction error rate decreases from 1.83% to 0.69%, i.e., by 1.14 percentage points, or 62.3% in relative terms, compared with other methods that use a predefined feature set (Table 3).
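The error-rate figures quoted in this abstract can be checked with a few lines of arithmetic. The snippet below is only a sanity check of the reported numbers; the variable names are illustrative and do not come from the paper.

```python
# Error rates reported in the abstract, in percent.
baseline_error = 1.83   # methods using a predefined feature set
new_error = 0.69        # proposed deep-neural-network method

# Absolute drop in percentage points and relative drop in percent.
absolute_drop = baseline_error - new_error
relative_drop = absolute_drop / baseline_error * 100

# Accuracy implied by the new error rate.
accuracy = 100 - new_error

print(f"absolute drop: {absolute_drop:.2f} points")
print(f"relative drop: {relative_drop:.1f}%")
print(f"accuracy: {accuracy:.2f}%")
```

The relative figure is the absolute drop divided by the baseline error rate, which is why a 1.14-point drop corresponds to roughly a 62% reduction.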
Adaptive Resonance Theory Based Two-Stage Chinese Name Disambiguation
Xin Wang, Yuanchao Liu, Xiaolong Wang, Ming Liu, Bingquan Liu
International Journal of Intelligence Science (IJIS), 2012, DOI: 10.4236/ijis.2012.24011
Abstract: Different individuals commonly share the same name, which makes it time-consuming to search the web for information about a particular individual. Name disambiguation is necessary to help users find the person of interest more readily. In this paper, we propose an Adaptive Resonance Theory (ART) based two-stage strategy for this problem. We obtain a first-stage clustering result with the ART1 model and then merge similar clusters in the second stage. Our strategy mimics the process of manual disambiguation and does not need the number of clusters to be predicted in advance, which makes it well suited to the disambiguation task. Experimental results show that, in comparison with the agglomerative clustering method, our strategy improves performance by 0.92% and 5.00%, respectively, on two kinds of name recognition results.
Accuracy of simple, initials-based methods for author name disambiguation
Staša Milojević
Computer Science, 2013, DOI: 10.1016/j.joi.2013.06.006
Abstract: There are a number of solutions that perform unsupervised name disambiguation based on the similarity of bibliographic records or common co-authorship patterns. Whether the use of these advanced methods, which are often difficult to implement, is warranted depends on whether the accuracy of the most basic disambiguation methods, which only use the author's last name and initials, is sufficient for a particular purpose. We derive realistic estimates for the accuracy of simple, initials-based methods using simulated bibliographic datasets in which the true identities of authors are known. Based on the simulations in five diverse disciplines we find that the first initial method already correctly identifies 97% of authors. An alternative simple method, which takes all initials into account, is typically two times less accurate, except in certain datasets that can be identified by applying a simple criterion. Finally, we introduce a new name-based method that combines the features of first initial and all initials methods by implicitly taking into account the last name frequency and the size of the dataset. This hybrid method reduces the fraction of incorrectly identified authors by 10-30% over the first initial method.
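A minimal sketch of the two simple baselines described in this abstract, assuming author records are given as `(last_name, given_names)` pairs. The helper names and the sample records are hypothetical, and the hybrid method from the paper is not reproduced here.

```python
from collections import defaultdict

def first_initial_key(last, given):
    """First-initial method: surname plus the first given-name initial."""
    return (last.lower(), given.strip()[0].lower())

def all_initials_key(last, given):
    """All-initials method: surname plus every given-name initial."""
    parts = given.replace(".", " ").split()
    return (last.lower(), "".join(p[0].lower() for p in parts))

def group_by(records, key_fn):
    """Group records that the chosen method treats as the same author."""
    groups = defaultdict(list)
    for last, given in records:
        groups[key_fn(last, given)].append((last, given))
    return dict(groups)

# Hypothetical records, for illustration only.
records = [
    ("Smith", "John A."),
    ("Smith", "John"),
    ("Smith", "Jane B."),
    ("Lee", "Min Ho"),
]

by_first = group_by(records, first_initial_key)
by_all = group_by(records, all_initials_key)
print(len(by_first), "identities with the first-initial method")
print(len(by_all), "identities with the all-initials method")
```

On this toy input the first-initial key merges the two John records with Jane B. Smith, while the all-initials key separates "John A." from "John". These are the merging and splitting errors that the two methods trade against each other.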
Name Disambiguation from link data in a collaboration graph
Baichuan Zhang, Tanay Kumar Saha, Mohammad Al Hasan
Computer Science, 2014
Abstract: In a social community, multiple persons may share the same name, phone number or some other identifying attributes. This, along with other phenomena such as name abbreviation, name misspelling, and human error, leads to erroneous aggregation of records of multiple persons under a single reference. Such mistakes degrade document retrieval, web search, and database integration and, more importantly, cause improper attribution of credit (or blame). The task of entity disambiguation partitions the records belonging to multiple persons so that each resulting partition is composed of the records of a unique person. Existing solutions to this task use either biographical attributes or auxiliary features that are collected from external sources, such as Wikipedia. However, in many scenarios such auxiliary features are not available, or they are costly to obtain. Besides, collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task using link information obtained from a collaboration network. Our method does not intrude on privacy, as it uses only the timestamped graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance.
The strength of co-authorship in gene name disambiguation
Richárd Farkas
BMC Bioinformatics, 2008, DOI: 10.1186/1471-2105-9-69
Abstract: Our key hypothesis is that a biologist refers to each particular gene by a fixed gene alias and this holds for the co-authors as well. To make use of the co-authorship information we decided to build the inverse co-author graph on MedLine abstracts. The nodes of the inverse co-author graph are articles and there is an edge between two nodes if and only if the two articles have a mutual author. We introduce here two methods using distances (based on the graph) of abstracts for the GSD task. We found that a disambiguation decision can be made in 85% of cases with an extremely high (99.5%) precision rate just by using information obtained from the inverse co-author graph. We incorporated the co-authorship information into two GSD systems in order to attain full coverage and in experiments our procedure achieved precision of 94.3%, 98.85%, 96.05% and 99.63% on the human, mouse, fly and yeast GSD evaluation sets, respectively. Based on the promising results obtained so far we suggest that the co-authorship information and the circumstances of the articles' release (like the title of the journal, the year of publication) can be a crucial building block of any sophisticated similarity measure among biological articles and hence the methods introduced here should be useful for other biomedical natural language processing tasks (like organism or target disease detection) as well. Biological articles provide a huge amount of information about genes, proteins, their behaviour under different conditions, and their interactions. The handling of huge amounts of unstructured data (free text) has increased in interest along with the application of automatic Natural Language Processing (NLP) techniques to biomedical articles.
Named Entity (NE) recognition is the first and crucial step of an Information Extraction (IE) system and a major building block of an Information Retrieval (IR) system as well. The task of biological entity recognition is to identify and classify gene, protein, ch
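The inverse co-author graph defined in this abstract (articles as nodes, an edge whenever two articles share an author) can be sketched as follows. The function names, the breadth-first distance, and the sample articles are assumptions made for illustration, not the paper's implementation, and the two distance-based GSD methods themselves are not reproduced.

```python
from collections import deque
from itertools import combinations

def build_inverse_coauthor_graph(article_authors):
    """article_authors maps an article id to the set of its authors."""
    graph = {article: set() for article in article_authors}
    for a, b in combinations(article_authors, 2):
        # Edge if and only if the two articles have a mutual author.
        if article_authors[a] & article_authors[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def graph_distance(graph, source, target):
    """Shortest path length between two articles, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical articles and authors, for illustration only.
articles = {
    "A1": {"Kovacs", "Nagy"},
    "A2": {"Nagy", "Szabo"},
    "A3": {"Szabo"},
    "A4": {"Toth"},
}
graph = build_inverse_coauthor_graph(articles)
print(graph_distance(graph, "A1", "A3"))
print(graph_distance(graph, "A1", "A4"))
```

The pairwise construction is quadratic in the number of articles, which is fine for a toy example. A corpus the size of MedLine would presumably need an author-to-articles index instead.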
Distortive Effects of Initial-Based Name Disambiguation on Measurements of Large-Scale Coauthorship Networks
Jinseok Kim, Jana Diesner
Computer Science, 2015, DOI: 10.1002/asi.23489
Abstract: Scholars have often relied on name initials to resolve name ambiguities in large-scale coauthorship network research. This approach bears the risk of incorrectly merging or splitting author identities. The use of initial-based disambiguation has been justified by the assumption that such errors would not affect research findings too much. This paper tests this assumption by analyzing coauthorship networks from five academic fields - biology, computer science, nanoscience, neuroscience, and physics - and an interdisciplinary journal, PNAS. Name instances in the datasets of this study were disambiguated based on heuristics gained from previous algorithmic disambiguation solutions. We use the disambiguated data as a proxy for ground truth to test the performance of three types of initial-based disambiguation. Our results show that initial-based disambiguation can misrepresent statistical properties of coauthorship networks: it deflates the number of unique authors, number of components, average shortest paths, clustering coefficient, and assortativity, while it inflates average productivity, density, average coauthor number per author, and largest component size. Also, on average, more than half of the top 10 productive or collaborative authors drop off the lists. Asian names were found to account for the majority of misidentification by initial-based disambiguation due to their common surname and given name initials.
Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation
Konstantin Avrachenkov, Nelly Litvak, Danil A. Nemirovsky, Elena Smirnova, Marina Sokol
Computer Science, 2010
Abstract: We study the problem of quickly detecting top-k Personalized PageRank lists. This problem has a number of important applications such as finding local cuts in large graphs, estimation of similarity distance and name disambiguation. In particular, we apply our results to construct efficient algorithms for the person name disambiguation problem. We argue that when finding top-k Personalized PageRank lists two observations are important. Firstly, it is crucial that we quickly detect the top-k most important neighbours of a node, while the exact order in the top-k list as well as the exact values of PageRank are far less crucial. Secondly, tolerating a small number of wrong elements in top-k lists does not really degrade their quality, but it can lead to significant computational savings. Based on these two key observations we propose Monte Carlo methods for fast detection of top-k Personalized PageRank lists. We provide a performance evaluation of the proposed methods and supply stopping criteria. Then, we apply the methods to the person name disambiguation problem. The developed algorithm for the person name disambiguation problem achieved second place in the WePS 2010 competition.
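The general idea of estimating a top-k Personalized PageRank list by simulation can be sketched as follows. This is a generic random-walk-with-restart estimator, not necessarily the estimator proposed in the paper. The function name, the damping value of 0.85, the fixed number of walks, the handling of dangling nodes, and the toy graph are all assumptions, and the paper's stopping criteria are not implemented.

```python
import random
from collections import Counter

def mc_top_k_ppr(graph, seed_node, k, n_walks=2000, damping=0.85, rng=None):
    """Estimate the top-k Personalized PageRank nodes for seed_node.

    graph maps each node to a list of out-neighbours. Every walk starts
    at seed_node and counts each node it visits. Visit counts are only
    used to rank nodes, since the exact PageRank values matter less
    than membership in the top-k list.
    """
    rng = rng or random.Random(0)
    visits = Counter()
    for _ in range(n_walks):
        node = seed_node
        while True:
            visits[node] += 1
            neighbours = graph.get(node, [])
            # Stop with probability 1 - damping, or at a dangling node
            # (an assumption of this sketch).
            if not neighbours or rng.random() > damping:
                break
            node = rng.choice(neighbours)
    return [node for node, _ in visits.most_common(k)]

# Hypothetical toy graph: "z" has no incoming edges, so walks from "a"
# can never reach it.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "z": ["a"]}
top = mc_top_k_ppr(graph, "a", k=2)
print(top)
```

Increasing `n_walks` trades running time for a more stable ranking. A fixed walk budget is used here purely for simplicity.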
A Link-based Method for Name Disambiguation and its Application
WU Bin, XU Chao-Qun, WANG Wen-Bin, WU Wei
Computer Science (计算机科学), 2008
Abstract: This research presents an algorithm for name disambiguation in digital libraries of Chinese literature. Unlike the clustering method, which considers node attributes and link structure simultaneously, unlike the state-of-the-art LDA-ER method, which employs the LDA model to resolve entities, and unlike the DistQC model for name disambiguation, our dedicated algorithm first performs an attribute similarity analysis and then detects the reference-entity relationship by considering...