A Query Language for Multi-version Data Web Archives
Marios Meimaris,George Papastefanatos,Stratis Viglas,Yannis Stavrakas,Christos Pateritsas,Ioannis Anagnostopoulos
Computer Science , 2015,
Abstract: The Data Web refers to the vast and rapidly increasing quantity of scientific, corporate, government and crowd-sourced data published in the form of Linked Open Data, which encourages the uniform representation of heterogeneous data items on the web and the creation of links between them. The growing availability of open linked datasets has brought forth significant new challenges regarding their proper preservation and the management of evolving information within them. In this paper, we focus on the challenges of publishing and preserving evolving linked data across time. We discuss the main problems regarding their proper modelling and querying, and provide a conceptual model and a query language for modelling and retrieving evolving data along with the changes affecting them. We present the syntax of the query language in detail and demonstrate its functionality over a real-world use case of an evolving linked dataset from the biological domain.
Word Disambiguation in Web Search
Rekha Jain,G.N. Purohit
International Journal on Computer Science and Engineering , 2012,
Abstract: The Internet is as vast as a sea, and the amount of information on the Web is growing rapidly. Whenever a user searches for something on the Internet, the search engine returns an enormous amount of information, which increases the complexity of dealing with it. Various algorithms have been developed to help the user retrieve Web content. Sometimes these algorithms do not give useful results, especially in the case of homographs. In this paper, the authors discuss a disambiguation algorithm for information retrieval in Web search. An experimental study is carried out to determine whether the approach provides better results.
Normalized Web Distance and Word Similarity
Rudi L. Cilibrasi,Paul M. B. Vitanyi
Computer Science , 2009,
Abstract: There is a great deal of work in cognitive psychology, linguistics, and computer science, about using word (or phrase) frequencies in context in text corpora to develop measures for word similarity or word association, going back to at least the 1960s. The goal of this chapter is to introduce the normalized web distance (NWD) method to determine similarity between words and phrases. It is a general way to tap the amorphous low-grade knowledge available for free on the Internet, typed in by local users aiming at personal gratification of diverse objectives, and yet globally achieving what is effectively the largest semantic electronic database in the world. Moreover, this database is available for all by using any search engine that can return aggregate page-count estimates for a large range of search-queries. In the paper introducing the NWD it was called `normalized Google distance (NGD),' but since Google doesn't allow computer searches anymore, we opt for the more neutral and descriptive NWD.
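Note: the abstract above describes NWD only in words. A minimal sketch of the page-count formula as it is commonly stated for the normalized Google distance follows. The function name, the parameter names, and the handling of zero counts are assumptions made for illustration and do not come from the abstract. Here `fx` and `fy` are the page counts for each term alone, `fxy` is the count for pages containing both, and `n` is the total number of indexed pages (or a normalizing constant standing in for it).

```python
import math


def normalized_web_distance(fx: float, fy: float, fxy: float, n: float) -> float:
    """Page-count distance between two search terms.

    Computes (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy)).
    Assumption: any zero count yields an infinite distance.
    """
    if min(fx, fy, fxy) <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

Terms that always occur together get a distance of 0, and the distance grows as the joint count shrinks relative to the individual counts.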
Colourful Language: Measuring Word-Colour Associations
Saif Mohammad
Computer Science , 2013,
Abstract: Since many real-world concepts are associated with colour, for example danger with red, linguistic information is often complemented with the use of appropriate colours in information visualization and product marketing. Yet, there is no comprehensive resource that captures concept-colour associations. We present a method to create a large word-colour association lexicon by crowdsourcing. We focus especially on abstract concepts and emotions to show that even though they cannot be physically visualized, they too tend to have strong colour associations. Finally, we show how word-colour associations manifest themselves in language, and quantify the usefulness of co-occurrence and polarity cues in automatically detecting colour associations.
Word Sense Disambiguation by Web Mining for Word Co-occurrence Probabilities
Peter D. Turney
Computer Science , 2004,
Abstract: This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system approaches WSD as a classical supervised machine learning problem, using familiar tools such as the Weka machine learning software and Brill's rule-based part-of-speech tagger. Head words are represented as feature vectors with several hundred features. Approximately half of the features are syntactic and the other half are semantic. The main novelty in the system is the method for generating the semantic features, based on word co-occurrence probabilities. The probabilities are estimated using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler.
Two-parameter Model of Word Length "Language - Genre"
Victor Kromer
Computer Science , 2001,
Abstract: A two-parameter model of word length, measured by the number of syllables in the word, is proposed. The first parameter depends on language type; the second depends on text genre and reflects the degree of completion of synergetic processes of language optimization.
Compositional Morphology for Word Representations and Language Modelling
Jan A. Botha,Phil Blunsom
Computer Science , 2014,
Abstract: This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model. Our approach is evaluated in the context of log-bilinear language models, rendered suitably efficient for implementation inside a machine translation decoder by factoring the vocabulary. We perform both intrinsic and extrinsic evaluations, presenting results on a range of languages which demonstrate that our model learns morphological representations that both perform well on word similarity tasks and lead to substantial reductions in perplexity. When used for translation into morphologically rich languages with large vocabularies, our models obtain improvements of up to 1.2 BLEU points relative to a baseline system using back-off n-gram models.
A Web Usage Mining Framework for Mining Evolving User Profiles in Dynamic Web Sites
Amsaveni.K, Vydehi.S
International Journal of Computer Trends and Technology , 2012,
Abstract: In this paper, we present a complete framework and findings in mining Web usage patterns from Web log files of a real Web site that has all the challenging aspects of real-life Web usage mining, including evolving user profiles and external data describing an ontology of the Web content. Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications.
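Discussion on Word Creation Methods in WEB
Fang LuPing,Zhang WenBin
Computer Systems & Applications (计算机系统应用) , 2005,
Abstract: With the rapid development of the Internet, Web-based interaction has become increasingly widespread. Because it is convenient, people have grown accustomed to using IE or other browsers to exchange information with information management servers by visiting web pages. At the same time, Word is the word processor many users rely on, and it is often necessary to convert information from web pages into Word format for further processing. This paper surveys several ways of generating Word documents in a Web environment, along with the characteristics of each.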
WebScript -- A Scripting Language for the Web
Yin Zhang
Computer Science , 1999,
Abstract: WebScript is a scripting language for processing Web documents. Designed as an extension to Jacl, the Java implementation of Tcl, WebScript allows programmers to manipulate HTML in the same way as Tcl manipulates text strings and GUI elements. This leads to a completely new way of writing the next generation of Web applications. This paper presents the motivation behind the design and implementation of WebScript, an overview of its major features, as well as some demonstrations of its power.
