oalib
Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
A technical study and analysis on fuzzy similarity based models for text classification  [PDF]
Shalini Puri,Sona Kaushik
Computer Science , 2012, DOI: 10.5121/ijdkp.2012.2201
Abstract: In the current era of technological advancement, efficient and effective text document classification is a challenging and much-needed capability for categorizing text documents into mutually exclusive categories. Fuzzy similarity provides a way to find the similarity of features among various documents. In this paper, a technical review of various fuzzy similarity based models is given. These models are discussed and compared to outline their use and necessity. A tour of different methodologies built on fuzzy similarity is provided. It shows how text and web documents are categorized efficiently into different categories. Various experimental results of these models are also discussed. The technical comparisons among the models' parameters are shown in the form of a 3-D chart. This study and technical review provide a strong base for research on fuzzy similarity based text document categorization.
A Technical Study and Analysis on Fuzzy Similarity Based Models For Text Classification  [PDF]
Shalini Puri,Sona Kaushik
International Journal of Data Mining & Knowledge Management Process , 2012,
Abstract: In the current era of technological advancement, efficient and effective text document classification is a challenging and much-needed capability for categorizing text documents into mutually exclusive categories. Fuzzy similarity provides a way to find the similarity of features among various documents. In this paper, a technical review of various fuzzy similarity based models is given. These models are discussed and compared to outline their use and necessity. A tour of different methodologies built on fuzzy similarity is provided. It shows how text and web documents are categorized efficiently into different categories. Various experimental results of these models are also discussed. The technical comparisons among the models' parameters are shown in the form of a 3-D chart. This study and technical review provide a strong base for research on fuzzy similarity based text document categorization.
A Fuzzy Similarity Based Concept Mining Model for Text Classification  [PDF]
Shalini Puri
International Journal of Advanced Computer Sciences and Applications , 2011,
Abstract: Text classification is a challenging and very active field with great importance in text categorization applications. A lot of research has been done in this field, but there is still a need to categorize a collection of text documents into mutually exclusive categories by extracting concepts or features using the supervised learning paradigm and different classification algorithms. In this paper, a new Fuzzy Similarity Based Concept Mining Model (FSCMM) is proposed to classify a set of text documents into predefined Category Groups (CG) by training and preparing them at the sentence, document and integrated corpora levels, with feature reduction and ambiguity removal at each level to achieve high system performance. A Fuzzy Feature Category Similarity Analyzer (FFCSA) is used to analyze each extracted feature of the Integrated Corpora Feature Vector (ICFV) against the corresponding categories or classes. The model uses a Support Vector Machine Classifier (SVMC) to classify the training data patterns into two groups, i.e., +1 and -1. The authors report that the proposed model works efficiently and effectively, with high-accuracy results.
A Fuzzy Similarity Based Concept Mining Model for Text Classification  [PDF]
Shalini Puri
Computer Science , 2012,
Abstract: Text classification is a challenging and very active field with great importance in text categorization applications. A lot of research has been done in this field, but there is still a need to categorize a collection of text documents into mutually exclusive categories by extracting concepts or features using the supervised learning paradigm and different classification algorithms. In this paper, a new Fuzzy Similarity Based Concept Mining Model (FSCMM) is proposed to classify a set of text documents into predefined Category Groups (CG) by training and preparing them at the sentence, document and integrated corpora levels, with feature reduction and ambiguity removal at each level to achieve high system performance. A Fuzzy Feature Category Similarity Analyzer (FFCSA) is used to analyze each extracted feature of the Integrated Corpora Feature Vector (ICFV) against the corresponding categories or classes. The model uses a Support Vector Machine Classifier (SVMC) to classify the training data patterns into two groups, i.e., +1 and -1. The authors report that the proposed model works efficiently and effectively, with high-accuracy results.
Automatic deception detection model based on keystroke features of Chinese free text  [PDF]
徐鸿雁,靳亮,林涛,彭舰
- , 2017,
Abstract: Research has shown that deceptive behavior affects a user's keystroke patterns to some extent. In social networking applications, detecting deceptive behavior through keystroke features is an important step toward building network information security. However, existing deception detection models are highly invasive and perform poorly in real time, which limits their use in this field. To address these problems, we designed an experiment to collect a wide range of keystroke features (single-key features, content features and double-key features) from users typing short texts, and then developed a predictive model (GA-SVM) that detects deceptive behavior, using Genetic Algorithms (GAs) for feature selection and Support Vector Machines (SVMs) for model building. The results show that the model can effectively detect deceptive behavior with a classification accuracy of 82.86%, and that all three categories of keystroke features contribute to the detection. In addition, the effects of deceivers' cognitive load and psychological pressure on keystroke patterns are explored.
Similarity-Based Techniques for Text Document Classification
S. Senthamarai Kannan,N. Ramaraj
International Journal of Soft Computing , 2012,
Abstract: In large-scale text classification, labeling a large number of documents for training places a considerable burden on human experts, who need to read each document and assign it to appropriate categories. With this problem in mind, our goal was to develop a text categorization system that needs fewer labeled training examples to achieve a given level of performance, using a similarity-based learning algorithm and thresholding strategies. Experimental results show that the proposed model is quite useful for building document categorization systems. It has been designed for a small-scale implementation, considering the size of the corpus used. It can be extended to a larger data set, and its efficiency can then be compared against presently available methods such as SVM and Naive Bayes. On the whole, this approach concentrates on categorizing small-scale document collections and completes that task.
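The abstract above does not spell out the learning algorithm or the thresholding strategy, so the following is only a minimal sketch of the general idea of similarity-based categorization with a score threshold. The bag-of-words representation, the cosine measure, the function names and the threshold value of 0.2 are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercased whitespace tokens into a sparse vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, labeled, threshold=0.2):
    """Return every category whose best-matching labeled example scores at least `threshold`.

    `labeled` is a list of (example_text, category) pairs; the threshold is a
    hypothetical score cut-off, not a value from the paper.
    """
    vec = bag_of_words(text)
    best = {}
    for example, category in labeled:
        score = cosine(vec, bag_of_words(example))
        best[category] = max(best.get(category, 0.0), score)
    return sorted(c for c, s in best.items() if s >= threshold)


labeled = [
    ("stock market shares fall", "finance"),
    ("team wins football match", "sport"),
]
print(classify("shares rise on the stock market", labeled))
```

A document that resembles no labeled example falls below the threshold for every category and receives no label, which is one simple way a thresholding step can decline to categorize.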
Text Segmentation Based on Similarity between Words  [PDF]
Hideki Kozima
Computer Science , 1996,
Abstract: This paper proposes a new indicator of text structure, called the lexical cohesion profile (LCP), which locates segment boundaries in a text. A text segment is a coherent scene; the words in a segment are linked together via lexical cohesion relations. LCP records mutual similarity of words in a sequence of text. The similarity of words, which represents their cohesiveness, is computed using a semantic network. Comparison with the text segments marked by a number of subjects shows that LCP closely correlates with the human judgments. LCP may provide valuable information for resolving anaphora and ellipsis.
Similarity-based Text Recognition by Deeply Supervised Siamese Network  [PDF]
Ehsan Hosseini-Asl,Angshuman Guha
Computer Science , 2015,
Abstract: In this paper, we propose a new text recognition model based on measuring the visual similarity of text and predicting the content of unlabeled texts. First a Siamese network is trained with deep supervision on a labeled training dataset. This network projects texts into a similarity manifold. The Deeply Supervised Siamese network learns visual similarity of texts. Then a K-nearest neighbor classifier is used to predict unlabeled text based on similarity distance to labeled texts. The performance of the model is evaluated on three datasets of machine-print and hand-written text combined. We demonstrate that the model reduces the cost of human estimation by $50\%-85\%$. The error of the system is less than $0.5\%$. The results also demonstrate that the predicted labels are sometimes better than human labels e.g. spelling correction.
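The second stage described above, a K-nearest neighbor vote over distances in the learned similarity space, can be sketched as follows. This sketch assumes the embeddings have already been produced by some trained network; the Siamese network itself is not reproduced here, and the function name, the Euclidean distance and the toy two-dimensional vectors are illustrative assumptions.

```python
import math
from collections import Counter


def knn_predict(query, labeled_embeddings, k=3):
    """Majority label among the k labeled embeddings nearest to `query`.

    `labeled_embeddings` is a list of (vector, label) pairs; Euclidean
    distance is an assumed choice of similarity distance.
    """
    nearest = sorted(labeled_embeddings, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy embeddings standing in for the output of a trained network (hypothetical data).
labeled_embeddings = [
    ((0.0, 0.1), "cat"),
    ((0.1, 0.0), "cat"),
    ((1.0, 1.0), "dog"),
    ((0.9, 1.1), "dog"),
    ((1.1, 0.9), "dog"),
]
print(knn_predict((0.05, 0.05), labeled_embeddings))
```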
Research and Implementation of Text Similarity System Based on Power Spectrum Analysis  [PDF]
Ying Xie, Shouning Qu, Huanhuan Song
Journal of Computer and Communications (JCC) , 2014, DOI: 10.4236/jcc.2014.26002
Abstract: This paper presents the design and implementation of a text similarity system based on power spectrum analysis. It is not difficult to imagine that brain signals are closely linked with the writing process. We therefore build a text model and define a pulse signal function to obtain the power spectrum of a text. Specifically, power spectra are computed for texts from the economic field to build a spectral library, and a power spectrum matching algorithm then judges whether a test text belongs to the economic field. With this method, the text similarity system performs intelligent text classification efficiently and accurately.
BLOGRANK: Ranking Weblogs Based On Connectivity And Similarity Features  [PDF]
A. Kritikopoulos,M. Sideri,I. Varlamis
Computer Science , 2009,
Abstract: A large part of the hidden web resides in weblog servers. New content is produced on a daily basis, and traditional search engines turn out to be insufficient due to the nature of weblogs. This work summarizes the structure of the blogosphere and highlights the special features of weblogs. In this paper we present a method for ranking weblogs based on the link graph and on several similarity characteristics between weblogs. First we create an enhanced graph of connected weblogs and add new types of edges and weights utilising many weblog features. Then, we assign a ranking to each weblog using our algorithm, BlogRank, which is a modified version of PageRank. To validate our method we run experiments on a weblog dataset, which we process and adapt to our search engine (http://spiderwave.aueb.gr/Blogwave). The results suggest that users prefer the rankings produced with the enhanced graph and the BlogRank algorithm.
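The abstract does not give BlogRank's edge types or weighting scheme, so the sketch below shows only a generic weighted PageRank computed by power iteration, as one plausible reading of "a modified version of PageRank" over a graph with edge weights. The function name, the damping factor of 0.85, the iteration count and the toy graph are assumptions for illustration, not the authors' algorithm.

```python
def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    `edges` maps each source node to a dict of {target: weight}. A node's
    rank is shared among its targets in proportion to the edge weights.
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for source in nodes:
            targets = edges.get(source, {})
            total = sum(targets.values())
            if total == 0:
                # Dangling node: spread its rank evenly over all nodes.
                for node in nodes:
                    new_rank[node] += damping * rank[source] / n
            else:
                for target, weight in targets.items():
                    new_rank[target] += damping * rank[source] * weight / total
        rank = new_rank
    return rank


# Hypothetical weblog graph; a heavier weight could stand for a stronger similarity link.
edges = {"a": {"b": 1.0}, "b": {"c": 2.0, "a": 1.0}, "c": {"b": 1.0}}
rank = weighted_pagerank(edges)
print(sorted(rank, key=rank.get, reverse=True))
```

In this toy graph, node b links to c with twice the weight of its link to a, so c receives a larger share of b's rank than a does.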
Copyright © 2008-2017 Open Access Library. All rights reserved.