Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
International Journal of Innovative Research in Computer and Communication Engineering , 2013,
Abstract: Data mining is the process of extracting hidden, useful information from data; the knowledge it discovers is previously unknown, potentially useful, valid, and of high quality. Several techniques exist for data extraction, and clustering is one of them. In clustering, we form groups of similar objects (similarity in terms of distance, or possibly some other factor). Outlier detection, as a branch of data mining, has many important applications and deserves more attention from the data mining community; it is therefore important to detect outliers in the extracted data. Many outlier detection techniques exist, and clustering is among the most efficient. In this paper, I compare different clustering techniques in terms of time complexity and propose a new solution that adds fuzziness to existing clustering techniques.
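The proposal of adding fuzziness to an existing clustering technique can be illustrated with a minimal fuzzy c-means sketch. This is generic fuzzy c-means, not the paper's specific algorithm; the function name and parameters are ours:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: each point gets a membership in every cluster."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)           # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m                              # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))          # closer centers get higher weight
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
centers, U = fuzzy_c_means(X, c=2)
print(np.round(U, 2))  # rows of points in the same cluster peak in the same column
```

Unlike a crisp assignment, the membership matrix U lets borderline points share weight between clusters, which is the kind of fuzziness the abstract alludes to.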
Outlier Detection using Projection Quantile Regression for Mass Spectrometry Data with Low Replication
Soo-Heang Eo, Daewoo Pak, Jeea Choi, HyungJun Cho
BMC Research Notes , 2012, DOI: 10.1186/1756-0500-5-236
Abstract: We proposed an outlier detection algorithm using projection and quantile regression in MS data from multiple experiments. The performance of the algorithm and program was demonstrated using both simulated and real-life data. The projection approach with linear, nonlinear, or nonparametric quantile regression was appropriate for heterogeneous high-throughput data with low replication. Various quantile regression approaches combined with projection were proposed for detecting outliers; the choice among linear, nonlinear, and nonparametric regression depends on the degree of heterogeneity of the data. The proposed approach was illustrated with MS data with two or more replicates. Mass spectrometry (MS) data are often generated from various biological or chemical experiments. Such vast data are usually analyzed automatically in a computational pipeline consisting of pre-processing, significance testing, classification, and clustering. Elaborate pre-processing is essential for a successful analysis with reliable results. One pre-processing step is to detect outliers, which are extreme values arising for technical reasons. The plausible outlying observations detected can be examined carefully, and then corrected or eliminated if necessary. However, as the manual examination of all observations for outlier detection is time-consuming, plausible outlying observations must be detected automatically. The identification of statistical outliers is the subject of some controversy in statistics [1]. Several outlier detection algorithms have been proposed for univariate data, including Grubbs' test [2] and Dixon's Q test [3]. These tests were designed to analyze data under the normality assumption, so they may produce unreliable outcomes when there are few replicates. Furthermore, they are not applicable to duplicated samples.
Another naive approach to detecting outliers statistically constructs lower and upper fences from the differences between two samples, Q1 - 1.5 IQR and Q3 + 1.5 IQR, where Q1 and Q3 are the lower and upper quartiles and IQR is the interquartile range.
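That fence rule is simple to state in code. A minimal sketch, applying the standard Tukey fences to between-sample differences (the variable names are ours):

```python
import statistics

def iqr_fences(diffs):
    """Tukey-style fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged."""
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in diffs if x < lo or x > hi]

# differences between duplicate measurements; 40.0 is far outside the fences
diffs = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.3, 40.0]
print(iqr_fences(diffs))  # → [40.0]
```

The rule is distribution-free, which is why it is called naive: it ignores the heterogeneity across intensity ranges that the projection quantile regression approach is designed to handle.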
Automatic PAM Clustering Algorithm for Outlier Detection
Dajiang Lei,Qingsheng Zhu,Jun Chen,Hai Lin
Journal of Software , 2012, DOI: 10.4304/jsw.7.5.1045-1051
Abstract: In this paper, we propose an automatic PAM (Partitioning Around Medoids) clustering algorithm for outlier detection. The proposed methodology comprises two phases: clustering and computing outlying scores. During the clustering phase, we automatically determine the number of clusters by combining the PAM clustering algorithm with a specific cluster validation metric, which is vital for finding a clustering solution that best fits the given data set, especially for the PAM algorithm. During the scoring phase, we assign each data instance an outlying score based on the cluster structure. Experiments on different datasets show that the proposed algorithm has a higher detection rate and a lower false alarm rate than state-of-the-art outlier detection techniques, and can be an effective solution for detecting outliers.
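The two-phase idea, cluster first and then score each point by its distance to the cluster representative, can be sketched with a bare-bones k-medoids loop. This fixes k by hand rather than reproducing the paper's automatic cluster-number selection or its validation metric:

```python
import numpy as np

def k_medoids(X, k=2, iters=20, seed=0):
    """Bare-bones PAM-style alternation: assign each point to its nearest
    medoid, then re-pick each medoid as the cluster's most central member."""
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distance matrix
    medoids = rng.choice(len(X), size=k, replace=False)
    labels = D[:, medoids].argmin(axis=1)
    for _ in range(iters):
        for j in range(k):
            members = np.where(labels == j)[0]
            # most central member = smallest summed distance to the others
            medoids[j] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        labels = D[:, medoids].argmin(axis=1)
    return medoids, labels, D

def outlier_scores(X, k=2):
    """Outlying score of a point = distance to its own medoid."""
    medoids, labels, D = k_medoids(X, k)
    return D[np.arange(len(X)), medoids[labels]]

X = np.array([[0.0, 0], [0.2, 0.1], [0.1, 0.3], [5.0, 5], [5.2, 5.1], [9.0, 0]])
scores = outlier_scores(X)
print(scores.round(2))  # the far-away point [9, 0] gets the largest score
```

Medoids, unlike means, are actual data points, which is what makes the distance-to-representative score directly interpretable.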
Algorithm of spatial outlier mining based on MST clustering

LIN Jiaxiang,CHEN Chongcheng,FAN Minghui,ZHENG Minqi,

地球信息科学 , 2008,
Abstract: A spatial outlier is a spatial object whose non-spatial attribute values deviate significantly from those of the other objects in the dataset. How to detect spatial outliers in a spatial dataset, and how to explain the cause of the anomaly in practical applications, have become increasingly interesting questions for many researchers. Spatial outlier mining can yield much interesting information, but because of the complicated characteristics of spatial data, such as topological, orientation, and measurement relations, traditional outlier mining algorithms for business databases fall short on spatial datasets; the main problem is that most existing algorithms have difficulty maintaining spatial structure characteristics during outlier mining. Thanks to the similarities between clustering and outlier mining, clustering-based outlier mining is an important way to detect anomalies in a dataset. However, given the diversity of clustering algorithms, it is difficult to choose a proper one for outlier mining, and since the main purpose of clustering is to find the principal features of the dataset, outliers are merely its by-products. Based on minimum spanning tree clustering, a new algorithm for spatial outlier mining called SOM is proposed. The algorithm preserves the basic spatial structure of spatial objects through geometric structures, the Delaunay triangulated irregular network and the minimum spanning tree (MST), and obtains an MST clustering by cutting off the most inconsistent edges of the MST. It can therefore extract clusters from non-spherical and unbalanced datasets, as density-based clustering algorithms do, while not depending on user-set parameters, so the clustering result is usually more reasonable. Finally, the validity of the SOM algorithm is demonstrated on a real geochemical soil-element dataset collected from the coastal areas of Fujian province; the analysis shows that the algorithm is also applicable to spatial outlier mining in massive spatial datasets.
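The MST-cut idea can be illustrated in a few lines. This toy sketch runs Prim's algorithm on the complete graph (the paper additionally restricts candidate edges with a Delaunay triangulation, which is omitted here) and flags singleton components as outliers:

```python
import math

def mst_outliers(points, n_cut=1):
    """Build an MST with Prim's algorithm, cut the n_cut longest (most
    'inconsistent') edges, and flag singleton components as outliers."""
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    in_tree, edges = {0}, []
    while len(in_tree) < n:                    # Prim: grow the tree edge by edge
        u, v = min(((u, v) for u in in_tree for v in range(n) if v not in in_tree),
                   key=lambda e: dist(*e))
        in_tree.add(v)
        edges.append((dist(u, v), u, v))
    kept = sorted(edges)[:len(edges) - n_cut]  # drop the longest edges
    adj = {i: [] for i in range(n)}
    for _, u, v in kept:
        adj[u].append(v)
        adj[v].append(u)
    seen, components = set(), []
    for i in range(n):                         # connected components by DFS
        if i in seen:
            continue
        stack, comp = [i], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x])
        seen |= comp
        components.append(comp)
    return components, [min(c) for c in components if len(c) == 1]

points = [(0, 0), (0, 1), (1, 0), (10, 10)]
components, outliers = mst_outliers(points, n_cut=1)
print(outliers)  # → [3]
```

Cutting long edges works because an outlier can only attach to the tree through an unusually long edge, so removing those edges leaves it stranded in its own component.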
Robust Clustering Using Outlier-Sparsity Regularization
Pedro A. Forero,Vassilis Kekatos,Georgios B. Giannakis
Computer Science , 2011, DOI: 10.1109/TSP.2012.2196696
Abstract: Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the ability of these algorithms to identify meaningful hidden structures, rendering their outcome unreliable. This paper develops robust clustering algorithms that aim not only to cluster the data, but also to identify the outliers. The novel approaches rely on the infrequent presence of outliers in the data, which translates to sparsity in a judiciously chosen domain. Capitalizing on this sparsity, outlier-aware robust K-means and probabilistic clustering approaches are proposed. Their novelty lies in identifying outliers while effecting sparsity in the outlier domain through carefully chosen regularization. A block coordinate descent approach is developed to obtain iterative algorithms with convergence guarantees and small excess computational complexity with respect to their non-robust counterparts. Kernelized versions of the robust clustering algorithms are also developed to efficiently handle high-dimensional data, identify nonlinearly separable clusters, or even cluster objects that are not represented by vectors. Numerical tests on both synthetic and real datasets validate the performance and applicability of the novel algorithms.
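The outlier-sparsity idea can be sketched for the K-means case: each point gets an outlier vector o_i, and a group soft-threshold (the usual proximal step for a group-lasso penalty) keeps o_i at zero except for genuine outliers. This is a simplified reading of the approach, with our own initialization and parameter choices:

```python
import numpy as np

def robust_kmeans(X, k=2, lam=5.0, iters=30):
    """Outlier-aware K-means sketch: alternate cluster updates with a group
    soft-threshold, so o_i stays at zero unless point i is an outlier."""
    C = X[:k].copy()                           # deterministic init: first k points
    O = np.zeros_like(X)                       # one outlier vector per point
    for _ in range(iters):
        Xc = X - O                             # outlier-compensated data
        labels = np.linalg.norm(Xc[:, None] - C[None], axis=2).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                C[j] = Xc[labels == j].mean(axis=0)
        R = X - C[labels]                      # residuals to assigned centroids
        norms = np.maximum(np.linalg.norm(R, axis=1, keepdims=True), 1e-12)
        O = R * np.maximum(0, 1 - lam / norms) # group soft-threshold
    return labels, O

X = np.array([[0.0, 0], [0.3, 0.1], [0.1, 0.3], [4.0, 4], [4.2, 4.1], [20.0, 20]])
labels, O = robust_kmeans(X)
print(np.linalg.norm(O, axis=1).round(2))  # nonzero only for the far-away point
```

The regularization weight lam controls how far a point may stray before it is declared an outlier; the thresholding step is exactly where the sparsity in the outlier domain enters.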
Integer Programming Relaxations for Integrated Clustering and Outlier Detection
Lionel Ott,Linsey Pang,Fabio Ramos,David Howe,Sanjay Chawla
Computer Science , 2014,
Abstract: In this paper we present methods for exemplar-based clustering with outlier selection based on the facility location formulation. Given a distance function and the number of outliers to be found, the methods automatically determine the number of clusters and outliers. We formulate the problem as an integer program, to which we present relaxations that allow solutions to scale to large data sets. The advantages of combining clustering and outlier selection include: (i) the resulting clusters tend to be compact and semantically coherent; (ii) the clusters are more robust against data perturbations; and (iii) the outliers are contextualised by the clusters and more interpretable, i.e. it is easier to distinguish outliers that result from data errors from those that may indicate a new pattern emerging in the data. We present and contrast three relaxations of the integer program formulation: (i) a linear programming formulation (LP); (ii) an extension of affinity propagation to outlier detection (APOC); and (iii) a Lagrangian duality based formulation (LD). Evaluation on synthetic as well as real data shows the quality and scalability of these different methods.
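On toy data, the underlying integer program can be solved exactly by brute force, which makes the objective concrete: pick exemplars and an outlier set to minimize assignment cost plus a per-exemplar opening cost. This enumeration is ours for illustration and is nothing like the scalable relaxations the paper develops:

```python
from itertools import combinations

def cluster_with_outliers(D, f, n_outliers):
    """Exact brute force over the facility-location-with-outliers objective:
    choose exemplars S and an outlier set (of fixed size) minimizing the
    summed distance of the remaining points to their nearest exemplar,
    plus an opening cost f per exemplar. Feasible only for tiny inputs."""
    n = len(D)
    best = (float("inf"), (), ())
    for out in combinations(range(n), n_outliers):
        rest = [i for i in range(n) if i not in out]
        for k in range(1, len(rest) + 1):
            for S in combinations(rest, k):
                cost = f * k + sum(min(D[i][j] for j in S) for i in rest)
                best = min(best, (cost, S, out))
    return best

# toy 1-D points: two tight groups and one far-away point
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 30.0]
D = [[abs(a - b) for b in pts] for a in pts]
cost, exemplars, outliers = cluster_with_outliers(D, f=1.0, n_outliers=1)
print(exemplars, outliers)  # one exemplar per group; 30.0 is the outlier
```

Note how the number of exemplars is not an input: the opening cost f makes it emerge from the optimization, which is the "automatically determine the number of clusters" property the abstract describes.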
Improved Hybrid Clustering and Distance-based Technique for Outlier Removal
P. Murugavel, Dr. M. Punithavalli
International Journal on Computer Science and Engineering , 2011,
Abstract: Outlier detection is the task of finding objects that are dissimilar to, or inconsistent with, the remaining data. It has many uses in applications like fraud detection, network intrusion detection, and clinical diagnosis of diseases. Using clustering algorithms for outlier detection is a frequently used technique, but clustering algorithms consider outliers only to the extent that they do not interfere with the clustering process; in these algorithms, outliers are mere by-products of clustering, and the algorithms cannot rank outliers by priority. In this paper, three partition-based algorithms, PAM, CLARA and CLARANS, are combined with k-medoid distance-based outlier detection to improve the outlier detection and removal process. The experimental results show that the CLARANS clustering algorithm, when combined with medoid distance-based outlier detection, improves detection accuracy and time efficiency.
Global outlier detection based on hierarchical clustering

LIANG Bin-mei,WEI Lin-n,SONG Qing-zhen,

计算机应用研究 , 2011,
Abstract: Existing outlier detection algorithms need improvement in versatility, effectiveness, user-friendliness, and performance on high-dimensional and large databases. This paper proposes a fast and effective global outlier detection approach based on hierarchical clustering. Agglomerative hierarchical clustering is performed first; the degree of isolation of the data can then be judged visually, and the number of outliers determined, from the clustering tree and the distance matrix. After that, the outliers are identified in an unsupervised manner from the top of the clustering tree downwards. Experimental results show that this approach identifies global outliers quickly and effectively, is user-friendly, and handles datasets of various shapes. Experiments also illustrate that the approach is suitable for high-dimensional and large databases.
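The pipeline, agglomerate, cut the tree near the top, flag tiny clusters, can be sketched with single-linkage clustering. The paper's visual isolation-degree judgment is not reproduced; cluster and outlier counts are fixed by hand here:

```python
import math

def hierarchical_outliers(points, n_clusters=3):
    """Single-linkage agglomeration: merge the closest clusters until
    n_clusters remain (a top-down cut of the clustering tree), then flag
    singleton clusters as global outliers."""
    clusters = [[i] for i in range(len(points))]
    d = lambda a, b: math.dist(points[a], points[b])
    while len(clusters) > n_clusters:
        # find the pair of clusters with the smallest single-linkage distance
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: min(d(a, b) for a in clusters[p[0]]
                                     for b in clusters[p[1]]))
        clusters[i] += clusters.pop(j)
    return clusters, [c[0] for c in clusters if len(c) == 1]

points = [(0, 0), (0, 1), (1, 1), (8, 8), (8, 9), (40, 40)]
clusters, outliers = hierarchical_outliers(points, n_clusters=3)
print(outliers)  # → [5]
```

A global outlier joins the dendrogram only at the very top, so stopping the merging early leaves it as a singleton cluster, which is what this sketch flags.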
An Ensemble Method based on Particle of Swarm for the Reduction of Noise, Outlier and Core Point
Satish Dehariya, Divakar Singh
International Journal of Advanced Computer Research , 2013,
Abstract: Majority voting and accurate prediction with classification algorithms are challenging tasks in data mining. To improve data classification, different classifiers are combined in an ensemble process; this increases the classification ratio of the classification algorithms, and such a combination of classifiers is called an ensemble classifier. Ensemble learning is a technique for improving the performance and accuracy of classification and prediction in machine learning. Many researchers have proposed ensemble-classifier models that merge different classification algorithms, but the performance of ensemble algorithms suffers from outlier, noise, and core-point problems arising in the feature selection process. In this paper, we combine core, outlier, and noise data (COB) in the feature selection process for an ensemble model. The selection of the best features with an appropriate classifier uses particle swarm optimization.
Subtractive Clustering Based RBF Neural Network Model for Outlier Detection
Peng Yang,Qingsheng Zhu,Xun Zhong
Journal of Computers , 2009, DOI: 10.4304/jcp.4.8.755-762
Abstract: Outlier detection has many important applications in the fields of fraud detection, network robustness analysis, and intrusion detection. Some researchers have applied neural networks to the problem because of their powerful modeling ability. In this paper, we propose an RBF neural network model that uses a subtractive clustering algorithm to select the hidden node centers, which achieves faster training. In addition, the RBF network is trained with a regularization term so as to minimize the variances of the nodes in the hidden layer and produce more accurate predictions. By defining a degree of outlierness, we can effectively find abnormal data whose actual output deviates seriously from its expectation, provided the expected output is known. Experimental results on different datasets show that the proposed RBF model has a higher detection rate and a lower false positive rate compared with other methods, and can be an effective solution for detecting outliers.
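A rough sketch of the pipeline: subtractive clustering to pick centers, a regularized least-squares fit of the RBF output weights, and |actual - predicted| as the degree of outlierness. The radii, widths, and revision constants below are common textbook defaults, not the paper's settings:

```python
import numpy as np

def subtractive_centers(X, n_centers=2, ra=1.0):
    """Pick RBF centers by (simplified) subtractive clustering: each point's
    potential is a density estimate; after a center is chosen, nearby
    potentials are reduced so the next center lands elsewhere."""
    D2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    P = np.exp(-4 * D2 / ra**2).sum(1)
    centers = []
    for _ in range(n_centers):
        c = P.argmax()
        centers.append(X[c])
        P = P - P[c] * np.exp(-4 * D2[c] / (1.5 * ra) ** 2)  # revision step
    return np.array(centers)

def rbf_fit_predict(X, y, centers, width=1.0, reg=1e-3):
    """Fit output weights by ridge-regularized least squares on Gaussian units."""
    Phi = np.exp(-((X[:, None] - centers[None]) ** 2).sum(-1) / width**2)
    w = np.linalg.solve(Phi.T @ Phi + reg * np.eye(len(centers)), Phi.T @ y)
    return Phi @ w

X = np.array([[0.0], [0.2], [0.4], [2.0], [2.2], [2.4]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
y[2] = -1.0                                     # inject one corrupted label
# degree of outlierness = |actual - predicted|
scores = np.abs(y - rbf_fit_predict(X, y, subtractive_centers(X)))
print(scores.argmax())  # → 2
```

Because the regularized network fits the bulk of the data rather than every point, the corrupted observation keeps a large residual, which is exactly how the degree of outlierness singles it out.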

Copyright © 2008-2017 Open Access Library. All rights reserved.