Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
An efficient Support Vector Clustering with combined core outlier and boundary value for pre-processing of data
Deepak Kumar Vishwakarma, Anurag Jain
International Journal of Advanced Computer Research, 2012
Abstract: The performance of support vector clustering suffers due to noisy data, so data pre-processing plays an important role in support vector clustering. When data are mapped from one sphere to another in support vector clustering, some unwanted behaviours of the data appear: boundary points, core points and outliers. These data points degrade the performance and efficiency of support vector clustering. To reduce core, outlier and boundary values, we combine all dissimilar data to form a COB model, and the data are passed through a genetic algorithm to collect the COB and reduce the COB values in the pre-processing phase. After COB reduction, support vector clustering is applied. Our empirical evaluation shows that our method outperforms incremental support vector clustering and SSN-SVC.
Angle Decrement Based Gaussian Kernel Width Generator for Support Vector Clustering
M. Rahmat Widyanto, Herman Hartono
Asian Journal of Information Technology, 2012
Abstract: A new method to generate the Gaussian kernel width parameter (q) for Support Vector Clustering (SVC) is proposed in this study. The proposed method is based on the idea of a decreasing angle as q increases, and is a modification of the previously proposed secant method. Experiments are performed on four data sets, each with its own characteristics. Experimental results show that the angle-decrement based method can generate a valid sequence of q values with simpler computation than the secant method. In general, the angle-decrement based method improves the performance of SVC so that the clustering process can be performed faster.
Polynomial Kernel Based Structural Clustering Algorithm by Building Directed Trees

DING Jun-Di, MA Ru-Ning, CHEN Song-Can

Journal of Software (软件学报), 2008
Abstract: Within the internal organization of the data, data points play three different structural roles: hub, centroid and outlier. The neighborhood-based density factor (NDF) used in the neighborhood-based clustering (NBC) algorithm can identify which points act as hubs, centroids or outliers in a well-separated data set. However, NDF often works poorly in the presence of noise and overlap. This paper introduces a polynomial kernel based neighborhood density factor (PKNDF) to address this issue. Relying on the PKNDF, a structural data clustering algorithm is further presented which can find all salient clusters with arbitrary shapes and unbalanced sizes in a noisy or overlapping data set. It builds clusters within the framework of directed trees from graph theory, so each point is scanned only once in the clustering process. Hence, its computational complexity is nearly linear in the size of the input data. Experimental results on both synthetic and real-world datasets demonstrate its effectiveness and efficiency.
International Journal of Innovative Research in Computer and Communication Engineering, 2013
Abstract: Data mining is the process of extracting hidden and useful information from data; the knowledge discovered by data mining is previously unknown, potentially useful, valid and of high quality. Several techniques exist for data extraction, and clustering is one of them. In the clustering technique, we form groups of similar objects (similarity in terms of distance or some other factor). Outlier detection, as a branch of data mining, has many important applications and deserves more attention from the data mining community; it is therefore important to detect outliers in the extracted data. Many techniques exist to detect outliers, and clustering is one of the most efficient. In this paper, I compare the results of different clustering techniques in terms of time complexity and propose a new solution by adding fuzziness to existing clustering techniques.
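As a rough illustration of the clustering-based outlier detection surveyed in this abstract, the sketch below flags the point farthest from its own cluster centroid after a plain K-means pass. The function names, data set and scoring rule are illustrative, not taken from the paper.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd's algorithm with deterministic init on the first k rows."""
    centroids = X[:k].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def outlier_scores(X, centroids, labels):
    """Outlier score = distance of each point to its own cluster centroid."""
    return np.linalg.norm(X - centroids[labels], axis=1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),   # two tight clusters...
               rng.normal(5.0, 0.1, (20, 2)),
               [[20.0, 20.0]]])                  # ...plus one gross outlier
centroids, labels = kmeans(X, k=2)
scores = outlier_scores(X, centroids, labels)
print(scores.argmax())  # → 40, the injected outlier
```

Fuzzier variants, as the paper proposes, would replace the hard `argmin` assignment with membership weights.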
Outlier robust system identification: a Bayesian kernel-based approach  [PDF]
Giulio Bottegal, Aleksandr Y. Aravkin, Hakan Hjalmarsson, Gianluigi Pillonetto
Statistics, 2013
Abstract: In this paper, we propose an outlier-robust regularized kernel-based method for linear system identification. The unknown impulse response is modeled as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. To build robustness to outliers, we model the measurement noise as realizations of independent Laplacian random variables. The identification problem is cast in a Bayesian framework, and solved by a new Markov Chain Monte Carlo (MCMC) scheme. In particular, exploiting the representation of the Laplacian random variables as scale mixtures of Gaussians, we design a Gibbs sampler which quickly converges to the target distribution. Numerical simulations show a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based methods.
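The scale-mixture representation the authors exploit — a Laplacian variable is a Gaussian whose variance is exponentially distributed — can be checked numerically. The rate `mu` below is an arbitrary choice, not a value from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0                      # exponential rate on the variance (arbitrary)
b = 1.0 / np.sqrt(2.0 * mu)   # implied Laplace scale parameter
n = 200_000

w = rng.exponential(scale=1.0 / mu, size=n)  # variance w ~ Exp(rate mu)
x = rng.normal(0.0, np.sqrt(w))              # x | w ~ N(0, w), marginally Laplace(0, b)

# For Laplace(0, b), the mean absolute deviation is exactly b.
print(abs(np.abs(x).mean() - b) < 0.01)  # → True
```

This representation is what makes a Gibbs sampler natural: conditioned on the latent variances the model is Gaussian, and conditioned on the residuals the variances have a tractable posterior.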
Automatic PAM Clustering Algorithm for Outlier Detection  [cached]
Dajiang Lei, Qingsheng Zhu, Jun Chen, Hai Lin
Journal of Software, 2012, DOI: 10.4304/jsw.7.5.1045-1051
Abstract: In this paper, we propose an automatic PAM (Partitioning Around Medoids) clustering algorithm for outlier detection. The proposed methodology comprises two phases: clustering and computing outlying scores. During the clustering phase we automatically determine the number of clusters by combining the PAM clustering algorithm with a specific cluster validation metric, which is vital for finding a clustering solution that best fits the given data set, especially for PAM. During the scoring phase we assign each data instance an outlying score based on the cluster structure. Experiments on different datasets show that the proposed algorithm achieves a higher detection rate with a lower false alarm rate than state-of-the-art outlier detection techniques, and it can be an effective solution for detecting outliers.
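A minimal sketch of the two phases, assuming a simple Voronoi-iteration k-medoids in place of full PAM and a fixed k (the paper selects k automatically via a validation metric): cluster first, then score each point by its distance to the nearest medoid.

```python
import numpy as np

def k_medoids(X, k, iters=20):
    """Tiny Voronoi-iteration k-medoids (a stand-in for full PAM)."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)  # pairwise distances
    medoids = np.arange(k)                            # deterministic init
    for _ in range(iters):
        labels = D[:, medoids].argmin(axis=1)
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                # new medoid minimizes total distance within its cluster
                medoids[j] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels

def outlying_score(X, medoids):
    """Phase two: score each point by its distance to the nearest medoid."""
    return np.linalg.norm(X[:, None] - X[medoids][None], axis=2).min(axis=1)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.2, (20, 2)),
               rng.normal(5.0, 0.2, (20, 2)),
               [[20.0, 20.0]]])                       # one gross outlier
medoids, labels = k_medoids(X, k=2)
scores = outlying_score(X, medoids)
print(scores.argmax())  # → 40, the injected outlier
```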
Algorithm of spatial outlier mining based on MST clustering

LIN Jiaxiang, CHEN Chongcheng, FAN Minghui, ZHENG Minqi

Geo-Information Science (地球信息科学), 2008
Abstract: A spatial outlier is a spatial object whose non-spatial attribute values deviate significantly from those of the other objects in the dataset. How to detect spatial outliers in a spatial dataset, and how to explain the cause of the anomaly in practical applications, has become increasingly interesting to many researchers. Spatial outlier mining can reveal a lot of interesting information, but because of the complicated characteristics of spatial data, such as topological, orientation and measurement relations, traditional algorithms for outlier mining in business databases are deficient for spatial datasets; the main problem is that most existing algorithms have difficulty maintaining spatial structure characteristics during outlier mining. Thanks to the similarities between clustering and outlier mining, clustering-based outlier mining is an important way to detect anomalies in a dataset. However, given the diversity of clustering algorithms, it is difficult to choose a proper one for outlier mining; moreover, the main purpose of clustering is to find the principal features of the dataset, and outliers are only a by-product. Based on minimum spanning tree clustering, a new algorithm for spatial outlier mining called SOM is proposed. The algorithm preserves the basic spatial structure of spatial objects through two geometric structures, the Delaunay triangulated irregular network and the minimum spanning tree (MST), and obtains an MST clustering by cutting off the most inconsistent edges of the MST. It can therefore acquire clusters from non-spherical and unbalanced datasets, as density-based clustering algorithms do, while not depending on user-set parameters, so the clustering result is usually more reasonable. Finally, the validity of the SOM algorithm is demonstrated on a real geochemical soil element dataset from the coastal areas of Fujian province; the analysis shows that the algorithm is also applicable to spatial outlier mining in massive spatial datasets.
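The MST-cutting step at the heart of this approach can be sketched with SciPy; the Delaunay preprocessing and the inconsistency test are simplified here to "remove the longest edges", and the data are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_outliers(X, n_cuts):
    """Cluster by cutting the n_cuts longest MST edges; points landing in
    singleton components are reported as outliers."""
    D = squareform(pdist(X))
    mst = minimum_spanning_tree(D).toarray()
    longest = np.argsort(mst, axis=None)[::-1][:n_cuts]
    mst.flat[longest] = 0.0                 # cut the most inconsistent edges
    _, labels = connected_components(mst != 0, directed=False)
    sizes = np.bincount(labels)
    return labels, np.where(sizes[labels] == 1)[0]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, (20, 2)),
               rng.normal(5.0, 0.2, (20, 2)),
               [[20.0, 20.0]]])             # isolated point far from both clusters
labels, outliers = mst_outliers(X, n_cuts=2)
print(outliers)  # → [40]
```

Cutting k edges of a spanning tree always yields exactly k+1 components, which is why no cluster-count parameter beyond the number of cuts is needed.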
Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements
Emma J Cooke, Richard S Savage, Paul DW Kirk, Robert Darkins, David L Wild
BMC Bioinformatics, 2011, DOI: 10.1186/1471-2105-12-399
Abstract: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles. By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies. Timeseries BHC is available as part of the R package 'BHC' (version 1.5), which is available for download from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html.
An outlier map for Support Vector Machine classification  [PDF]
Michiel Debruyne
Statistics, 2010, DOI: 10.1214/09-AOAS256
Abstract: Support Vector Machines are a widely used classification technique. They are computationally efficient and provide excellent predictions even for high-dimensional data. Moreover, Support Vector Machines are very flexible due to the incorporation of kernel functions, which allow one to model nonlinearity but also to deal with nonnumerical data such as protein strings. However, Support Vector Machines can suffer considerably from unclean data containing, for example, outliers or mislabeled observations. Although several outlier detection schemes have been proposed in the literature, the selection of outliers versus nonoutliers is often rather ad hoc and does not provide much insight into the data. In robust multivariate statistics, outlier maps are popular tools for assessing the quality of the data under consideration. They provide a visual representation of the data depicting several types of outliers. This paper proposes an outlier map designed for Support Vector Machine classification. The Stahel--Donoho outlyingness measure from multivariate statistics is extended to an arbitrary kernel space. A trimmed version of Support Vector Machines is defined by trimming the part of the samples with largest outlyingness. Based on this classifier, an outlier map is constructed visualizing data in any type of high-dimensional kernel space. The outlier map is illustrated on four biological examples showing its use in exploratory data analysis.
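In ordinary Euclidean space, the Stahel--Donoho outlyingness that the paper extends to kernel spaces can be approximated with random projection directions: the maximum over directions of the robustly standardized projected deviation. `n_dirs` and the data below are illustrative.

```python
import numpy as np

def sd_outlyingness(X, n_dirs=500, seed=0):
    """Approximate Stahel-Donoho outlyingness via random directions:
    max over directions a of |a'x - median(a'X)| / MAD(a'X)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n_dirs, X.shape[1]))
    A /= np.linalg.norm(A, axis=1, keepdims=True)  # unit directions
    P = X @ A.T                                    # (n, n_dirs) projections
    med = np.median(P, axis=0)
    mad = np.maximum(np.median(np.abs(P - med), axis=0), 1e-12)
    return (np.abs(P - med) / mad).max(axis=1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(100, 2)), [[10.0, 10.0]]])
scores = sd_outlyingness(X)
print(scores.argmax())  # → 100, the planted outlier
```

A trimmed classifier in the paper's spirit would simply drop the samples with the largest such scores before training.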
Robust Clustering Using Outlier-Sparsity Regularization  [PDF]
Pedro A. Forero, Vassilis Kekatos, Georgios B. Giannakis
Computer Science, 2011, DOI: 10.1109/TSP.2012.2196696
Abstract: Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the ability of these algorithms to identify meaningful hidden structures, rendering their outcome unreliable. This paper develops robust clustering algorithms that aim not only to cluster the data, but also to identify the outliers. The novel approaches rely on the infrequent presence of outliers in the data, which translates to sparsity in a judiciously chosen domain. Capitalizing on this sparsity, outlier-aware robust K-means and probabilistic clustering approaches are proposed. Their novelty lies in identifying outliers while effecting sparsity in the outlier domain through carefully chosen regularization. A block coordinate descent approach is developed to obtain iterative algorithms with convergence guarantees and small excess computational complexity with respect to their non-robust counterparts. Kernelized versions of the robust clustering algorithms are also developed to efficiently handle high-dimensional data, identify nonlinearly separable clusters, or even cluster objects that are not represented by vectors. Numerical tests on both synthetic and real datasets validate the performance and applicability of the novel algorithms.

Copyright © 2008-2017 Open Access Library. All rights reserved.