Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
An Effective Technique for Clustering Incremental Gene Expression data  [PDF]
Sauravjyoti Sarmah,Dhruba K. Bhattacharyya
International Journal of Computer Science Issues , 2010,
Abstract: This paper presents a clustering technique (GenClus) for gene expression data that can also handle incrementally updated datasets. It is based on a density-based approach. It retains regulation information, which is also the main advantage of the technique. It uses no proximity measures and is therefore free of the restrictions they impose. Experimental results show the efficiency of GenClus in detecting quality clusters in gene expression data. Our approach improves cluster quality by identifying sub-clusters within large clusters. Compared with some well-known clustering algorithms, it was found to perform well in terms of the z-score cluster validity measure.
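The abstract does not spell out how GenClus groups genes without a proximity measure. The following is a minimal Python sketch of one way a density-based, distance-free, incremental scheme can work; it is not GenClus itself. It assumes that "regulation information" means the up/down/no-change pattern between consecutive conditions, and the names `regulation_pattern`, `PatternClusterer`, `min_pts` and `tol` are hypothetical.

```python
from collections import defaultdict


def regulation_pattern(profile, tol=0.0):
    """Encode a profile as up (+1), down (-1) or no change (0) between consecutive conditions."""
    pattern = []
    for prev, curr in zip(profile, profile[1:]):
        diff = curr - prev
        if diff > tol:
            pattern.append(1)
        elif diff < -tol:
            pattern.append(-1)
        else:
            pattern.append(0)
    return tuple(pattern)


class PatternClusterer:
    """Hypothetical incremental clusterer: genes sharing a regulation pattern form a group."""

    def __init__(self, min_pts=2, tol=0.0):
        self.min_pts = min_pts  # hypothetical density threshold
        self.tol = tol          # hypothetical "no change" tolerance
        self.groups = defaultdict(list)  # pattern -> gene ids

    def add(self, gene_id, profile):
        # Incremental update: only the group for this gene's pattern changes.
        self.groups[regulation_pattern(profile, self.tol)].append(gene_id)

    def clusters(self):
        # Only sufficiently populated ("dense") patterns are reported as clusters.
        return {p: ids for p, ids in self.groups.items() if len(ids) >= self.min_pts}
```

Genes are compared only by whether their patterns are identical, so no pairwise distance is ever computed, and adding a gene touches a single group instead of re-clustering the whole dataset. For example, the profiles `[1.0, 2.0, 1.5]` and `[0.2, 0.9, 0.1]` both map to the pattern `(1, -1)` (up, then down).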
Tri6 Is a Global Transcription Regulator in the Phytopathogen Fusarium graminearum  [PDF]
Charles G. Nasmith equal contributor,Sean Walkowiak equal contributor,Li Wang equal contributor,Winnie W. Y. Leung,Yunchen Gong,Anne Johnston,Linda J. Harris,David S. Guttman,Rajagopal Subramaniam equal contributor
PLOS Pathogens , 2011, DOI: 10.1371/journal.ppat.1002266
Abstract: In F. graminearum, the transcriptional regulator Tri6 is encoded within the trichothecene gene cluster and regulates genes involved in the biosynthesis of the secondary metabolite deoxynivalenol (DON). With its Cys2His2 zinc finger, the Tri6 protein may also belong to the class of global transcription regulators. This class of global transcriptional regulators mediates various environmental cues and generally responds to the demands of cellular metabolism. To address this issue directly, we sought to find gene targets of Tri6 in F. graminearum grown in optimal nutrient conditions. Chromatin immunoprecipitation followed by Illumina sequencing (ChIP-Seq) identified six genes within the trichothecene gene cluster (Tri1, Tri3, Tri6, Tri7, Tri12 and Tri14) and 192 additional targets potentially regulated by Tri6. Functional classification revealed that, among the annotated genes, ~40% are associated with cellular metabolism and transport, and the remaining targets fall into the categories of signal transduction and gene expression regulation. ChIP-Seq data also revealed that Tri6 has the highest affinity for its own promoter, suggesting that this gene could be subject to self-regulation. Electrophoretic mobility shift assays (EMSA) performed on the Tri6 promoter with purified Tri6 protein identified a minimum binding motif of GTGA repeats as a consensus sequence. Finally, expression profiling of F. graminearum grown under nitrogen-limiting conditions revealed that 49 of the 198 target genes are differentially regulated by Tri6. The identification of potential new targets, together with the deciphering of novel binding sites for Tri6, casts new light on the role of this transcriptional regulator in the overall growth and development of F. graminearum.
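The EMSA result gives the minimum binding motif only as GTGA repeats. A minimal Python sketch for locating candidate sites in a promoter sequence, assuming that at least two tandem copies count as a site (a hypothetical threshold, not one reported in the study) and that sites on the reverse strand should also be reported; `find_gtga_repeats` is a hypothetical helper name.

```python
import re


def find_gtga_repeats(sequence, min_repeats=2):
    """Find runs of >= min_repeats tandem GTGA copies on either strand.

    Returns (strand, start, end, copies) tuples with 0-based, end-exclusive
    coordinates on the given (forward) sequence, sorted by start position.
    """
    seq = sequence.upper()
    hits = []
    # A GTGA repeat on the reverse strand reads as a TCAC repeat on the forward strand.
    for strand, unit in (("+", "GTGA"), ("-", "TCAC")):
        for m in re.finditer("(?:%s){%d,}" % (unit, min_repeats), seq):
            copies = (m.end() - m.start()) // len(unit)
            hits.append((strand, m.start(), m.end(), copies))
    return sorted(hits, key=lambda hit: hit[1])
```

Scanning the forward sequence for the reverse-complement unit avoids building a second, reversed string and keeps all coordinates on one strand.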
Voting-based consensus clustering for combining multiple clusterings of chemical structures  [cached]
Saeed Faisal,Salim Naomie,Abdo Ammar
Journal of Cheminformatics , 2012, DOI: 10.1186/1758-2946-4-37
Abstract: Background: Although many consensus clustering methods have been successfully used for combining multiple classifiers in many areas such as machine learning, applied statistics, pattern recognition and bioinformatics, few have been applied to combining multiple clusterings of chemical structures. It is known that no individual clustering method gives the best results for all types of applications. In this paper, therefore, three voting- and graph-based consensus clustering methods were used to combine multiple clusterings of chemical structures, with the aim of better separating biologically active molecules from inactive ones in each cluster. Results: The cumulative voting-based aggregation algorithm (CVAA), cluster-based similarity partitioning algorithm (CSPA) and hyper-graph partitioning algorithm (HGPA) were examined. The F-measure and the Quality Partition Index (QPI) were used to evaluate the clusterings, and the results were compared with Ward’s clustering method. The MDL Drug Data Report (MDDR) dataset was used for the experiments and was represented by two 2D fingerprints, ALOGP and ECFP_4. The voting-based consensus clustering method outperformed Ward’s method on both the F-measure and QPI for both the ALOGP and ECFP_4 fingerprints, whereas the graph-based consensus clustering methods outperformed Ward’s method only for ALOGP using QPI. The Jaccard and Euclidean distance measures, which gave the highest values for both criteria, were the methods of choice for generating the ensembles. Conclusions: The results show that consensus clustering methods can improve the effectiveness of chemical structure clustering. The cumulative voting-based aggregation algorithm (CVAA) was the method of choice among the consensus clustering methods.
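CSPA, one of the three consensus methods examined, first turns the ensemble into a pairwise similarity: the fraction of clusterings in which two objects share a cluster. A minimal Python sketch of that co-association step; for brevity, the final split links pairs above a hypothetical `threshold` and takes connected components, a simplification swapped in for the graph partitioning that CSPA actually uses.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster label."""
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Link objects whose co-association exceeds threshold; label connected components."""
    sim = co_association(partitions)
    n = len(sim)
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] > threshold:
                parent[find(i)] = find(j)
    roots = {}
    # Relabel components 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Because only "same cluster or not" is recorded, the label names used by each input clustering do not need to be aligned before combining them.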
Tri-Training and Data Editing Based Semi-Supervised Clustering Algorithm
DENG Chao, GUO Mao-Zu
Journal of Software (软件学报) , 2008,
Abstract: In this paper, an algorithm named DE-Tri-training semi-supervised K-means is proposed, which obtains a larger and less noisy seed set. Specifically, before the seed set is used to initialize the cluster centroids, the training process of a semi-supervised classification approach named Tri-training is used to label unlabeled data and add them to the initial seed set, enlarging it. Meanwhile, to improve the quality of the enlarged seed set, a nearest-neighbor-rule-based data editing technique named Depuration is introduced into the Tri-training process to eliminate and correct mislabeled, noisy data in the enlarged seed set. Experimental results show that the novel semi-supervised clustering algorithm effectively improves cluster centroid initialization and enhances clustering performance.
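The seed set matters because it initializes the K-means centroids. A minimal Python sketch of that seeding step followed by standard K-means iterations, assuming the seeds arrive as a dict mapping cluster index 0..k-1 to labelled points (a hypothetical format); the Tri-training and Depuration stages that build the enlarged seed set are not shown.

```python
def mean(vectors):
    """Component-wise mean of equal-length tuples."""
    dim = len(vectors[0])
    return tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(dim))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(points, seeds, iters=20):
    """K-means whose initial centroids are the means of labelled seed points.

    points: list of equal-length tuples.
    seeds:  {cluster index 0..k-1: list of labelled points} (hypothetical format).
    """
    k = len(seeds)
    centroids = [mean(seeds[c]) for c in range(k)]  # seed-based initialisation
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: recompute centroids; keep the old one if a cluster empties.
        new = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new.append(mean(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids
```

Besides giving K-means a better starting point than random centroids, the seeds fix which output label corresponds to which class: swapping the keys of the seed dictionary swaps the labels in the result.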
An Effective Clustering Algorithm With Ant Colony  [cached]
Xiao-yong Liu,Hui Fu
Journal of Computers , 2010, DOI: 10.4304/jcp.5.4.598-605
Abstract: This paper proposes a new clustering algorithm based on ant colony optimization to solve the unsupervised clustering problem. Ant colony optimization (ACO) is a population-based meta-heuristic that can be used to find approximate solutions to difficult combinatorial optimization problems. Cluster analysis, an important method in data mining, partitions a set of observations into two or more mutually exclusive, initially unknown groups. This paper presents an effective ant colony clustering algorithm that stochastically keeps the best solution, called ESacc. It builds on the Sacc algorithm proposed by P. S. Shelokar. Its main virtue is that the best values are kept stochastically across iterations. Moreover, the new algorithm uses the Jaccard index to identify the optimal number of clusters. Repeated experiments on three datasets show that ESacc runs faster, produces better clusterings and is more stable than Sacc, which validates the novel algorithm’s efficiency. In addition, three clustering validity indices are used to evaluate the clustering solutions of ESacc and Sacc.
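A minimal Python sketch of ant-colony clustering in the general style of Shelokar's approach, on which ESacc builds: each ant assigns every object to a cluster guided by a pheromone matrix, the within-cluster sum of squared distances scores each assignment, and pheromone is evaporated and then reinforced along the best assignments. The parameter names and values (`n_ants`, `q0`, `rho`, `n_best`) are hypothetical; ESacc's stochastic retention of the best solution and its Jaccard-based choice of cluster number are not implemented, and no local search is performed.

```python
import random


def sse(data, labels, k):
    """Within-cluster sum of squared distances to each cluster's mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, label in zip(data, labels) if label == c]
        if members:
            centre = [sum(col) / len(members) for col in zip(*members)]
            total += sum(sum((x - m) ** 2 for x, m in zip(p, centre)) for p in members)
    return total


def aco_cluster(data, k, n_ants=10, iters=50, q0=0.9, rho=0.1, n_best=2, seed=0):
    """Return (labels, objective) of the best assignment found by the colony."""
    rng = random.Random(seed)
    n = len(data)
    tau = [[0.01] * k for _ in range(n)]  # pheromone for "object i in cluster j"
    best_labels, best_obj = None, float("inf")
    for _ in range(iters):
        colony = []
        for _ in range(n_ants):
            labels = []
            for i in range(n):
                if rng.random() < q0:
                    # Exploitation: cluster with the most pheromone (ties broken randomly).
                    top = max(tau[i])
                    labels.append(rng.choice([j for j in range(k) if tau[i][j] == top]))
                else:
                    # Exploration: sample a cluster with probability proportional to pheromone.
                    labels.append(rng.choices(range(k), weights=tau[i])[0])
            colony.append((sse(data, labels, k), labels))
        colony.sort(key=lambda sol: sol[0])
        if colony[0][0] < best_obj:
            best_obj, best_labels = colony[0]
        for row in tau:  # evaporation
            for j in range(k):
                row[j] *= 1 - rho
        for obj, labels in colony[:n_best]:  # reinforcement along the best solutions
            for i, j in enumerate(labels):
                tau[i][j] += 1.0 / (obj + 1e-9)
    return best_labels, best_obj
```

The parameter `q0` trades exploitation against exploration: values near 1 make ants follow the strongest pheromone trail almost always, while lower values keep sampling alternative assignments.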
RCHIG: An Effective Clustering Algorithm with Ranking  [cached]
Jianwen Tao
Journal of Software , 2009, DOI: 10.4304/jsw.4.4.382-389
Abstract: In this paper, we address the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters, in a heterogeneous information graph. A novel clustering framework called RCHIG is proposed that directly generates clusters integrated with ranking. Starting from K initial clusters, ranking is applied separately within each cluster and serves as a good measure for that cluster. Then, a mixture model is used to decompose each object into a K-dimensional vector, where each dimension is a component coefficient with respect to a cluster, measured by the rank distribution. Objects are then reassigned to the nearest cluster in the new measure space to improve the clustering. As a result, clustering and ranking quality are mutually enhanced: the clusters become more accurate and the ranking becomes more meaningful. This progressive refinement iterates until little change occurs. Experimental results show that RCHIG generates more accurate clusters, and does so more efficiently, than state-of-the-art link-based clustering methods. Moreover, clustering results with ranks provide more informative views of the data than traditional clustering.
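The rank-then-reassign loop can be sketched in Python on a bipartite link matrix whose rows are the target objects and whose columns are objects of another type. These are assumptions, not RCHIG's actual definitions: ranking within a cluster is the normalised link weight each column object receives from the cluster's members, an object's K-dimensional vector is its link-weighted score under each cluster's rank distribution (standing in for the paper's mixture-model decomposition), and "nearest" means squared Euclidean distance to the cluster mean. Function names are hypothetical.

```python
def rank_distributions(links, labels, k):
    """Per-cluster ranking of column objects: normalised link weight from cluster members."""
    n_attr = len(links[0])
    ranks = []
    for c in range(k):
        totals = [0.0] * n_attr
        for row, label in zip(links, labels):
            if label == c:
                for a, w in enumerate(row):
                    totals[a] += w
        s = sum(totals)
        ranks.append([t / s for t in totals] if s else [1.0 / n_attr] * n_attr)
    return ranks


def rank_and_cluster(links, labels, k, max_iter=20):
    """Alternate ranking and reassignment until the labels stop changing."""
    labels = list(labels)
    for _ in range(max_iter):
        ranks = rank_distributions(links, labels, k)
        # K-dimensional vector per object: its share of score under each cluster's ranking.
        vecs = []
        for row in links:
            scores = [sum(w * r[a] for a, w in enumerate(row)) for r in ranks]
            s = sum(scores)
            vecs.append([x / s for x in scores] if s else [1.0 / k] * k)
        # Cluster centres in the new measure space.
        centres = []
        for c in range(k):
            members = [v for v, label in zip(vecs, labels) if label == c]
            centres.append([sum(col) / len(members) for col in zip(*members)]
                           if members else [1.0 / k] * k)
        # Reassign each object to the nearest centre (squared Euclidean distance).
        new_labels = [min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centres[c])))
                      for v in vecs]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, ranks
```

Starting from a partly wrong initial assignment, each pass sharpens the per-cluster rankings, which in turn pull misplaced objects toward the cluster whose highly ranked neighbours they link to.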
Effective Clustering Algorithms for Gene Expression Data  [PDF]
T. Chandrasekhar,K. Thangavel,E. Elayaraja
Computer Science , 2012,
Abstract: Microarrays have made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. Identifying co-expressed genes and coherent patterns is the central goal of microarray or gene expression data analysis and an important task in bioinformatics research. In this paper, a K-Means algorithm hybridised with the Cluster Centre Initialization Algorithm (CCIA) is proposed for gene expression data. The proposed algorithm overcomes the drawback of having to specify the number of clusters in the K-Means method. Experimental analysis shows that the proposed method performs well on gene expression data compared with traditional K-Means clustering, as measured by the Silhouette Coefficient.
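The Silhouette Coefficient used in the comparison scores each point by s(i) = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its lowest mean distance to the members of any other cluster. A minimal Python sketch, assuming Euclidean distance, at least two clusters, and the common convention that a point alone in its cluster scores 0.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = set(labels)

    def mean_dist(i, c):
        # Mean distance from point i to the members of cluster c, excluding i itself.
        ds = [math.dist(points[i], q) for j, (q, label) in enumerate(zip(points, labels))
              if label == c and j != i]
        return sum(ds) / len(ds) if ds else None

    scores = []
    for i, li in enumerate(labels):
        a = mean_dist(i, li)
        if a is None:
            scores.append(0.0)  # convention: a point alone in its cluster scores 0
            continue
        b = min(mean_dist(i, c) for c in clusters if c != li)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters; negative values mean points sit closer, on average, to another cluster than to their own.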
Effective Term Based Text Clustering Algorithms
P. Ponmuthuramalingam,T. Devi
International Journal on Computer Science and Engineering , 2010,
Abstract: Text clustering methods can be used to group large sets of text documents. Most text clustering methods do not address problems specific to text clustering, such as the very high dimensionality of the data and the understandability of the cluster descriptions. In this paper, a frequent term based clustering approach is introduced; it provides a natural way of reducing the high dimensionality of the document vector space. The approach clusters low-dimensional frequent term sets rather than the high-dimensional vector space. Four algorithms for effective term based text clustering are presented. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently than existing text clustering algorithms.
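A minimal Python sketch of the frequent term set idea: documents are treated as term sets, term sets contained in at least a `min_support` fraction of documents become cluster candidates, and each chosen term set both describes a cluster and defines its members. The greedy rule (pick the candidate covering the most still-unclustered documents) and the limit to term sets of size two are hypothetical simplifications; this is not one of the paper's four algorithms.

```python
from itertools import combinations


def frequent_termsets(docs, min_support, max_size=2):
    """Map each frequent term set (frozenset) to the set of documents containing it."""
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def cover(terms):
        return frozenset(i for i, d in enumerate(doc_sets) if d.issuperset(terms))

    result, frequent_terms = {}, []
    for term in sorted(set().union(*doc_sets)):
        docs_with_term = cover((term,))
        if len(docs_with_term) / n >= min_support:
            result[frozenset((term,))] = docs_with_term
            frequent_terms.append(term)
    # Apriori-style pruning: larger sets are built only from frequent single terms.
    for size in range(2, max_size + 1):
        for terms in combinations(frequent_terms, size):
            docs_with_terms = cover(terms)
            if len(docs_with_terms) / n >= min_support:
                result[frozenset(terms)] = docs_with_terms
    return result


def term_clusters(docs, min_support=0.3):
    """Greedily pick term sets covering the most still-unclustered documents."""
    candidates = frequent_termsets(docs, min_support)
    remaining = set(range(len(docs)))
    clusters = []
    while remaining and candidates:
        # Prefer larger coverage; on ties prefer the longer (more specific) term set.
        terms, covered = max(candidates.items(),
                             key=lambda kv: (len(kv[1] & remaining), len(kv[0])))
        gained = covered & remaining
        if not gained:
            break
        clusters.append((sorted(terms), sorted(gained)))
        remaining -= gained
        del candidates[terms]
    return clusters, sorted(remaining)
```

Only a handful of term sets are ever examined instead of full document vectors, and each resulting cluster comes with a readable description (its term set). On a toy corpus of three sports and three finance documents, for instance, the sketch returns one cluster described by `market` and one described by `team`.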
Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data  [cached]
Sakellariou Argiris,Sanoudou Despina,Spyrou George
BMC Bioinformatics , 2012, DOI: 10.1186/1471-2105-13-270
Abstract: Background: A feature selection method for microarray gene expression data should be independent of platform, disease and dataset size. Our hypothesis is that among the statistically significant ranked genes in a gene list, there should be clusters of genes that share similar biological functions related to the investigated disease. Thus, instead of keeping the N top-ranked genes, it would be more appropriate to define and keep a number of gene cluster exemplars. Results: We propose a hybrid feature selection (FS) method, mAP-KL, which combines multiple hypothesis testing and the affinity propagation (AP) clustering algorithm with the Krzanowski & Lai cluster quality index to select a small yet informative subset of genes. We applied mAP-KL to real microarray data as well as to simulated data, and compared its performance against 13 other feature selection approaches. Across a variety of diseases and sample sizes, mAP-KL gives competitive classification results, particularly in neuromuscular diseases, where its overall AUC score was 0.91. Furthermore, mAP-KL generates concise yet biologically relevant and informative N-gene expression signatures, which can serve as a valuable tool for diagnostic and prognostic purposes, as well as a source of potential disease biomarkers in a broad range of diseases. Conclusions: mAP-KL is a data-driven, classifier-independent hybrid feature selection method that applies to any disease classification problem based on microarray data, regardless of the number of available samples. Combining multiple hypothesis testing and AP yields gene subsets that classify unknown samples from both small and large patient cohorts with high accuracy.
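A minimal Python sketch of the first two stages of the mAP-KL idea: filter genes by multiple hypothesis testing, then cluster the survivors with affinity propagation and keep only the exemplars. Assumptions: the Benjamini-Hochberg procedure stands in for the paper's unspecified multiple-testing step, similarity is Pearson correlation between expression profiles, every gene's preference is the median similarity, and the Krzanowski & Lai index that mAP-KL uses is omitted. The helper names are hypothetical.

```python
import math
import statistics


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:cutoff])


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den  # undefined (ZeroDivisionError) for constant profiles


def affinity_propagation(sim, damping=0.5, iters=200):
    """Frey & Dueck message passing; sim[k][k] holds each point's preference."""
    n = len(sim)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + sim[i][k] for k in range(n)]
            for k in range(n):
                # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
                other = max(v for kk, v in enumerate(vals) if kk != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (sim[i][k] - other)
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # sum over i' != k of max(0, r(i',k))
            for i in range(n):
                new = support if i == k else min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [k for k in range(n) if R[k][k] + A[k][k] > 0]


def select_gene_exemplars(expr, pvalues, alpha=0.05):
    """Hypothetical pipeline: BH-significant genes -> AP on correlation -> exemplar genes."""
    keep = benjamini_hochberg(pvalues, alpha)
    if len(keep) < 2:
        return keep
    sim = [[pearson(expr[i], expr[j]) for j in keep] for i in keep]
    pref = statistics.median(sim[a][b] for a in range(len(keep))
                             for b in range(len(keep)) if a != b)
    for a in range(len(keep)):
        sim[a][a] = pref  # median preference, a common default
    return [keep[k] for k in affinity_propagation(sim)]
```

Keeping one exemplar per cluster of correlated genes, rather than the N smallest p-values, avoids filling the signature with near-duplicate genes from the same co-expressed group.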
SCAF - An Effective Approach to Classify Subspace Clustering Algorithms
Sunita Jahirabadkar,Parag Kulkarni
International Journal of Data Mining & Knowledge Management Process , 2013,
Abstract: Subspace clustering discovers clusters embedded in multiple, overlapping subspaces of high-dimensional data. Many significant subspace clustering algorithms exist, each with different characteristics arising from the different techniques, assumptions and heuristics it uses. A comprehensive classification scheme that considers all such characteristics is essential for dividing subspace clustering approaches into families, where algorithms in the same family share common characteristics. Such a categorization will help future developers better understand which quality criteria to use and which similar algorithms to compare their proposed clustering algorithms against. In this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms’ Family). The characteristics of a SCAF are based on classes such as cluster orientation and overlap of dimensions. As an illustration, we further provide a comprehensive, systematic description and comparison of a few significant algorithms belonging to the “Axis parallel, overlapping, density based” SCAF.
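To make the characteristics "axis parallel, overlapping, density based" concrete, here is a minimal Python sketch of a CLIQUE-style, bottom-up grid search; it is an illustration of those properties, not SCAF itself, which is a classification scheme. Subspaces are sets of original dimensions (axis parallel), a point can fall in dense units of several subspaces at once (overlapping), and density is a count threshold. The grid size `xi`, threshold `tau` and function name are hypothetical, and the search stops at two-dimensional subspaces.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, bounds, xi=4, tau=0.3):
    """Dense grid units in 1-D and 2-D axis-parallel subspaces.

    bounds: (low, high) per dimension; points must lie within them. Each dimension
    is split into xi equal intervals, and a unit is dense if it holds more than a
    tau fraction of the points. Returns {dims tuple: set of interval-index tuples}.
    """
    n = len(points)

    def cell(p, d):
        lo, hi = bounds[d]
        return min(int((p[d] - lo) / (hi - lo) * xi), xi - 1)  # clamp the upper edge

    result = {}
    for d in range(len(bounds)):
        counts = Counter(cell(p, d) for p in points)
        dense = {(c,) for c, cnt in counts.items() if cnt / n > tau}
        if dense:
            result[(d,)] = dense
    # Apriori property: a 2-D unit can be dense only if both 1-D projections are dense.
    for d1, d2 in combinations(sorted(key[0] for key in result), 2):
        counts = Counter((cell(p, d1), cell(p, d2)) for p in points)
        dense = {u for u, cnt in counts.items()
                 if cnt / n > tau and (u[0],) in result[(d1,)] and (u[1],) in result[(d2,)]}
        if dense:
            result[(d1, d2)] = dense
    return result
```

Building higher-dimensional units only from dense lower-dimensional ones is what keeps a bottom-up search from enumerating every subspace of high-dimensional data.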

Copyright © 2008-2017 Open Access Library. All rights reserved.