Search Results: 1 - 10 of 80912 matches for "Zhenqiu Liu"
All listed articles are free for downloading (OA Articles)
Classification Using Mass Spectrometry Proteomic Data with Kernel-Based Algorithms
Zhenqiu Liu, Shili Lin
Engineering Letters, 2006
Efficient Regularized Regression for Variable Selection with L0 Penalty
Zhenqiu Liu, Gang Li
Computer Science, 2014
Abstract: Variable (feature, gene, or model; we use the terms interchangeably) selection for regression with high-dimensional BIGDATA has found many applications in bioinformatics, computational biology, image processing, and engineering. One appealing approach is L0 regularized regression, which directly penalizes the number of nonzero features in the model. L0 is known as the most essential sparsity measure and has nice theoretical properties, while the popular L1 regularization is only a best convex relaxation of L0. Therefore, it is natural to expect that L0 regularized regression performs better than LASSO. However, it is well known that L0 optimization is NP-hard and computationally challenging. Instead of solving the L0 problem directly, most publications so far have tried to solve an approximation problem that closely resembles L0 regularization. In this paper, we propose an efficient EM algorithm (L0EM) that directly solves the L0 optimization problem. L0EM is efficient with high-dimensional data. It also provides a natural solution to all Lp problems with p in [0,2]. The regularization parameter can be determined through either cross-validation or AIC and BIC. Theoretical properties of the L0-regularized estimator are given under mild conditions that permit the number of variables to be much larger than the sample size. We demonstrate our methods through simulation and high-dimensional genomic data. The results indicate that L0 has better performance than LASSO, and that L0 with AIC or BIC performs similarly to computationally intensive cross-validation. The proposed algorithms are efficient in identifying the nonzero variables with less bias and in selecting biologically important genes and pathways with high-dimensional BIGDATA.
Regularized F-Measure Maximization for Feature Selection and Classification
Zhenqiu Liu, Ming Tan, Feng Jiang
Journal of Biomedicine and Biotechnology, 2009, DOI: 10.1155/2009/617946
Abstract: Receiver Operating Characteristic (ROC) analysis is a common tool for assessing the performance of various classifications. It has gained much popularity in medical and other fields, including biological markers and diagnostic tests. This is particularly due to the fact that in real-world problems misclassification costs are not known, and thus the ROC curve and related utility functions such as the F-measure can be more meaningful performance measures. The F-measure combines recall and precision into a global measure. In this paper, we propose a novel method through regularized F-measure maximization. The proposed method assigns different costs to positive and negative samples and performs simultaneous feature selection and prediction with an L1 penalty. This method is especially useful when the data set is highly unbalanced, or when the labels for negative (positive) samples are missing. Our experiments with benchmark, methylation, and high-dimensional microarray data show that the performance of the proposed algorithm is better than or equivalent to that of other popular classifiers in limited experiments.
Gene Expression Data Classification With Kernel Principal Component Analysis
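Zhenqiu Liu, Dechang Chen, Halima Bensmail
Journal of Biomedicine and Biotechnology, 2005, DOI: 10.1155/jbb.2005.155
Abstract: One important feature of the gene expression data is that the number of genes M far exceeds the number of samples N. Standard statistical methods do not work well when N < M. Development of new methodologies or modification of existing methodologies is needed for the analysis of the microarray data. In this paper, we propose a novel analysis procedure for classifying the gene expression data. This procedure involves dimension reduction using kernel principal component analysis (KPCA) and classification with logistic regression (discrimination). KPCA is a generalization and nonlinear version of principal component analysis. The proposed algorithm was applied to five different gene expression datasets involving human tumor samples. Comparison with other popular classification methods such as support vector machines and neural networks shows that our algorithm is very promising in classifying gene expression data.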
Efficient Feature Selection and Multiclass Classification with Integrated Instance and Model Based Learning
Zhenqiu Liu, Halima Bensmail and Ming Tan
Evolutionary Bioinformatics, 2012, DOI: 10.4137/EBO.S9407
Abstract: Multiclass classification and feature (variable) selection are commonly encountered in many biological and medical applications. However, extending binary classification approaches to multiclass problems is not trivial. Instance-based methods such as K nearest neighbor (KNN) extend naturally to multiclass problems and usually perform well with unbalanced data, but they suffer from the curse of dimensionality: their performance degrades when they are applied to high-dimensional data. On the other hand, model-based methods such as logistic regression require the decomposition of the multiclass problem into several binary problems with one-vs.-one or one-vs.-rest schemes. Even though they can be applied to high-dimensional data with L1 or Lp penalized methods, such approaches can only select independent features, and the features selected with different binary problems are usually different. They also produce unbalanced classification problems with the one-vs.-rest scheme even if the original multiclass problem is balanced. By combining instance-based and model-based learning, we propose an efficient learning method with integrated KNN and constrained logistic regression (KNNLog) for simultaneous multiclass classification and feature selection. Our proposed method simultaneously minimizes the intra-class distance and maximizes the interclass distance with fewer estimated parameters. It is very efficient for problems with small sample sizes and unbalanced classes, a case common in many real applications. In addition, our model-based feature selection methods can identify highly correlated features simultaneously, avoiding the multiplicity problem due to multiple tests. The proposed method is evaluated with simulation and real data, including one unbalanced microRNA dataset for leukemia and one multiclass metagenomic dataset from the Human Microbiome Project (HMP). It performs well with limited computational experiments.
Gene Expression Data Classification With Kernel Principal Component Analysis
Liu Zhenqiu,Chen Dechang,Bensmail Halima
Journal of Biomedicine and Biotechnology, 2005
Abstract: One important feature of the gene expression data is that the number of genes M far exceeds the number of samples N. Standard statistical methods do not work well when N < M. Development of new methodologies or modification of existing methodologies is needed for the analysis of the microarray data. In this paper, we propose a novel analysis procedure for classifying the gene expression data. This procedure involves dimension reduction using kernel principal component analysis (KPCA) and classification with logistic regression (discrimination). KPCA is a generalization and nonlinear version of principal component analysis. The proposed algorithm was applied to five different gene expression datasets involving human tumor samples. Comparison with other popular classification methods such as support vector machines and neural networks shows that our algorithm is very promising in classifying gene expression data.
Efficient Feature Selection and Multiclass Classification with Integrated Instance and Model Based Learning
Zhenqiu Liu, Halima Bensmail, Ming Tan
Evolutionary Bioinformatics, 2012
Constructing Tumor Progression Pathways and Biomarker Discovery with Fuzzy Kernel Kmeans and DNA Methylation Data
Zhenqiu Liu, Zhongmin Guo, Ming Tan
Cancer Informatics, 2008
Abstract: Constructing pathways of tumor progression and discovering the biomarkers associated with cancer are critical for understanding the molecular basis of the disease, for establishing novel chemotherapeutic approaches, and in turn for improving the clinical efficiency of the drugs. The problem has recently received a lot of attention from bioinformatics researchers. However, relatively few methods are available for constructing pathways. This article develops novel entropy-kernel-based kernel clustering and fuzzy kernel clustering algorithms to construct tumor progression pathways using CpG island methylation data. The methylation data, which come from tumor tissues diagnosed at different stages, can be used to distinguish the epigenotype and phenotypes that describe the molecular events of different phases. Using kernel and fuzzy kernel kmeans, we built tumor progression trees to describe the pathways of tumor progression and find the possible biomarkers associated with cancer. Our results indicate that the proposed algorithms together with methylation profiles can predict the tumor progression stages and discover the biomarkers efficiently. Software is available upon request.
Class Prediction and Feature Selection with Linear Optimization for Metagenomic Count Data
Zhenqiu Liu, Dechang Chen, Li Sheng, Amy Y. Liu
PLOS ONE, 2013, DOI: 10.1371/journal.pone.0053253
Abstract: The amount of metagenomic data is growing rapidly while the computational methods for metagenome analysis are still in their infancy. It is important to develop novel statistical learning tools for the prediction of associations between bacterial communities and disease phenotypes and for the detection of differentially abundant features. In this study, we presented a novel statistical learning method for simultaneous association prediction and feature selection with metagenomic samples from two or multiple treatment populations on the basis of count data. We developed a linear programming based support vector machine with L1 and joint L1,∞ penalties for binary and multiclass classifications with metagenomic count data (metalinprog). We evaluated the performance of our method on several real and simulation datasets. The proposed method can simultaneously identify features and predict classes with the metagenomic count data.
Correction: Class Prediction and Feature Selection with Linear Optimization for Metagenomic Count Data
Zhenqiu Liu, Dechang Chen, Li Sheng, Amy Y. Liu
PLOS ONE, 2014, DOI: 10.1371/journal.pone.0097958