Scaling Up the Accuracy of Decision-Tree Classifiers: A Naive-Bayes Combination
Liangxiao Jiang, Chaoqun Li
Journal of Computers , 2011, DOI: 10.4304/jcp.6.7.1325-1331
Abstract: C4.5 and NB are two of the top 10 algorithms in data mining thanks to their simplicity, effectiveness, and efficiency. To integrate their advantages, NBTree builds a naive Bayes classifier on each leaf node of the built decision tree. NBTree significantly outperforms C4.5 and NB in terms of classification accuracy. However, it incurs very high time complexity. In this paper, we propose a very simple, effective, and efficient algorithm based on C4.5 and NB, which we denote C4.5-NB. Our motivation is to keep the high classification accuracy of NBTree without incurring its high time complexity. In C4.5-NB, C4.5 and NB are built and evaluated independently at training time, and at test time their class-membership probabilities are combined in a weighted average, with weights given by their classification accuracies on the training data. Empirical studies on a large number of UCI data sets show that C4.5-NB performs as well as NBTree in terms of classification accuracy, but is significantly more efficient.
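The test-time combination step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the normalization used here (each weight is a model's training accuracy divided by the sum of both accuracies) is an assumption, and the function names `combine_probabilities` and `predict_class` are hypothetical.

```python
def combine_probabilities(p_tree, p_nb, acc_tree, acc_nb):
    """Accuracy-weighted average of two class-probability vectors.

    p_tree, p_nb: per-class probabilities from the decision tree and naive Bayes.
    acc_tree, acc_nb: training-set accuracies used as weights (assumed scheme).
    """
    if len(p_tree) != len(p_nb):
        raise ValueError("probability vectors must have the same length")
    total = acc_tree + acc_nb
    if total <= 0:
        raise ValueError("at least one accuracy must be positive")
    # Weights are normalized so the result is still a probability distribution.
    return [(acc_tree * a + acc_nb * b) / total for a, b in zip(p_tree, p_nb)]


def predict_class(probabilities):
    """Return the index of the class with the highest combined probability."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


combined = combine_probabilities([0.8, 0.2], [0.4, 0.6], acc_tree=0.9, acc_nb=0.6)
print(combined, predict_class(combined))
```

Because the two models are trained independently, this step adds only a constant-time blend per test instance on top of the two base classifiers.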
An Empirical Study on Class Probability Estimates in Decision Tree Learning
Liangxiao Jiang, Chaoqun Li
Journal of Software , 2011, DOI: 10.4304/jsw.6.7.1368-1373
Abstract: The decision tree is one of the most effective and widely used models for classification and ranking and has received a great deal of attention from researchers in data mining and machine learning. A critical problem in decision tree learning is how to estimate class-membership probabilities from decision trees. In this paper, we first survey class probability estimation methods, mainly including the maximum-likelihood estimate, the Laplace estimate, the m-estimate, the similarity-weighted estimate, and the naive Bayes-based estimate. Then, we provide an empirical study on the classification and ranking performance of the resulting decision trees using different class probability estimation methods. The experimental results based on a large number of UCI data sets verify our conclusions.
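Three of the leaf-level estimates named in this abstract have standard textbook forms, sketched below for a leaf holding `n` training instances, `n_c` of them in class c, with `k` classes overall. The function names and the default `m=2.0` are illustrative assumptions, not values taken from the paper.

```python
def maximum_likelihood_estimate(n_c, n):
    """Raw class frequency at the leaf: n_c / n."""
    if n <= 0:
        raise ValueError("leaf must contain at least one instance")
    return n_c / n


def laplace_estimate(n_c, n, k):
    """Laplace correction: (n_c + 1) / (n + k), with k the number of classes."""
    return (n_c + 1) / (n + k)


def m_estimate(n_c, n, prior, m=2.0):
    """m-estimate: (n_c + m * prior) / (n + m), with prior the class prior."""
    return (n_c + m * prior) / (n + m)


# A pure leaf with 3 instances, all of class c, in a two-class problem.
print(maximum_likelihood_estimate(3, 3))  # frequency estimate saturates at 1.0
print(laplace_estimate(3, 3, k=2))        # pulled toward the uniform prior
print(m_estimate(3, 3, prior=0.5, m=2.0))
```

The smoothed estimates avoid the extreme probabilities of 0 and 1 that the frequency estimate produces on small, pure leaves, which matters when the probabilities are used for ranking.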
Rough Set Approach to Multivariate Decision Trees Inducing
Dianhong Wang, Xingwen Liu, Liangxiao Jiang, Xiaoting Zhang
Journal of Computers , 2012, DOI: 10.4304/jcp.7.4.870-879
Abstract: Aimed at the problems of heavy computation, large tree size, and over-fitting in multivariate decision tree (MDT) algorithms, we propose a novel rough set-based multivariate decision tree (RSMDT) method. In this method, the positive region degree of condition attributes with respect to decision attributes in rough set theory is used to select attributes for multivariate tests. A new concept, the extended generalization of one equivalence relation corresponding to another, is introduced and used to construct multivariate tests. We experimentally evaluate the RSMDT algorithm in terms of classification accuracy, tree size, and computing time on all 36 UCI Machine Learning Repository data sets selected by the Weka platform, and compare it with C4.5, classification and regression trees (CART), classification and regression trees with linear combinations (CART-LC), Oblique Classifier 1 (OC1), and Quick Unbiased Efficient Statistical Trees (QUEST). The experimental results indicate that the RSMDT algorithm significantly outperforms the compared algorithms, with improved classification accuracy, relatively small tree size, and shorter computing time.
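The positive region degree mentioned in this abstract is, in standard rough set theory, the fraction of objects whose condition-attribute equivalence class maps to a single decision value. The sketch below computes that textbook quantity on a toy table; it is not the RSMDT attribute-selection procedure itself, and the function name and row layout are assumptions made for illustration.

```python
from collections import defaultdict


def positive_region_degree(rows, condition_idx, decision_idx):
    """Fraction of rows lying in decision-consistent equivalence classes.

    rows: list of tuples; condition_idx: indices of the condition attributes;
    decision_idx: index of the decision attribute.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    # Group decision values by the tuple of condition-attribute values.
    blocks = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in condition_idx)
        blocks[key].append(row[decision_idx])
    # A block belongs to the positive region if all its decisions agree.
    consistent = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return consistent / len(rows)


table = [
    ("a", "x", "yes"),
    ("a", "x", "no"),
    ("b", "y", "yes"),
    ("b", "z", "no"),
]
print(positive_region_degree(table, [0], 2))
print(positive_region_degree(table, [0, 1], 2))
```

A higher degree means the chosen condition attributes determine the decision for more objects, which is why the measure can serve as an attribute-selection criterion.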
NeSSM: A Next-Generation Sequencing Simulator for Metagenomics
Ben Jia, Liming Xuan, Kaiye Cai, Zhiqiang Hu, Liangxiao Ma, Chaochun Wei
PLOS ONE , 2013, DOI: 10.1371/journal.pone.0075448
Abstract: Background: Metagenomics can reveal the vast majority of microbes that have been missed by traditional cultivation-based methods. Because of its extremely wide range of application areas, fast, high-fidelity metagenome sequencing simulation systems are in great demand to facilitate the development and comparison of metagenomics analysis tools. Results: We present a customizable metagenome simulation system, NeSSM (Next-generation Sequencing Simulator for Metagenomics). By combining currently available complete genomes, a community composition table, and sequencing parameters, it can simulate metagenome sequencing better than existing systems. The simulation incorporates sequencing error models based on the explicit distribution of errors at each base, as well as sequencing coverage bias. To improve the fidelity of simulation, NeSSM provides tools to estimate the sequencing error models, the sequencing coverage bias, and the community composition directly from existing metagenome sequencing data. Currently, NeSSM supports single-end and paired-end sequencing for both the 454 and Illumina platforms. In addition, a GPU (graphics processing unit) version of NeSSM has been developed to accelerate the simulation. By comparing simulated sequencing data from NeSSM with experimental metagenome sequencing data, we demonstrate that NeSSM performs better in many respects than existing popular metagenome simulators such as MetaSim, GemSIM, and Grinder. The GPU version of NeSSM is more than one order of magnitude faster than MetaSim. Conclusions: NeSSM is a fast simulation system for high-throughput metagenome sequencing. It can help in developing tools and evaluating strategies for metagenomics analysis, and it is freely available to academic users at http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php.
Comparison of the Water Quality between the Surface Microlayer and Subsurface Water in Typical Water Bodies in Sichuan
Jiang Yu
Journal of Water Resource and Protection (JWARP) , 2010, DOI: 10.4236/jwarp.2010.210101
Abstract: The water quality of the surface microlayer (SML) and subsurface water (SSW) in several typical water bodies in Sichuan was investigated and assessed from May to June 2010. The results showed that N and P were enriched to some extent in the SML of Xichi pool, Funan River, and Longquan reservoir, so that concentrations of indexes such as total nitrogen (TN), total phosphorus (TP), and chemical oxygen demand (COD) were much higher in the SML than in the SSW (P<0.05), with exceeding rates of up to 100%. The TN, TP, and COD contents of the SML and SSW in Xichi pool and Funan River exceeded the level III or even level IV water quality standard, while these indexes in Longquan reservoir were below the level III or level II standard. Although the Chl. a mass concentration in the SML and SSW of Funan River was markedly lower than in Xichi pool and Longquan reservoir, according to the eutrophication evaluation standard the SML and SSW of Funan River and Xichi pool were moderately eutrophic, with the highest eutrophication index (E value) reaching 66.78, while Longquan reservoir was lightly eutrophic, and there were obvious differences in E value and in COD, TP, and TN (P<0.05). This research shows that the water quality of Longquan reservoir is generally good, whereas Funan River is moderately eutrophic and more seriously polluted than Xichi pool. These two water bodies fall in national level III or even level IV, and the SML is able to enrich pollutants such as N and P.
Power Management Integrated Circuit with 90Plus Efficiency Used in AC/DC Converter
Yanfeng JIANG
Energy and Power Engineering (EPE) , 2009, DOI: 10.4236/epe.2009.12016
Abstract: Recently, the resonant AC/DC converter has been accepted by the industry. However, its efficiency decreases at light load. A novel topology that combines a critical controlling mode with resonant modes is therefore proposed in this paper. The new topology can achieve a power conversion efficiency above 90 percent. A state-of-the-art integrated circuit, which can be used as a power management circuit, has been designed based on this topology. A simulator specifically suited to the power controller has been developed in this work and used to simulate the novel architecture and the proposed integrated circuit.
Limit Cycle Bifurcations in a Class of Cubic System near a Nilpotent Center
Jiao Jiang
Applied Mathematics (AM) , 2012, DOI: 10.4236/am.2012.37115
Abstract: In this paper we deal with a cubic near-Hamiltonian system whose unperturbed system is a simple cubic Hamiltonian system having a nilpotent center. We prove that the system can have 5 limit cycles by using bifurcation theory.
China’s Forensic System: Critical Comments on the “Latest” Flaw
Jiang Na
Chinese Studies (ChnStd) , 2014, DOI: 10.4236/chnstd.2014.33013
Abstract: This article critically examines the fundamental flaws newly revealed by the “latest” case studies. In recent decades, numerous miscarriages of justice have occurred in China, mainly due to the insufficient or improper use of forensic evidence. Comments on the “latest” flaw start from an overview of the notorious wrongful convictions in the case of ZHANG Gaoping and ZHANG Hui, whose exonerations in 2013 were based on the proper use of forensic techniques such as DNA testing. The case highlights the injustice that results when forensic evidence is ignored in favour of wrongful confessions extorted under police torture. It has been suggested that China’s several waves of forensic science reform cannot lead current forensic identification to objective, fair, or reliable forensic evidence. The “latest” flaw found entrenched in its forensic system cannot be solved by technical, financial, administrative, or legal progress alone. In essence, the 2005 reform of forensic identification is flawed to its core, although this has only recently been identified. This is primarily because, in law, forensic experts within the police can conduct identification to provide forensic evidence in cases investigated by the police, which cannot ensure the checks and balances needed to prevent or reduce forensic errors in practice.
Impacts of China’s Strike Hard Policy on Forensic Evidence
Jiang Na
Chinese Studies (ChnStd) , 2014, DOI: 10.4236/chnstd.2014.32010
Abstract: The media has reported numerous miscarriages of justice in China, some of which result directly from errors in forensic evidence as a main cause. Given that such miscarriages occurred under the influence of China’s Strike Hard Policy, empirical studies of its impact on forensic evidence, particularly evidence leading to miscarriages of justice, will be conducted at multiple levels with diverse research methods. The old policy was officially in effect mainly from 1983 to 2005, when problems in forensic evidence produced significantly more miscarriages of justice. The old policy’s impact on forensic evidence will be further explored based on data collected from experiments conducted with 394 questionnaires and 100 judges in four sample cities, just before and after the old policy was replaced with a balanced policy in late 2005. Surveys to elicit the traits of forensic identification were used, as well as the exogenous imposition of the old policy, to identify its negative impacts on forensic evidence, combined with the effects of the new policy. The 2005 reform towards balancing leniency and severity is essentially inadequate to prevent errors in forensic evidence.
How Does Growth Follow Differential Convergence Patterns? A Study of the Chinese Regions and Sectors
Yanqing Jiang
Theoretical Economics Letters (TEL) , 2014, DOI: 10.4236/tel.2014.48090
Abstract: This paper examines differential convergence patterns in productivity growth in China. Our empirical analysis shows that the Chinese provinces exhibit absolute divergence during 1990-2000 and absolute convergence during 2000-2010. In addition, absolute convergence is present during 1985-1995 and 2000-2010 in the primary sector and during 1995-2010 in the secondary sector. Our regressions also show that, whether for the overall regional economy or for any individual sector, growth in labor productivity exhibits strong convergence. Besides the convergence trends, we also find that the secondary and tertiary sectors have grown significantly faster than the primary sector.
