Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Page 1 /100
Predicting smear negative pulmonary tuberculosis with classification trees and logistic regression: a cross-sectional study
Fernanda Mello, Luiz Bastos, Sérgio Soares, Valéria MC Rezende, Marcus Conde, Richard E Chaisson, Afranio Kritski, Antonio Ruffino-Netto, Guilherme Werneck
BMC Public Health, 2006, DOI: 10.1186/1471-2458-6-43
Abstract: The study enrolled 551 patients with clinical-radiological suspicion of SNPT, in Rio de Janeiro, Brazil. The original data was divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operator characteristic curve, sensitivity, and specificity were used to evaluate the model's performance in both generation and validation samples. It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. The results suggest that those models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding costs of unnecessary anti-tuberculosis treatment. Those models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources. Tuberculosis is one of the most important health problems in the world, with more than 8 million new cases and almost 2 million deaths each year [1,2]. The detection and management of pulmonary tuberculosis (PT) is a principal aim of tuberculosis control programs. However, smear-negative pulmonary tuberculosis (SNPT) is an increasing clinical and epidemiological problem, particularly in areas that are affected by the dual tuberculosis/human immunodeficiency virus infection (TB/HIV) [3]. A recent DNA fingerprinting study from San Francisco attributed 17% of TB transmission in this low prevalence setting to patients with SNPT [4]. HIV infection has been associated with an increased incidence of SNPT [5] and a higher mortality rate among patients with SNPT [6].
In Brazil, almost 30% of PT cases among adults are SNPT [7,8]. Diagnosis of SNPT is a difficult task, and in developing countries, the m
Risk factor analysis and spatiotemporal CART model of cryptosporidiosis in Queensland, Australia
Wenbiao Hu, Kerrie Mengersen, Shilu Tong
BMC Infectious Diseases, 2010, DOI: 10.1186/1471-2334-10-311
Abstract: Data on weather variables, notified cryptosporidiosis cases and social economic factors in Queensland were supplied by the Australian Bureau of Meteorology, Queensland Department of Health, and Australian Bureau of Statistics, respectively. Three-stage spatiotemporal classification and regression tree (CART) models were developed to examine the association between social economic and weather factors and monthly incidence of cryptosporidiosis in Queensland, Australia. The spatiotemporal CART model was used for predicting the outbreak of cryptosporidiosis in Queensland, Australia. The results of the classification tree model (with incidence rates defined as binary presence/absence) showed that there was an 87% chance of an occurrence of cryptosporidiosis in a local government area (LGA) if the socio-economic index for the area (SEIFA) exceeded 1021, while the results of the regression tree model (based on non-zero incidence rates) showed that when SEIFA was between 892 and 945, and temperature exceeded 32°C, the relative risk (RR) of cryptosporidiosis was 3.9 (mean morbidity: 390.6/100,000, standard deviation (SD): 310.5), compared to the monthly average incidence of cryptosporidiosis. When SEIFA was less than 892, the RR of cryptosporidiosis was 4.3 (mean morbidity: 426.8/100,000, SD: 319.2). A prediction map for the cryptosporidiosis outbreak was made according to the outputs of spatiotemporal CART models. The results of this study suggest that spatiotemporal CART models based on social economic and weather variables can be used for predicting the outbreak of cryptosporidiosis in Queensland, Australia. Cryptosporidiosis is a diarrhoeal disease caused by microscopic parasites of the Cryptosporidium parvum [1]. The parasite is one of the most common causes of waterborne disease in Australia and globally and is found in drinking water and recreational water [2].
Cryptosporidiosis can also be transmitted via contaminated food, contact between people, or contact between people and animals.
BART: Bayesian additive regression trees
Hugh A. Chipman, Edward I. George, Robert E. McCulloch
Statistics, 2008, DOI: 10.1214/09-AOAS285
Abstract: We develop a Bayesian "sum-of-trees" model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BART's many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment and on a drug discovery classification problem.
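The "sum-of-trees" model named in this abstract can be sketched in the notation commonly used for BART. The symbols are not given in the abstract and are introduced here for illustration: $m$ is the number of trees, $T_j$ is the structure of the $j$-th tree, $M_j$ is its set of terminal-node values, and $g(x; T_j, M_j)$ is the value that tree assigns to input $x$.

```latex
Y = \sum_{j=1}^{m} g(x; T_j, M_j) + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2)

% Partial residual used when tree j is updated during backfitting:
R_j = y - \sum_{k \neq j} g(x; T_k, M_k)
```

Under this reading, one backfitting sweep updates each pair $(T_j, M_j)$ in turn against its partial residual $R_j$, and the regularization prior keeps each individual tree's contribution small.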
Parallel Bayesian Additive Regression Trees
Matthew T. Pratola, Hugh A. Chipman, James R. Gattiker, David M. Higdon, Robert McCulloch, William N. Rust
Statistics, 2013
Abstract: Bayesian Additive Regression Trees (BART) is a Bayesian approach to flexible non-linear regression which has been shown to be competitive with the best modern predictive methods such as those based on bagging and boosting. BART offers some advantages. For example, the stochastic search Markov Chain Monte Carlo (MCMC) algorithm can provide a more complete search of the model space and variation across MCMC draws can capture the level of uncertainty in the usual Bayesian way. The BART prior is robust in that reasonable results are typically obtained with a default prior specification. However, the publicly available implementation of the BART algorithm in the R package BayesTree is not fast enough to be considered interactive with over a thousand observations, and is unlikely to even run with 50,000 to 100,000 observations. In this paper we show how the BART algorithm may be modified and then computed using single program, multiple data (SPMD) parallel computation implemented using the Message Passing Interface (MPI) library. The approach scales nearly linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit on any single data repository.
bartMachine: Machine Learning with Bayesian Additive Regression Trees
Adam Kapelner, Justin Bleich
Computer Science, 2013
Abstract: We present a new package in R implementing Bayesian additive regression trees (BART). The package introduces many new features for data analysis using BART such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data and the ability to save trees for future prediction. It is significantly faster than the current R implementation, parallelized, and capable of handling both large sample sizes and high-dimensional data.
Particle Gibbs for Bayesian Additive Regression Trees
Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh
Computer Science, 2015
Abstract: Additive regression trees are flexible non-parametric models and popular off-the-shelf tools for real-world non-linear regression. In application domains, such as bioinformatics, where there is also demand for probabilistic predictions with measures of uncertainty, the Bayesian additive regression trees (BART) model, introduced by Chipman et al. (2010), is increasingly popular. As data sets have grown in size, however, the standard Metropolis-Hastings algorithms used to perform inference in BART are proving inadequate. In particular, these Markov chains make local changes to the trees and suffer from slow mixing when the data are high-dimensional or the best fitting trees are more than a few layers deep. We present a novel sampler for BART based on the Particle Gibbs (PG) algorithm (Andrieu et al., 2010) and a top-down particle filtering algorithm for Bayesian decision trees (Lakshminarayanan et al., 2013). Rather than making local changes to individual trees, the PG sampler proposes a complete tree to fit the residual. Experiments show that the PG sampler outperforms existing samplers in many settings.
Bayesian Additive Regression Trees With Parametric Models of Heteroskedasticity
Justin Bleich, Adam Kapelner
Statistics, 2014
Abstract: We incorporate heteroskedasticity into Bayesian Additive Regression Trees (BART) by modeling the log of the error variance parameter as a linear function of prespecified covariates. Under this scheme, the Gibbs sampling procedure for the original sum-of-trees model is easily modified, and the parameters for the variance model are updated via a Metropolis-Hastings step. We demonstrate the promise of our approach by providing more appropriate posterior predictive intervals than homoskedastic BART in heteroskedastic settings and demonstrating the model's resistance to overfitting. Our implementation will be offered in an upcoming release of the R package bartMachine.
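One way to write the variance model described in this abstract is sketched below. The notation is an assumption made for illustration and does not appear in the abstract: $f$ is the sum-of-trees regression function, $z_i$ is the vector of prespecified covariates for observation $i$, and $\gamma$ is the coefficient vector of the variance model.

```latex
y_i = f(x_i) + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma_i^2),
\qquad \log \sigma_i^2 = \log \sigma^2 + z_i^{\top} \gamma
```

In this form, setting $\gamma = 0$ gives every observation the same variance $\sigma^2$, which is the homoskedastic case that the abstract compares against.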
Influencing elections with statistics: Targeting voters with logistic regression trees
Thomas Rusch, Ilro Lee, Kurt Hornik, Wolfgang Jank, Achim Zeileis
Statistics, 2013, DOI: 10.1214/13-AOAS648
Abstract: In political campaigning substantial resources are spent on voter mobilization, that is, on identifying and influencing as many people as possible to vote. Campaigns use statistical tools for deciding whom to target ("microtargeting"). In this paper we describe a nonpartisan campaign that aims at increasing overall turnout using the example of the 2004 US presidential election. Based on a real data set of 19,634 eligible voters from Ohio, we introduce a modern statistical framework well suited for carrying out the main tasks of voter targeting in a single sweep: predicting an individual's turnout (or support) likelihood for a particular cause, party or candidate as well as data-driven voter segmentation. Our framework, which we refer to as LORET (for LOgistic REgression Trees), contains standard methods such as logistic regression and classification trees as special cases and allows for a synthesis of both techniques. For our case study, we explore various LORET models with different regressors in the logistic model components and different partitioning variables in the tree components; we analyze them in terms of their predictive accuracy and compare the effect of using the full set of available variables against using only a limited amount of information. We find that augmenting a standard set of variables (such as age and voting history) with additional predictor variables (such as the household composition in terms of party affiliation) clearly improves predictive accuracy. We also find that LORET models based on tree induction beat the unpartitioned models. Furthermore, we illustrate how voter segmentation arises from our framework and discuss the resulting profiles from a targeting point of view.
Bayesian regression and Bitcoin
Devavrat Shah, Kang Zhang
Computer Science, 2014
Abstract: In this paper, we discuss the method of Bayesian regression and its efficacy for predicting price variation of Bitcoin, a recently popularized virtual, cryptographic currency. Bayesian regression refers to utilizing empirical data as a proxy to perform Bayesian inference. We utilize Bayesian regression for the so-called "latent source model". The Bayesian regression for the "latent source model" was introduced and discussed by Chen, Nikolov and Shah (2013) and Bresler, Chen and Shah (2014) for the purpose of binary classification. They established theoretical as well as empirical efficacy of the method for the setting of binary classification. In this paper, we instead utilize it for predicting a real-valued quantity, the price of Bitcoin. Based on this price prediction method, we devise a simple strategy for trading Bitcoin. The strategy is able to nearly double the investment in a period of less than 60 days when run against a real data trace.
DART: Dropouts meet Multiple Additive Regression Trees
K. V. Rashmi, Ran Gilad-Bachrach
Computer Science, 2015
Abstract: Multiple Additive Regression Trees (MART), an ensemble model of boosted regression trees, is known to deliver high prediction accuracy for diverse tasks, and it is widely used in practice. However, it suffers an issue which we call over-specialization, wherein trees added at later iterations tend to impact the prediction of only a few instances, and make negligible contribution towards the remaining instances. This negatively affects the performance of the model on unseen data, and also makes the model over-sensitive to the contributions of the few, initially added trees. We show that the commonly used tool to address this issue, that of shrinkage, alleviates the problem only to a certain extent and the fundamental issue of over-specialization still remains. In this work, we explore a different approach to address the problem, that of employing dropouts, a tool that has been recently proposed in the context of learning deep neural networks. We propose a novel way of employing dropouts in MART, resulting in the DART algorithm. We evaluate DART on ranking, regression and classification tasks, using large scale, publicly available datasets, and show that DART outperforms MART in each of the tasks, with a significant margin. We also show that DART overcomes the issue of over-specialization to a considerable extent.
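The idea of applying dropout to boosted trees can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes squared-error loss, one-dimensional inputs with at least two distinct values, and one-split regression stumps in place of full trees. The function names, the `drop_rate` parameter, and the rescaling rule (new tree scaled by 1/(k+1), each dropped tree by k/(k+1), where k is the number of dropped trees) are choices made for this illustration. With `drop_rate=0.0` the loop reduces to plain boosting without shrinkage.

```python
import random


def fit_stump(xs, targets):
    """Least-squares stump on 1-D inputs: [threshold, left_value, right_value]."""
    best = None
    # Candidate thresholds exclude the largest x so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((t - left_mean) ** 2 for t in left)
        sse += sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return list(best[1:])  # a list, so leaf values can be rescaled in place


def predict(stumps, x):
    """Sum the outputs of all stumps for a single input x."""
    return sum(left if x <= thr else right for thr, left, right in stumps)


def fit_dart(xs, ys, n_trees=30, drop_rate=0.2, seed=0):
    """Boost stumps on residuals, muting a random subset of stumps each round."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        # Each existing stump is dropped independently with probability drop_rate.
        dropped = {i for i in range(len(stumps)) if rng.random() < drop_rate}
        kept = [s for i, s in enumerate(stumps) if i not in dropped]
        # Fit the new stump to what the kept stumps fail to explain.
        residuals = [y - predict(kept, x) for x, y in zip(xs, ys)]
        new_stump = fit_stump(xs, residuals)
        # Rescale so the new stump plus the dropped stumps do not overshoot.
        k = len(dropped)
        new_stump[1] /= k + 1
        new_stump[2] /= k + 1
        for i in dropped:
            stumps[i][1] *= k / (k + 1)
            stumps[i][2] *= k / (k + 1)
        stumps.append(new_stump)
    return stumps


def mse(stumps, xs, ys):
    """Mean squared error of the ensemble on the given points."""
    return sum((y - predict(stumps, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs = [i / 10 for i in range(50)]
    ys = [x * x for x in xs]
    model = fit_dart(xs, ys, n_trees=30, drop_rate=0.2, seed=0)
    print(len(model), mse(model, xs, ys))
```

Because the new stump is fitted without the dropped stumps, it has to explain part of what they explained, which is how dropout is meant to stop later trees from specializing on only a few instances.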
