
PLOS ONE  2011 

Bayesian Classification and Regression Trees for Predicting Incidence of Cryptosporidiosis

DOI: 10.1371/journal.pone.0023903



Background

Classification and regression tree (CART) models are tree-based exploratory data analysis methods that have proven useful for identifying and estimating complex hierarchical relationships in ecological and medical contexts. In this paper, a Bayesian CART model is described and applied to the problem of modelling cryptosporidiosis infection in Queensland, Australia.

Methodology/Principal Findings

We compared the results of a Bayesian CART model with those obtained using a Bayesian spatial conditional autoregressive (CAR) model. Overall, the analyses indicated that the nature and magnitude of the effect estimates were similar for the two methods in this study, but the CART model more easily accommodated higher-order interaction effects.

Conclusions/Significance

A Bayesian CART model for identifying and estimating the spatial distribution of disease risk is useful for monitoring and assessing infectious disease prevention and control.



