Inferring population structure using Bayesian clustering programs often requires a priori specification of the number of subpopulations, K, from which the sample has been drawn. Here, we explore the utility of a common Bayesian model selection criterion, the Deviance Information Criterion (DIC), for estimating K. We evaluate the accuracy of DIC, as well as other popular approaches, on datasets generated by coalescent simulations under various demographic scenarios. We find that DIC outperforms competing methods in many genetic contexts, validating its application in assessing population structure.
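As a rough illustration of how DIC can be used to choose among candidate numbers of subpopulations, the sketch below applies the basic definition of Spiegelhalter et al.: DIC is the posterior mean deviance plus an effective number of parameters p_D, where p_D is the posterior mean deviance minus the deviance at the posterior mean of the parameters. The function names `dic` and `choose_k`, the input layout, and the numbers in the usage example are hypothetical and are not taken from the paper, which may rely on a DIC variant adapted to mixture models.

```python
from statistics import fmean


def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, p_D) from posterior deviance draws.

    deviance_samples: D(theta) = -2 * log-likelihood at each MCMC draw.
    deviance_at_posterior_mean: D evaluated at the posterior mean of theta.
    """
    d_bar = fmean(deviance_samples)            # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean   # effective number of parameters
    return d_bar + p_d, p_d


def choose_k(runs):
    """Pick the candidate with the smallest DIC.

    runs maps each candidate number of subpopulations to a pair
    (deviance_samples, deviance_at_posterior_mean) from one MCMC run.
    """
    scores = {k: dic(samples, d_hat)[0] for k, (samples, d_hat) in runs.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Made-up deviance values, for illustration only.
example_runs = {
    1: ([110.0, 112.0, 114.0], 110.0),
    2: ([100.0, 102.0, 104.0], 99.0),
    3: ([99.0, 101.0, 103.0], 94.0),
}
best_k, dic_scores = choose_k(example_runs)
```

In this toy input the three-cluster run fits slightly better on average than the two-cluster run, but its larger p_D penalty gives it a higher DIC, so the two-cluster candidate is selected.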