%0 Journal Article
%T Iterative pruning PCA improves resolution of highly structured populations
%A Apichart Intarapanich
%A Philip J Shaw
%A Anunchai Assawamakin
%A Pongsakorn Wangkumhang
%A Chumpol Ngamphiw
%A Kridsadakorn Chaichoompu
%A Jittima Piriyapongsa
%A Sissades Tongsima
%J BMC Bioinformatics
%D 2009
%I BioMed Central
%R 10.1186/1471-2105-10-382
%X A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by other methods.The algorithm is computationally efficient and not constrained by the dataset complexity. This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogenous population.Allele frequencies vary across populations because of differences in ancestry; these differences arise from many factors such as migration, selection and drift. Hence, populations are genetically substructured. The information obtained from resolving population substructure can be used to infer population history. Furthermore, human disease association studies must account and correct for the population substructure to reduce spurious associations and reveal the predisposing factors of disease [1]. Analysis of population stratification must meet four main challenges namely: (i) detecting structure, (ii) assigning individuals to subpopulations, (iii) determining the number of optimal, or primal, subpopulations (K) and (iv) determining the proportions of ancestral subpopulations (admixture) [2].With the advent of high throughput genotyping, increasingly large genotypic datasets (e.g. HapMap dataset of 3.5 million single nucleotide polymorphism (SNP) arrays from 270 individuals[3]) will provide prog
%U http://www.biomedcentral.com/1471-2105/10/382