|
BMC Bioinformatics 2008
Generating samples for association studies based on HapMap data

Abstract: A computer program (gs) was developed to quickly generate large numbers of samples based on real data. Such samples are useful for a variety of purposes, including evaluating methods for haplotype inference, tag SNP selection, and association studies. Two approaches have been implemented to generate dense SNP haplotype/genotype data whose local linkage disequilibrium (LD) patterns are similar to those in human populations. The first approach takes haplotype pairs from samples as input, and the second takes patterns of haplotype block structures as input. Both quantitative and qualitative traits are supported. Phenotypes are generated based on a disease model or on the effect of a quantitative trait nucleotide, both of which can be specified by users. In addition to single-locus disease models, two-locus disease models that can incorporate any degree of epistasis have been implemented. Users can specify all nine parameters in a 3 × 3 penetrance table. For several commonly used two-locus disease models, the program can automatically calculate penetrances from the population prevalence and marginal effects of a disease, which users can conveniently specify.

The program gs can effectively generate large-scale genetic and phenotypic variation data that can be used for evaluating newly developed approaches. It is freely available from the authors' web site at http://www.eecs.case.edu/~jxl175/gs.html.

With the completion of the HapMap project [1], large-scale, high-density single-nucleotide polymorphism (SNP) markers and information on haplotype structure and frequencies have become available. A variety of statistical approaches have been proposed for association studies using haplotypes [2,3], and more are expected for whole-genome association studies.
The utility of such approaches is often very difficult to establish analytically. Evaluations of those methods commonly rely on experiments based on simu
|