
PLOS ONE  2012 

Relating the Disease Mutation Spectrum to the Evolution of the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR)

DOI: 10.1371/journal.pone.0042336



Cystic fibrosis (CF) is the most common genetic disease among Caucasians, and accordingly the cystic fibrosis transmembrane conductance regulator (CFTR) protein has perhaps the best-characterized disease mutation spectrum, with more than 1,500 causative mutations having been identified. In this study, we took advantage of that wealth of mutational information in an effort to relate site-specific evolutionary parameters to the propensity and severity of CFTR disease-causing mutations. To do this, we devised a scoring scheme for known CFTR disease-causing mutations based on the Grantham amino acid chemical difference matrix. CFTR site-specific evolutionary constraint values were then computed for seven different evolutionary metrics across a range of increasing evolutionary depths. The CFTR mutational scores and the various site-specific evolutionary constraint values were compared in order to evaluate which evolutionary measures best reflect the disease-causing mutation spectrum. Site-specific evolutionary constraint values from the widely used comparative method PolyPhen2 show the best correlation with the CFTR mutation score spectrum, whereas more straightforward conservation-based measures (ConSurf and ScoreCons) show the greatest ability to predict individual CFTR disease-causing mutations. While far greater than could be expected by chance alone, both the fraction of the variability in mutation scores explained by the PolyPhen2 metric (3.6%) and the best set of paired sensitivity (58%) and specificity (60%) values for the prediction of disease-causing residues were marginal. These data indicate that evolutionary constraint levels are informative but far from determinant with respect to disease-causing mutations in CFTR.
Nevertheless, this work shows that, when combined with additional lines of evidence, information on site-specific evolutionary conservation can and should be used to guide site-directed mutagenesis experiments by more narrowly defining the set of target residues, resulting in a potential savings of both time and money.
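The two summary statistics quoted in the abstract, the fraction of variability explained and the paired sensitivity and specificity, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy scores, and the 0.5 threshold are all hypothetical, and the study's actual scoring scheme and metrics are not reproduced here.

```python
from statistics import mean


def variance_explained(xs, ys):
    """Squared Pearson correlation (R^2) between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def sensitivity_specificity(scores, is_disease, threshold):
    """Predict 'disease-causing' when a site's score is >= threshold,
    then compare the predictions against the known labels."""
    pairs = list(zip(scores, is_disease))
    tp = sum(1 for s, d in pairs if d and s >= threshold)
    fn = sum(1 for s, d in pairs if d and s < threshold)
    tn = sum(1 for s, d in pairs if not d and s < threshold)
    fp = sum(1 for s, d in pairs if not d and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical per-site conservation scores and disease labels (toy data,
# not taken from the study).
conservation = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
is_disease = [True, True, True, False, False, False, True, False]

sens, spec = sensitivity_specificity(conservation, is_disease, threshold=0.5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sweeping the threshold trades sensitivity against specificity, which is why the abstract reports the two values as a pair rather than separately.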



