|
BMC Bioinformatics 2006
Predicting deleterious nsSNPs: an analysis of sequence and structural attributes

Abstract: The measure of prediction success is greatly affected by the level of imbalance in the training dataset. We found that the balanced dataset that included all attributes produced the best prediction. The performance as measured by the Matthews correlation coefficient (MCC) varied between 0.49 and 0.25 depending on the imbalance. As previously observed, the degree of sequence conservation at the nsSNP position is the single most useful attribute. In addition to conservation, structural predictions made using a balanced dataset can be of value.

The predictions for all nsSNPs within Ensembl, based on a balanced dataset using all attributes, are available as a DAS annotation. Instructions for adding the track to Ensembl are at http://www.brightstudy.ac.uk/das_help.html

Single base changes in protein coding regions of DNA which lead to changes in amino acids have the potential to affect protein structure and function. These non-synonymous single nucleotide polymorphisms (nsSNPs) have been the subject of many recent studies, and a large amount of data now exists in public repositories such as dbSNP [1], HGVBASE [2] and SWISSPROT [3]. Some nsSNPs are related to a disease condition, but others are not associated with any change in phenotype and are regarded as neutral. Several studies have attempted to predict the functional consequences of an nsSNP, namely whether it is disease related or neutral, based on attributes of the polymorphism. Some attributes depend only on the sequence information, for example the types of residue found at the SNP location. Structural attributes such as solvent accessibility can be chosen if the protein sequence containing the nsSNP has a known 3D structure or is highly similar to a protein sequence of known structure. As structural genomics projects gain momentum, an increasingly large amount of protein 3D structural information is becoming available.
Mapping nsSNPs onto the corresponding 3D structures or onto the structures of proteins which are
|
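For readers unfamiliar with the Matthews correlation coefficient quoted in the abstract, the following is a minimal sketch of how it is computed from a binary confusion matrix, using the standard definition. The function name `mcc` and all counts are hypothetical and chosen only for illustration; they are not taken from the study's data. The sketch assumes a classifier with fixed 80% sensitivity and 80% specificity and shows that the MCC computed on a class-imbalanced set can fall below that on a balanced set.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here, for the otherwise undefined case).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts (illustration only, not from the paper):
# same sensitivity (0.8) and specificity (0.8) in both cases.

# Balanced: 500 deleterious, 500 neutral.
balanced = mcc(tp=400, tn=400, fp=100, fn=100)

# Imbalanced: 100 deleterious, 900 neutral.
imbalanced = mcc(tp=80, tn=720, fp=180, fn=20)

print(f"balanced MCC:   {balanced:.2f}")
print(f"imbalanced MCC: {imbalanced:.2f}")
```

With these illustrative counts the balanced case gives an MCC of 0.60 and the imbalanced case roughly 0.41, even though the per-class error rates are identical. This is one way in which class proportions alone can shift the reported measure of success.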