|
Estimating the evidence of selection and the reliability of inference in unigenic evolution

Abstract: We develop a novel approach, termed the Evidence of Selection (EoS), that removes the assumption that functionally important sites are adjacent in sequence and explicitly models the effects of limited sample size. Precise statistical derivations show that the EoS score can be easily interpreted as an expected log-odds ratio between two competing hypotheses, namely, the hypothetical presence or absence of functional selection for a given site. Using the EoS score, we then develop selection criteria by which functionally important yet non-adjacent sites can be identified. An approximate power analysis is also developed to estimate the reliability of inference given the data. We validate and demonstrate the practical utility of our method by analysis of the homing endonuclease I-BmoI, comparing our predictions with the results of existing methods.

Our method both assesses the evidence of selection at individual amino acid sites and estimates the reliability of those inferences. Experimental validation with I-BmoI demonstrates its utility in identifying functionally important residues of poorly characterized proteins, with increased sensitivity over previous methods and no loss of specificity. With the ability to guide the selection of precise experimental mutagenesis conditions, our method helps make unigenic analysis a more broadly applicable technique with which to probe protein function.

Software to compute, plot, and summarize EoS data is available as an open-source package called 'unigenic' for the 'R' programming language at http://www.fernandes.org/txp/article/13/an-analytical-framework-for-unigenic-evolution.

One of the principal reasons for studying molecular evolution is that the function of a novel protein can be deduced, in part, by comparing it with a similar previously-characterized protein.
But what recourse do we have if the novel protein does not exhibit significant sequence similarity to other proteins? More problematically
|