|
Scoring function to predict solubility mutagenesis

Abstract: We use concepts from computational geometry to define a three-body scoring function that predicts the change in protein solubility due to mutations. The scoring function captures both sequence and structure information. By exploring the literature, we have assembled a substantial database of 137 single- and multiple-point solubility mutations. Our database is the largest such collection with structural information known so far. We optimize the scoring function using linear programming (LP) methods to derive its weights from training data. Starting with default values of 1, we find weights in the range [0,2] so that predictions of increase or decrease in solubility are optimized. We compare the LP method to the standard machine learning techniques of support vector machines (SVM) and the Lasso. Using statistics for leave-one-out (LOO), 10-fold, and 3-fold cross-validation (CV) for training and prediction, we demonstrate that the LP method performs the best overall. For the LOOCV, the LP method has an overall accuracy of 81%.

Executables of programs, tables of weights, and datasets of mutants are available from the following web page: http://www.wsu.edu/~kbala/OptSolMut.html.

Correlations between sequence and structure influence to a large extent how proteins fold, and also how they function. Working under this premise, most computational methods used for predicting various aspects of structure and function employ scoring functions, which quantify the propensities of groups of amino acids to form specific structural or functional units. Scoring functions for mutagenesis predict the effects of changing one or more amino acids (AAs) on critical properties such as stability [1-4], activity [5], or solubility [6]. In experimental mutagenesis, one is often faced with the challenge of having to select a small subset from a large set of candidate mutations. Computational methods are invaluable for making such choices without generating all the mutants in the lab. Mo
|