|
Systematic analysis of short internal indels and their impact on protein foldingAbstract: We compiled a non-redundant dataset of short internal indels (2-40 amino acids) from highly homologous protein pairs and analyzed the sequence and structural features of the indels. We found that about one third of indel residues are in disordered state and majority of the residues are exposed to solvent, suggesting that these indels are generally located on the surface of proteins. Though naturally occurring indels are fewer than engineered ones in the dataset, there are no statistically significant differences in terms of amino acid frequencies and secondary structure types between the "Natural" indels and "All" indels in the dataset. Structural comparisons show that all the protein pairs with short internal indels in the dataset preserve the structural folds and about 85% of protein pairs have global RMSDs (root mean square deviations) of 2? or less, suggesting that protein structures tend to be conserved and can tolerate short insertions and deletions. A few pairs with high RMSDs are results of relative domain positions of the proteins, probably due to the intrinsically dynamic nature of the proteins.The analysis demonstrated that protein structures have the "plasticity" to tolerate short indels. This study can provide valuable guides in modeling protein AS isoform structures and homologous proteins with indels through placing the indels at the right locations since the accuracy of sequence alignments dictate model qualities in homology modeling.Sequence insertions/deletions (indels) occur during evolution and alternative splicing (AS) process in eukaryotes. The generation of various protein isoforms through alternative splicing has been considered as one of the major evolutionary mechanisms for increasing the proteome size and functional diversity [1,2]. Recent high-throughput analysis based on mRNA-SEQ data from diverse human tissue and cell lines suggested that alternative splicing is almost universal (up to 94%) in human multi-exon genes [3]. While there are
|