BMC Systems Biology 2010
Low-complexity regions within protein sequences have position-dependent roles

Abstract: In keeping with previous results, we found that LCR-containing proteins tend to have more binding partners across different PPI networks than proteins that have no LCRs. More specifically, our study suggests i) that LCRs are preferentially positioned towards the protein sequence extremities and, in contrast with centrally-located LCRs, such terminal LCRs show a correlation between their lengths and degrees of connectivity, and ii) that centrally-located LCRs are enriched with transcription-related GO terms, while terminal LCRs are enriched with translation and stress response-related terms. Our results suggest not only that LCRs may be involved in flexible binding associated with specific functions, but also that their positions within a sequence may be important in determining both their binding properties and their biological roles.

Low-complexity regions (LCRs) in protein sequences are regions containing little diversity in their amino acid composition. The degree of diversity they exhibit may vary, ranging from regions comprising few different amino acids, to those comprising just one, the amino acid positions within these regions being either loosely clustered, irregularly spaced, or periodic [1]. This work defines LCRs computationally as an amino acid sequence with low information content (see Methods). Therefore, simple repetitive sequences such as tandem amino acid repeats form part of the LCR dataset discussed here.

LCRs are common in protein sequences, but precise measures of their abundance are difficult to ascertain. One of the problems is that the degrees of stringency applied by different detection methods differ, leading to different estimates of the numbers of LCRs in the same dataset.
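
To make the notion of "low information content" concrete, and to show how stringency changes what is detected, the following is a minimal sketch based on sliding-window Shannon entropy. It is not the detection procedure used in this study (that is described in the Methods); the function names, the window size, the two thresholds and the example sequence are all hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of a segment's amino acid composition."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_regions(seq: str, window: int = 12, threshold: float = 2.2):
    """Return merged half-open (start, end) intervals covered by windows
    whose entropy is at or below `threshold`.

    `window` and `threshold` are illustrative assumptions, not values
    taken from the paper.
    """
    regions = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= threshold:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


# Hypothetical sequence: a poly-glutamine tract flanked by mixed residues.
seq = "MKTAYIAKQR" + "Q" * 15 + "ELVDSGHWPN"

strict = low_complexity_regions(seq, threshold=1.0)   # more stringent
lenient = low_complexity_regions(seq, threshold=2.5)  # less stringent
print("strict: ", strict)
print("lenient:", lenient)
```

A single-residue tract has zero entropy, so it is reported under either setting, whereas the more lenient threshold also absorbs flanking residues. This is one simple way in which two detection settings can report different LCR boundaries and lengths for the same sequence.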
Importantly, our knowledge of the protein universe has changed dramatically during the last 15 years, as protein sequence repositories have become engorged with the outputs of high-throughput sequencing projects. Protein sequence databases have thus grown enormously