|
Development of an accurate classification system of proteins into structured and unstructured regions that uncovers novel structural domains: its application to human transcription factors

Abstract: The new dichotomic system first identifies domains of known structure, then assigns structural domains and intrinsically disordered (ID) regions using a combination of pre-existing tools and a newly developed program based on sequence divergence that takes unaligned regions into consideration. The system was found to be highly accurate: when applied to a set of proteins with experimentally verified ID regions, its error rate was as low as 2%. Application of the system to human transcription factors (TFs; 401 proteins) showed that 38% of the residues were in structural domains, while 62% were in ID regions. This preponderance of ID regions contrasts sharply with the TFs of Escherichia coli (229 proteins), in which only 5% of residues fell in ID regions. The method also revealed that 4.0% and 11.8% of the total length of human and E. coli TFs, respectively, consist of structural domains whose structures have not been determined.

The present system verifies that sequence divergence, including information from unaligned regions, is a good indicator of ID regions. For the first time, the system estimates the complete partition of human TFs into structured and unstructured regions, and it also reveals structural domains without homology to known structures. These predicted novel structural domains are good targets for structural genomics. When applied to other proteins, the system is expected to uncover more novel structural domains.

Recent studies revealed that a high fraction of proteins in eukaryotes have long stretches of intrinsically disordered (ID) regions [1,2]. Proteins with ID regions, which are abundant in the cytosol and nucleus but scarce in mitochondria [3], are frequently involved in cellular regulatory processes such as transcription, translation, and cellular signal transduction [4-7].
The abundance of proteins with ID regions in cells can be tightly controlled through regulation of transcript clearance, proteolytic degradation, and translational rate [8]. Transcription factors (TFs) such as activators, repressors, or enhan
|