|
BMC Bioinformatics 2006
A high level interface to SCOP and ASTRAL implemented in PythonAbstract: We have designed a set of python modules to provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within python programs, and use ASTRAL to return sequences of domains in SCOP, as well as clustered representations of SCOP from ASTRAL.The modules make the analysis and generation of datasets for use in structural genomics easier and more principled.Bioinformatics tools often attempt to automatically predict the unknown properties of a given dataset. Manually curated data is therefore important both for training and for benchmarking new approaches to prediction. One database providing such a manual curation of data is SCOP (the Structural Classification Of Proteins) [1]. SCOP categorises all known protein domains in a hierarchy based upon the domain structure. This hierarchy is principally described by Class, Fold, Superfamily and Family. Crucially, the relationships between proteins grouped at the superfamily level may not be apparent from sequence considerations alone. This makes SCOP a valuable resource when examining the performance of algorithms that detect remote sequence relationships [2].The ASTRAL [3] Compendium for Sequence and Structure Analysis complements SCOP. ASTRAL provides sequences and structures for each domain in SCOP, and also provides non-redundant subsets of SCOP with preference given to higher quality structures.The SCOP and ASTRAL databases provide their data in structured files available from the relevant websites. Using this data requires parsing and handling of these files. We present a small and intuitive application programming interface (API) to the SCOP and ASTRAL datasets which allows these databases to be used with a minimum of programming overhead. In the past we have successfully used this API to develop a web-based database of SCOP alignments, S4 [4]. The API described is now distributed as part
|