Identification of Potential Drug Targets Implicated in Parkinson's Disease from Human Genome: Insights of Using Fused Domains in Hypothetical Proteins as Probes
High-throughput genome sequencing has led to an explosion of data in sequence databanks and to an imbalance in sequence-structure-function relationships, leaving a substantial fraction of proteins without an assigned function; these are known as hypothetical proteins. Functions can be assigned to such proteins by analyzing and characterizing their constituent domains. Domains are the basic evolutionary units of proteins, and most proteins contain multiple domains. A subset of multidomain proteins contains fused domains (overlapping domains), in which the sequences of two or more domains overlap. These fused domains result from gene fusion events, and their implication in disease is well established. Hence, an attempt has been made in this paper to identify fused domain containing hypothetical proteins from the human genome that are homologous to parkinsonian targets present in the KEGG database. The search identified 18 hypothetical proteins whose domains are fused with ubiquitin domains and that show homology with targets in the parkinsonian pathway.

1. Introduction

A hypothetical protein is defined as “a protein coded by a gene with no known function based on its DNA sequence” [2]. Certain regions in hypothetical proteins are highly conserved between species in both composition and sequence. Proteins with such regions are annotated as conserved hypothetical proteins, and their proportion ranges from 13% in E. coli and 14% in Rickettsia prowazekii to 40% in Pyrococcus abyssi and 47% in Plasmodium falciparum [3]. In the human genome, too, about 20% of proteins are classified as hypothetical [4–6]. The function of such proteins can be predicted from the arrangement of their distinct domains [7], since the arrangement of domains in proteomes reflects fundamental evolutionary differences between genomes [8]. For proteins containing more than one domain, however, only a general function can be suggested.
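To make the notion of sequence overlap between domains concrete, the following minimal Python sketch flags pairs of domain annotations on one protein whose residue ranges overlap. It is an illustration only, not the procedure used in this study: the names `DomainHit`, `overlapping_pairs`, and `min_overlap`, the domain labels, and the coordinates are all hypothetical, and the sketch assumes domain boundaries are given as 1-based inclusive residue positions.

```python
from typing import List, NamedTuple, Tuple


class DomainHit(NamedTuple):
    """One predicted domain on a protein (hypothetical record layout)."""
    name: str
    start: int  # first residue of the domain, 1-based, inclusive
    end: int    # last residue of the domain, 1-based, inclusive


def overlapping_pairs(
    hits: List[DomainHit], min_overlap: int = 1
) -> List[Tuple[str, str, int]]:
    """Return (name_a, name_b, overlap_length) for every pair of domains
    that share at least `min_overlap` residues."""
    ordered = sorted(hits, key=lambda h: (h.start, h.end))
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start > a.end:
                # Hits are sorted by start, so no later hit can overlap `a`.
                break
            # b.start >= a.start here, so the shared range begins at b.start.
            overlap = min(a.end, b.end) - b.start + 1
            if overlap >= min_overlap:
                pairs.append((a.name, b.name, overlap))
    return pairs


if __name__ == "__main__":
    # Made-up coordinates for illustration; not data from this paper.
    example = [
        DomainHit("ubiquitin-like", 1, 76),
        DomainHit("domain X", 70, 150),
        DomainHit("domain Y", 160, 200),
    ]
    print(overlapping_pairs(example))
```

In this sketch a protein would be treated as a fused domain candidate when at least one pair is returned; the `min_overlap` threshold is an assumed parameter that decides how many shared residues count as an appreciable overlap.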
Predicting a protein’s function from its domains alone becomes difficult when there are no clear-cut boundaries between two domains. Proteins with appreciable overlap in their domain boundaries are known as fused domain containing proteins or chimeric proteins. Such proteins arise through gene duplication and combination during evolution, in which two or more genes that originally coded for separate proteins are joined [9]. Translation of this fusion gene results in a single polypeptide with functional properties derived from each of the original proteins [10]. Analysis of these fused domains in related genomes
References
[1]
N. Lev and E. Melamed, “Heredity in Parkinson's disease: new findings,” Israel Medical Association Journal, vol. 3, no. 6, pp. 435–438, 2001.
[2]
M. Y. Galperin, “Conserved 'hypothetical' proteins: new hints and new puzzles,” Comparative and Functional Genomics, vol. 2, no. 1, pp. 14–18, 2001.
[3]
I. Iliopoulos, S. Tsoka, M. A. Andrade et al., “Genome sequences and great expectations,” Genome Biology, vol. 2, no. 1, INTERACTIONS0001, 2001.
[4]
P. Suravajhala, “Hypo, hype and “hyp” human proteins,” Bioinformation, vol. 2, no. 1, pp. 31–33, 2007.
[5]
S. A. Teichmann, C. Chothia, and M. Gerstein, “Advances in structural genomics,” Current Opinion in Structural Biology, vol. 9, no. 3, pp. 390–399, 1999.
[6]
T. C. Terwilliger, G. Waldo, T. S. Peat, J. M. Newman, K. Chu, and J. Berendzen, “Class-directed structure determination: foundation for a protein structure initiative,” Protein Science, vol. 7, no. 9, pp. 1851–1856, 1998.
[7]
C. Vogel, C. Berzuini, M. Bashton, J. Gough, and S. A. Teichmann, “Supra-domains: evolutionary units larger than single protein domains,” Journal of Molecular Biology, vol. 336, no. 3, pp. 809–823, 2004.
[8]
M. Gerstein and H. Hegyi, “Comparing genomes in terms of protein structure: surveys of a finite parts list,” FEMS Microbiology Reviews, vol. 22, no. 4, pp. 277–304, 1998.
[9]
T. Mebatsion, M. J. Schnell, and K. K. Conzelmann, “Mokola virus glycoprotein and chimeric proteins can replace rabies virus glycoprotein in the rescue of infectious defective rabies virus particles,” Journal of Virology, vol. 69, no. 3, pp. 1444–1451, 1995.
[10]
S. D. Lupton, L. L. Brunton, V. A. Kalberg, and R. W. Overell, “Dominant positive and negative selection using a hygromycin phosphotransferase-thymidine kinase fusion gene,” Molecular and Cellular Biology, vol. 11, no. 6, pp. 3374–3378, 1991.
[11]
A. J. Enright, I. Iliopoulos, N. C. Kyrpides, and C. A. Ouzounis, “Protein interaction maps for complete genomes based on gene fusion events,” Nature, vol. 402, no. 6757, pp. 86–90, 1999.
[12]
B. C. Mondal, S. Majumdar, U. B. Dasgupta, U. Chaudhuri, P. Chakrabarti, and S. Bhattacharyya, “e19a2 BCR-ABL fusion transcript in typical chronic myeloid leukaemia: a report of two cases,” Journal of Clinical Pathology, vol. 59, no. 10, pp. 1102–1103, 2006.
[13]
K. Truong and M. Ikura, “Domain fusion analysis by applying relational algebra to protein sequence and domain databases,” BMC Bioinformatics, vol. 4, article 16, 2003.
[14]
J. M. Chia and P. R. Kolatkar, “Implications for domain fusion protein-protein interactions based on structural information,” BMC Bioinformatics, vol. 5, article 161, 2004.
[15]
F. J. Giles, J. E. Cortes, and H. M. Kantarjian, “Targeting the kinase activity of the BCR-ABL fusion protein in patients with chronic myeloid-leukemia,” Current Molecular Medicine, vol. 5, no. 7, pp. 615–623, 2005.
[16]
A. R. Mushegian, D. E. Bassett, M. S. Boguski, P. Bork, and E. V. Koonin, “Positionally cloned human disease genes: patterns of evolutionary conservation and functional motifs,” Proceedings of the National Academy of Sciences of the United States of America, vol. 94, no. 11, pp. 5831–5836, 1997.
[17]
J. Schultz, F. Milpetz, P. Bork, and C. P. Ponting, “SMART, a simple modular architecture research tool: identification of signaling domains,” Proceedings of the National Academy of Sciences of the United States of America, vol. 95, no. 11, pp. 5857–5864, 1998.
[18]
A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia, “SCOP: a structural classification of proteins database for the investigation of sequences and structures,” Journal of Molecular Biology, vol. 247, no. 4, pp. 536–540, 1995.
[19]
M. Wang and G. Caetano-Anollés, “Global phylogeny determined by the combination of protein domains in proteomes,” Molecular Biology and Evolution, vol. 23, no. 12, pp. 2444–2454, 2006.
[20]
A. Marchler-Bauer, J. B. Anderson, P. F. Cherukuri et al., “CDD: a Conserved Domain Database for protein classification,” Nucleic Acids Research, vol. 33, pp. D192–D196, 2005.
[21]
D. Brown and K. Sjölander, “Functional classification using phylogenomic inference,” PLoS Computational Biology, vol. 2, no. 6, article e77, pp. 479–483, 2006.
[22]
S. R. Eddy, “Hidden Markov models,” Current Opinion in Structural Biology, vol. 6, pp. 361–365, 1996.
[23]
P. Shannon, A. Markiel, O. Ozier et al., “Cytoscape: a software environment for integrated models of biomolecular interaction networks,” Genome Research, vol. 13, no. 11, pp. 2498–2504, 2003.
[24]
L. Madsen, A. Schulze, M. Seeger, and R. Hartmann-Petersen, “Ubiquitin domain proteins in disease,” BMC Biochemistry, vol. 8, supplement 1, article S1, 2007.
[25]
A. Samii, A. DePold Hohler, and R. Goodkin, “Functional neurosurgery for movement disorders,” in Neurosurgery, Springer Specialist Surgery Series-XI, pp. 607–616, 2005.
[26]
M. Kanehisa and S. Goto, “KEGG: Kyoto Encyclopedia of Genes and Genomes,” Nucleic Acids Research, vol. 28, no. 1, pp. 27–30, 2000.
[27]
N. Hulo, A. Bairoch, V. Bulliard et al., “The 20 years of PROSITE,” Nucleic Acids Research, vol. 36, supplement 1, pp. D245–D249, 2008.
[28]
T. Kawabata, M. Ota, and K. Nishikawa, “The protein mutant database,” Nucleic Acids Research, vol. 27, no. 1, pp. 355–357, 1999.