Allergy is a major health problem in industrialized countries. The number of transgenic food crops is growing rapidly, creating the need for allergenicity assessment before they are introduced into the human food chain. While existing bioinformatic methods achieve good accuracy for highly conserved sequences, discriminating allergens and non-allergens from allergen-like non-allergen sequences remains difficult. We describe AllerHunter, a web-based computational system for assessing potential allergenicity and allergic cross-reactivity in proteins. It combines an iterative pairwise sequence similarity encoding scheme with a support vector machine (SVM) as the discriminating engine. The pairwise vectorization framework allows the system to model essential features of allergens that are involved in cross-reactivity, without being limited to distinct sets of physicochemical properties. The system was rigorously trained and tested using 1,356 known allergen and 13,449 putative non-allergen sequences, and the prediction models were validated through extensive testing. The system is effective for distinguishing allergens and non-allergens from allergen-like non-allergen sequences. On an independent dataset of 1,443 protein sequences, AllerHunter achieved a sensitivity of 83.4% and a specificity of 96.4% (accuracy = 95.3%, area under the receiver operating characteristic curve AROC = 0.928 ± 0.004, and Matthews correlation coefficient MCC = 0.738), performing significantly better than a number of existing methods. AllerHunter is available at http://tiger.dbs.nus.edu.sg/AllerHunter
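The two technical ideas above, pairwise similarity encoding and the reported evaluation metrics, can be illustrated with a minimal Python sketch. It is not AllerHunter's implementation. It assumes a simplified Smith-Waterman local alignment score with fixed match, mismatch and linear gap values rather than a substitution matrix, and it omits both the iterative part of the encoding and the SVM training step. The function names, scoring parameters and example counts are hypothetical and chosen only for illustration.

```python
from math import sqrt


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Simplified Smith-Waterman with a linear gap penalty. The scoring
    values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell score never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def pairwise_vector(query, references):
    """Encode a query as its similarity scores against each reference.

    The resulting fixed-length vector is the kind of input that could be
    passed to a classifier such as an SVM (not shown here).
    """
    return [local_alignment_score(query, ref) for ref in references]


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, not real allergens.
    references = ["MKTAYIAK", "GGSLLQWE", "MKTLLIAK"]
    print(pairwise_vector("MKTAYLAK", references))
    # Hypothetical confusion counts, not the paper's results.
    print(binary_metrics(tp=8, fn=2, tn=9, fp=1))
```

Each query is represented by one score per reference sequence, so the feature space is defined by similarity to known sequences rather than by a fixed set of physicochemical descriptors.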