PLOS ONE  2012 

Multi-Label Multi-Kernel Transfer Learning for Human Protein Subcellular Localization

DOI: 10.1371/journal.pone.0037716


Abstract:

Recent years have seen considerable progress in computational modelling of protein subcellular localization. However, existing sequence-based predictive models show only moderate or unsatisfactory performance, and gene ontology (GO) based models risk overestimating performance on novel proteins. Furthermore, many human proteins have multiple subcellular locations, which complicates computational modelling. To date, very few studies have focused on predicting the subcellular localization of human proteins that may reside in multiple cellular compartments. In this paper, we propose a multi-label multi-kernel transfer learning model for human protein subcellular localization (MLMK-TLM). MLMK-TLM introduces a multi-label confusion matrix, formally defines three multi-labelling performance measures, and adapts one-against-all multi-class probabilistic outputs to the multi-label learning scenario. On this basis, it extends our published work GO-TLM (gene ontology based transfer learning model for protein subcellular localization) and MK-TLM (multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization) to multiplex human protein subcellular localization. By combining proper homolog knowledge transfer, a comprehensive assessment of model performance on novel proteins, and multi-labelling capability, MLMK-TLM gains practical applicability. Experiments on a human protein benchmark dataset show that MLMK-TLM significantly outperforms the baseline model and demonstrates good multi-labelling ability for novel human proteins. Some predictions are validated against the latest Swiss-Prot database. The software can be freely downloaded at http://soft.synu.edu.cn/upload/msy.rar.
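
As a rough illustration of how one-against-all probabilistic outputs can be adapted to multi-label prediction, the sketch below keeps every location whose probability reaches a cutoff. This is not the MLMK-TLM implementation: the function name `predict_label_set`, the 0.5 threshold, the fallback to the single most probable location, and the example probabilities are all assumptions made for illustration.

```python
from typing import Dict, Set


def predict_label_set(probs: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Turn per-location one-against-all probabilities into a set of labels.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    # Keep every location whose probability reaches the threshold.
    chosen = {label for label, p in probs.items() if p >= threshold}
    if not chosen:
        # Assumed fallback: assign the single most probable location
        # so that every protein receives at least one label.
        chosen = {max(probs, key=probs.get)}
    return chosen


# Hypothetical probabilities for one protein.
example = {"nucleus": 0.81, "cytoplasm": 0.64, "mitochondrion": 0.07}
print(sorted(predict_label_set(example)))
```

Under this rule a protein with two high-probability compartments receives both labels, while a protein with uniformly low probabilities still receives its single most likely one.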

