Scientia Sinica Vitae (中国科学: 生命科学), 2015

Advances in statistical strategies for screening differentially expressed proteins based on mass spectrometry

DOI: 10.1360/N052014-00197, pp. 347-358

Wang Jinxia (王锦霞), Chang Cheng (常乘), Ma Jie (马洁), Wu Songfeng (吴松锋), Zhuang Jujuan (庄举娟), Zhu Yunping (朱云平)

Keywords: mass spectrometry, proteomics, differentially expressed proteins, statistical principles, multiple hypothesis testing

Abstract:

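With the rapid development of mass spectrometry, proteomics has become a major research focus following genomics and transcriptomics, and identifying reliable differentially expressed proteins is essential for biomarker discovery. Accurately and sensitively screening differential proteins has therefore become one of the main research topics in mass spectrometry-based quantitative proteomics. Many methods have been proposed for this problem, but their ranges of applicability differ. In general, statistical strategies for screening differential proteins based on mass spectrometry fall into three classes: strategies based on classical (frequentist) statistics, statistical testing strategies based on the Bayesian school, and other strategies. Each class has its own scope of application, characteristics, and limitations. In addition, the screening process produces some false positives; further methods can be applied to control the quality of the differentially expressed proteins and thereby improve the reliability of the statistical test results.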
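
To make the false-positive control mentioned in the abstract concrete, the following is a minimal sketch of one widely used multiple-testing correction, the Benjamini-Hochberg procedure. It is an illustration only, not necessarily the approach the authors recommend. The function name `benjamini_hochberg` and the per-protein p-values are hypothetical, and the 0.05 threshold is an assumed convention.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values are monotone non-decreasing in rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from per-protein tests (illustrative only)
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)

# Proteins called differential at an assumed FDR threshold of 0.05
significant = [i for i, qi in enumerate(q) if qi <= 0.05]
print(significant)
```

Note that the third and fourth proteins have raw p-values below 0.05 but are not retained after adjustment. This is the kind of false-positive filtering that multiple hypothesis testing is meant to provide.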

