- 2016
A Protein Inference Algorithm Based on a Probabilistic Graphical Model
Keywords: protein inference, peptide inference, shotgun proteomics, probabilistic graphical model
Abstract: Proteomics is an emerging discipline that studies, at large scale, all the proteins expressed in a cell or organism and how their expression changes. A central goal of proteomics is the fast and accurate identification of the proteins present in a cell or tissue. Protein identification generally proceeds in two steps: peptide identification and protein inference. Peptide identification determines peptide sequences from raw tandem mass spectrometry (MS/MS) data. Protein inference then uses these identified peptides to determine which proteins are actually present in the sample. The inherent uncertainty of MS data and the complexity of the proteome make protein inference difficult. In this article, we propose a method based on a probabilistic graphical model (PGMPi) that incorporates the influence of tandem mass spectrometry data on the probability that each protein is present. PGMPi casts protein inference as the solution of a probabilistic graphical model and identifies the set of proteins truly present in the sample by finding the maximum a posteriori (MAP) probabilities of the proteins. PGMPi not only performs protein inference effectively but also introduces only one parameter, which improves the stability of the algorithm. Experimental results demonstrate that our method outperforms existing state-of-the-art protein inference algorithms.
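To make the idea of MAP protein inference over a graphical model more concrete, here is a minimal toy sketch in Python. It is not PGMPi. The model is a hypothetical illustration: a bipartite protein-peptide graph, an independent Bernoulli prior `gamma` on each protein's presence, a noisy-OR emission probability `alpha`, and peptide identification probabilities `q` used as soft evidence. The function names and data are also made up. The exhaustive search is only feasible for a handful of proteins.

```python
from itertools import product
from math import log

# Hypothetical toy data: peptide -> (parent proteins, identification probability q).
PROTEINS = ["P1", "P2", "P3"]
PEPTIDES = {
    "pepA": ({"P1"}, 0.95),
    "pepB": ({"P1", "P2"}, 0.90),  # shared (degenerate) peptide
    "pepC": ({"P3"}, 0.10),
}


def noisy_or(active_parents: int, alpha: float) -> float:
    """P(peptide is generated) when `active_parents` of its parent proteins are present."""
    return 1.0 - (1.0 - alpha) ** active_parents


def log_joint(assignment, proteins, peptides, gamma, alpha):
    """Unnormalized log posterior of one 0/1 protein-presence assignment."""
    present = {p for p, x in zip(proteins, assignment) if x}
    # Independent Bernoulli(gamma) prior on each protein's presence.
    score = sum(log(gamma) if x else log(1.0 - gamma) for x in assignment)
    for parents, q in peptides.values():
        p_emit = noisy_or(len(present & parents), alpha)
        # Soft evidence: P(obs | peptide present) = q, P(obs | absent) = 1 - q,
        # summed over the hidden peptide state.
        score += log(q * p_emit + (1.0 - q) * (1.0 - p_emit))
    return score


def map_protein_set(proteins, peptides, gamma, alpha):
    """Exhaustive MAP search over all 2^n presence assignments (toy sizes only)."""
    best = max(
        product((0, 1), repeat=len(proteins)),
        key=lambda a: log_joint(a, proteins, peptides, gamma, alpha),
    )
    return {p for p, x in zip(proteins, best) if x}


print(sorted(map_protein_set(PROTEINS, PEPTIDES, gamma=0.2, alpha=0.9)))  # ['P1']
```

With `gamma=0.2`, `P1` alone explains the shared peptide `pepB`, so the sketch returns `{"P1"}`. Raising the prior to `gamma=0.5` makes adding `P2` worthwhile, and the result becomes `{"P1", "P2"}`. In this toy model, the prior controls how parsimonious the inferred protein set is when peptides are shared between proteins.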