BMC Bioinformatics 2007
A mass accuracy sensitive probability based scoring algorithm for database searching of tandem mass spectrometry data

Abstract: A probability based statistical scoring model for assessing peptide and protein matches in tandem MS database searches was derived. The statistical scores in the model represent the probability that a peptide match is a random occurrence, based on the number or the total abundance of matched product ions in the experimental spectrum. The model also calculates probability based scores to assess protein matches. Thus the protein scores in the model reflect the significance of protein matches and can be used to differentiate true from random protein matches.

The model is sensitive to high mass accuracy and implicitly takes mass accuracy into account during scoring. High mass accuracy not only reduces false positives but also improves the scores of true positive matches. The algorithm is incorporated in an automated database search program, MassMatrix.

Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has become one of the most widely used tools in mass spectrometry based proteomics [1]. In shotgun proteomics, peptides are separated using liquid chromatography and introduced into a mass spectrometer via an ionization interface. In tandem mass spectrometry, the peptide precursor ions are isolated and fragmented via collision-induced dissociation (CID) [2] with inert gas, electron capture dissociation (ECD) [3], surface induced dissociation (SID) [4] and/or electron transfer dissociation (ETD) [5]. The resulting tandem MS spectra contain product ion signatures that relate back to the identity of the peptide precursor ions [2,6,7].

Various algorithms have been developed to automate the interpretation of these spectra for modern high-throughput LC-MS/MS experiments. These algorithms fall into two categories: de novo sequence inference and database searching [8]. The first approach identifies peptide sequences directly from the tandem MS data [9,10].
This type of algorithm is usually computationally expensive and limited by the mass accuracy of the tandem MS data [8]. The databas