|
BMC Bioinformatics 2008
A nonparametric model for quality control of database search results in shotgun proteomicsAbstract: In this paper, a multivariate nonlinear discriminate function (DF) based on the multivariate nonparametric density estimation technique was used to filter out false-positive database search results with a predictable false positive rate (FPR). Application of this method to control datasets of different instruments (LCQ, LTQ, and LTQ/FT) yielded an estimated FPR close to the actual FPR. As expected, the method was more sensitive when more features were used. Furthermore, the new method was shown to be more sensitive than two commonly used methods on 3 complex sample datasets and 3 control datasets.Using the nonparametric model, a more flexible DF can be obtained, resulting in improved sensitivity and good FPR estimation. This nonparametric statistical technique is a powerful tool for tackling the complexity and diversity of datasets in shotgun proteomics.The objective of proteomics is to investigate proteins on a global scale [1,2]. The high-throughput and sensitive tandem mass spectrometry (MS/MS) platform is now a supporting technology for protein identification in proteomic research [3,4]. Using the shotgun strategy, a large number of MS/MS spectra can be gathered in a few hours [5]. The MS/MS data is generally processed by the so-called database searching method [5]. Automated software such as SEQUEST [6] and MASCOT [7] can rapidly assign tryptic peptides to MS/MS spectra by searching a protein sequence database and then identify proteins by utilizing the identified peptides. A notable problem in the MS/MS data processing is the high false positive rate (FPR) of the database search results [8]. Thus, validation of database search results is unavoidable and necessary work, particularly when processing the large amount low accuracy MS/MS spectra with SEQUEST [10].There are many proposed parameters and algorithms for evaluating SEQUEST database search results [11-32]. Such approaches must confront two main problems: First, the complex physical and chemical mechanism
|