|
Genome Biology 2005
Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome
Abstract: We established two tests of the accuracy of human protein interaction datasets and measured the relative accuracy of the available data. We then developed and applied natural language processing and literature-mining algorithms to recover 6,580 interactions among 3,737 human proteins from Medline abstracts. A three-part algorithm was used: first, human protein names were identified in Medline abstracts using a discriminator based on conditional random fields; second, interactions were identified by the co-occurrence of protein names across the set of Medline abstracts; third, the interactions were filtered with a Bayesian classifier to enrich for legitimate physical interactions. These mined interactions were combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets.
These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays. Projecting from the approximately 15 interactions per protein in the best-sampled interaction set to the estimated 25,000 human genes implies more than 375,000 interactions in the complete human protein interaction network. The consolidated set therefore represents no more than 10% of the complete network.
The past few years have seen a tremendous development of functional genomics technologies. In particular, the yeast proteome has been the subject of considerable effort, including genome-wide protein interaction assays using yeast two-hybrid technology [1,2], affinity chromatography/mass spectrometry [3,4], synthetic lethal assays [5,6], and genome context methods [7-10]. 
Success in these areas, even given the limited accuracy of these technologies [11-15], has led to the application of the yeast two-hybrid method for the fly [16] and the worm proteomes [17], providing initial steps toward maps of the fly and worm int
|
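The projection quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated there (about 15 interactions per protein, an estimated 25,000 human genes, and 31,609 consolidated interactions). The variable names are illustrative, and the simple product is an assumption about how the estimate was formed, not a description of the authors' calculation.

```python
# Figures quoted in the abstract; variable names are illustrative only.
interactions_per_protein = 15        # approximate, from the best-sampled interaction set
estimated_human_genes = 25_000       # estimated number of human genes
consolidated_interactions = 31_609   # size of the combined network

# Assumption: the projection is a simple product of the two estimates.
projected_total = interactions_per_protein * estimated_human_genes

# Fraction of the projected network covered by the consolidated set.
coverage = consolidated_interactions / projected_total

print(projected_total)       # 375000
print(f"{coverage:.1%}")     # 8.4%
```

Under these assumptions the consolidated set covers roughly 8.4% of the projected network, consistent with the abstract's statement that it represents no more than 10%.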