|
A UMLS-based spell checker for natural language processing in vaccine safetyAbstract: We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction.We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74–75), 100% (95% CI: 100–100), and 47% (95% CI: 46%–48%), respectively.We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest.The Institute of Medicine (IOM) in the United States identified patient safety as a key goal in the delivery of health care. In a 2004 report, "Patient Safety: Achieving a New Standard for Care," the IOM highlighted the importance of pursuing an applied research agenda on patient safety, focused on enhancing knowledge, developing tools, and disseminating results to maximize the impact of patient safety systems [1]. The capacity for using computer technology has grown rapidly in the domain of reporting adverse events following immunizations (AEFI), particularly with its increasin
|