%0 Journal Article
%T Development and evaluation of an open source software tool for deidentification of pathology reports
%A Bruce A Beckwith
%A Rajeshwarri Mahaadevan
%A Ulysses J Balis
%A Frank Kuo
%J BMC Medical Informatics and Decision Making
%D 2006
%I BioMed Central
%R 10.1186/1472-6947-6-12
%X Of 1800 pathology reports, 1254 (69.7%) contained identifiers in the body of the report. Of the 3499 unique identifiers in the test set, 3439 (98.3%) were removed. Only 19 HIPAA-specified identifiers (mainly consult accession numbers and misspelled names) were missed. Of the 41 non-HIPAA identifiers missed, most were partial institutional addresses and ages. Outside consultation case reports typically contain numerous identifiers and were the most challenging to deidentify comprehensively. Performance varied among reports from the three institutions, highlighting the need for site-specific customization, which is easily accomplished with our tool.
We have demonstrated that it is possible to create an open-source deidentification program that performs well on free-text pathology reports.
The value of studying information contained within patients' medical records has long been recognized. One of the issues in using such information for research purposes has been protecting patient privacy. Currently, investigators wishing to use medical records for research have three options: obtain permission from the patients, obtain a waiver of informed consent from their Institutional Review Board, or use a data set from which all (de-identified data set) or most (limited data set) of the identifiers have been removed [1,2]. The Health Insurance Portability and Accountability Act (HIPAA) [2] specifies that removal of nineteen specific types of identifiers constitutes deidentification of the medical records (see Table 1).
These identifiers include names, ages, dates, addresses, and identifying codes of patients, their relatives, household members, and employers. Each year, pathologists in the United States examine millions of tissue samples, resulting in the creation and storage of vast numbers of paraffin-embedded tissue samples. These specimens have been examined and characterized by a pathologist an
%U http://www.biomedcentral.com/1472-6947/6/12