Chapter 13: Mining Electronic Health Records in the Genomics Era

DOI: 10.1371/journal.pcbi.1002823

Abstract: The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care, but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DNA biobanks to enable genomic and pharmacogenomic research, using EHR data for phenotypic information. However, EHRs are designed primarily for clinical care, not research, so reuse of clinical EHR data for research purposes can be challenging. Difficulties in use of EHR data include: data availability, missing data, incorrect data, and vast quantities of unstructured narrative text data. Structured information includes billing codes, most laboratory reports, and other variables such as physiologic measurements and demographic information. Significant information, however, remains locked within EHR narrative text documents, including clinical notes and certain categories of test results, such as pathology and radiology reports. For relatively rare observations, combinations of simple free-text searches and billing codes may prove adequate when followed by manual chart review. However, to extract the large cohorts necessary for genome-wide association studies, natural language processing methods to process narrative text data may be needed. Combinations of structured and unstructured textual data can be mined to generate high-validity collections of cases and controls for a given condition. Once high-quality cases and controls are identified, EHR-derived cases can be used for genomic discovery and validation. 
Since EHR data includes a broad sampling of clinically-relevant phenotypic information, it may enable multiple genomic investigations upon a single set of genotyped individuals. This chapter reviews several examples of phenotype extraction and their application to genetic research, demonstrating a viable future for genomic discovery using EHR-linked data.
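The idea of combining structured billing codes with narrative text to separate cases, controls, and records needing manual chart review can be sketched roughly as follows. This is a minimal illustration, not the chapter's algorithm: the code set, the keyword pattern, the negation cues, the two-code threshold, and the names `Patient`, `has_affirmed_mention`, and `classify` are all assumptions chosen for the example, and a real phenotype definition would need validation against chart review.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical placeholders: not a validated phenotype definition.
CASE_CODES = {"250.00", "250.02"}  # illustrative ICD-9 codes for type 2 diabetes
MENTION = re.compile(r"\btype 2 diabetes\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)


@dataclass
class Patient:
    patient_id: str
    billing_codes: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def has_affirmed_mention(note: str) -> bool:
    """Return True if the note mentions the condition without a negation cue
    earlier in the same sentence (a crude stand-in for NLP negation detection)."""
    for match in MENTION.finditer(note):
        sentence_start = note.rfind(".", 0, match.start()) + 1
        preceding = note[sentence_start:match.start()]
        if not NEGATION.search(preceding):
            return True
    return False


def classify(patient: Patient) -> str:
    """Label a patient as 'case', 'control', or 'review' (manual chart review)."""
    code_hits = sum(1 for code in patient.billing_codes if code in CASE_CODES)
    text_hit = any(has_affirmed_mention(note) for note in patient.notes)
    if code_hits >= 2 and text_hit:
        return "case"      # structured and narrative evidence agree
    if code_hits == 0 and not text_hit:
        return "control"   # no evidence of the condition in either source
    return "review"        # conflicting or weak evidence


cohort = [
    Patient("a", ["250.00", "250.00"], ["Patient has type 2 diabetes, on metformin."]),
    Patient("b", [], ["No history of type 2 diabetes."]),
    Patient("c", ["250.00"], []),
]
labels = {p.patient_id: classify(p) for p in cohort}
```

Requiring agreement between both sources for a case, and absence in both for a control, trades cohort size for validity; ambiguous records fall through to manual review rather than being forced into either group.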

