%0 Journal Article %T SMASH: A Data-driven Informatics Method to Assist Experts in Characterizing Semantic Heterogeneity among Data Elements %A Alex Carballo-Diéguez %A Chunhua Weng %A David K. Vawdrey %A Suzanne Bakken %A William Brown III %J AMIA Annual Symposium Proceedings %D 2016 %X Semantic heterogeneity (SH) is detrimental to data interoperability and integration in healthcare. Assessing SH is difficult, yet fundamental to addressing the problem. Using expert-based and data-driven methods, we assessed SH among HIV-associated data elements (DEs). Using ClinicalTrials.gov, we identified and obtained eight data dictionaries and created a DE inventory. We vectorized DEs by study and developed a new method, String Metric-assisted Assessment of Semantic Heterogeneity (SMASH), to find DEs that were similar in An and Bn, unique to An, and unique to Bn. An HIV expert assessed pairs for semantic equivalence. Heterogeneous DEs were either semantically equivalent/syntactically different (HIV-positive/HIV+/Seropositive) or syntactically equivalent/semantically different (“Partner” [sexual]/“Partner” [relationship]). Context of usage was considered. SMASH aided identification of SH. Of 1,175 DEs from pairs, 1,048 (87%) were semantically heterogeneous and 127 (13%) were homogeneous. Most heterogeneous pairs (97%) were semantically equivalent/syntactically different. Expert-based and data-driven methods are complementary for assessing SH, especially among semantically equivalent/syntactically different DEs. Similar expert-based/data-driven solutions are recommended for resolving SH. %U https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5333258/
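
The abstract's partitioning step (DEs similar in both studies, unique to A, unique to B) can be illustrated with a minimal sketch. This is not the authors' SMASH implementation: the abstract does not say which string metric or cutoff was used, so the `difflib` ratio, the 0.8 threshold, the function names, and the sample DE lists below are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; difflib's ratio is an assumed stand-in metric."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def partition(study_a, study_b, threshold=0.8):
    """Split two DE lists into similar pairs, unique-to-A, and unique-to-B.

    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    similar = []
    matched_a, matched_b = set(), set()
    for a in study_a:
        for b in study_b:
            score = similarity(a, b)
            if score >= threshold:
                similar.append((a, b, score))
                matched_a.add(a)
                matched_b.add(b)
    unique_a = [a for a in study_a if a not in matched_a]
    unique_b = [b for b in study_b if b not in matched_b]
    return similar, unique_a, unique_b


# Hypothetical DE names, loosely based on the examples in the abstract.
study_a = ["HIV-positive", "Partner", "Age"]
study_b = ["HIV positive", "Partner", "Seropositive"]

similar, unique_a, unique_b = partition(study_a, study_b)
for a, b, score in similar:
    print(f"candidate pair: {a!r} ~ {b!r} (score {score:.2f})")
print("unique to A:", unique_a)
print("unique to B:", unique_b)
```

In this sketch "Seropositive" lands in the unique-to-B list even though it means the same as "HIV-positive", while the two "Partner" entries match exactly even though they may differ in meaning. Both cases show why string-metric output still needs expert review of semantic equivalence.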