Terminology projects

A. Extraction of information from drug labels using natural language processing

Drug package inserts (drug labels) are a comprehensive, up-to-date and authoritative source of drug information that is publicly available. To unlock the knowledge in drug labels, they need to be transformed into a standardized data structure and encoded in standard terminologies. Only then can the knowledge be used to drive applications such as clinical decision support. This collaborative project with the FDA uses natural language processing (NLP) and machine learning to extract information from drug labels and create mappings to standard terminologies. The output will support the FDA's drug label indexing initiative to increase the usefulness of drug labels. In collaboration with the FDA, I am hosting a challenge at the Text Analysis Conference (TAC) 2018 to find the best-performing NLP algorithm for extracting drug-drug interactions from drug labels.

B. Evaluation of drug-drug interaction knowledge resources and comparison with open standards

A standard, evidence-based knowledge base is a prerequisite for the effective deployment of a clinical alert system to prevent harmful drug-drug interactions. However, studies have shown significant variability between knowledge sources. Inconsistent evaluation and classification of interactions have been cited as factors contributing to excessive alerts and alert fatigue. The goal of this study is to systematically compare commercial drug-drug interaction knowledge sources against published standards. Our results show that while there are considerable differences among the three commercial knowledge bases studied, the differences are less pronounced for the more severe interactions. All three knowledge sources had very high coverage of a standard list of clinically highly significant interactions.

C. Creating maps between commonly used terminologies

Mapping provides a solution to the problem caused by the use of multiple coding systems for the same kind of information. One example is the use of both SNOMED CT and ICD-10-CM for coding medical diagnoses and problems. Using various computational methods supplemented by expert review, I have developed maps between SNOMED CT and the different flavors and versions of ICD codes. These maps help to facilitate data reuse and data integration. I am also studying various algorithmic approaches to creating mappings between SNOMED CT and ICD-10-PCS, including lexical matching, ontological alignment and indirect mapping.

D. Facilitating adoption of terminology standards

Under the Meaningful Use incentive program, SNOMED CT and RxNorm are terminologies required for the certification of electronic health records. I have studied the practical barriers to adoption of these terminologies and created resources to help with implementation. I studied the usage patterns of SNOMED CT terms in the problem lists of large health care providers and published a list of the most commonly used terms as the CORE Problem List Subset of SNOMED CT. The CORE subset is not only a useful resource for SNOMED CT implementers; it is also frequently used for terminology research and other purposes, and is cited in multiple publications. RxTerms is another resource that I developed to overcome data entry problems with RxNorm.