A. Extraction of information from drug labels using natural language processing
Drug package inserts (drug labels) are a comprehensive, up-to-date and authoritative source of publicly available drug information. To unlock the knowledge in drug labels, they must be transformed into a standardized data structure and encoded in standard terminologies. Only then can the knowledge drive applications such as clinical decision support. This collaborative project with the FDA uses natural language processing (NLP) and machine learning to extract information from drug labels and map it to standard terminologies. The output will support the FDA's drug label indexing initiative, which aims to increase the usefulness of drug labels. Following the success of the drug label challenge at the Text Analysis Conference (TAC) 2018, I am hosting the challenge for a second year in collaboration with the FDA. Several 2018 submissions used promising deep learning methods, and my research team is now exploring neural networks to improve our existing processing pipeline.

B. Use of medical terminologies to support clinical research
Social media is becoming an important source of patient-reported data for data mining and data analytics research. I have used medical terminologies to analyze data posted by patients in order to study the adverse drug events and effectiveness of antidepressants. Medical terminologies and common data elements are important tools for sharing clinical research data, and I have studied their application in data sharing involving HIV-infected patients.

C. Creating maps between commonly used terminologies
Mapping addresses the problems caused by using multiple coding systems for the same kind of information. One example is the use of both SNOMED CT and ICD-10-CM to code medical diagnoses and problems.
Using various computational methods supplemented by expert review, I have developed maps between SNOMED CT and the different variants and versions of ICD. These maps facilitate data reuse and data integration. I have also studied the potential benefits of using maps in data encoding. In ongoing work, I am investigating algorithmic approaches to mapping SNOMED CT to ICD-10-PCS, including lexical matching, ontological alignment and indirect mapping. This work led to the publicly available MAGPIE tool (Map-Assisted Generation of Procedure and Intervention Encoding), launched in May 2019.

D. Facilitating adoption of terminology standards
Under the Meaningful Use incentive program and its successor, the Promoting Interoperability program, SNOMED CT and RxNorm are required terminologies for the certification of electronic health record (EHR) systems. I have studied the practical barriers to adopting these terminologies and created resources to support their implementation. I studied the usage patterns of SNOMED CT terms in the problem lists of large health care providers and published the most commonly used terms as the CORE Problem List Subset of SNOMED CT. The CORE subset is not only a useful resource for SNOMED CT implementers but is also frequently used in terminology research and for other purposes, and it has been cited in multiple publications. RxTerms is another resource I developed, designed to overcome data entry problems with RxNorm.