Asthma is the most common chronic condition in children and one of the five most burdensome diseases in the United States. Despite this, epidemiologic investigations into childhood asthma are limited by variations in asthma diagnosis across sites and inefficient utilization of electronic medical records (EMRs) to facilitate large-scale studies. Algorithms based on structured data (e.g., ICD-9 codes) have shown strong specificity, but lack the sensitivity required for population-based studies of asthma. Manual EMR reviews allow application of well-recognized criteria-based definitions such as the Asthma Predictive Index (API) or the Predetermined Asthma Criteria (PAC), but are labor-intensive and expensive, and therefore not feasible for population-level studies. In the absence of consistent, reproducible, and efficient asthma ascertainment methods, studies rely on differing (a) asthma criteria, (b) ascertainment processes, and (c) sampling frames, which results in inconsistent asthma cohorts and study results across clinical trials and other studies. This inconsistency causes confusion, delays translation of important study findings into clinical practice, and may obscure the true heterogeneity of asthma. Our long-term goal is to advance research and clinical care for asthma by developing a robust software tool that streamlines automatic ascertainment of asthma from medical records based on established asthma criteria (PAC and API). We propose to augment traditional structured data criteria with natural language processing (NLP) techniques to account for unstructured text. Thus, the main goal of this proposal is to develop NLP-API, an NLP algorithm for automating API, and to apply the NLP algorithms for both PAC and API to identify a cohort of children with asthma. In addition, we will use these tools to characterize children with asthma, thereby demonstrating their usefulness in epidemiological investigations and, potentially, in asthma management.
We hypothesize that asthma criteria-based NLP algorithms applied to the EMR will allow us to identify and characterize asthma status accurately, consistently, and efficiently. In Aim 1, we will develop NLP-API, an NLP algorithm for API. In Aim 2, we will apply both NLP-API (developed under Aim 1) and NLP-PAC (our recently developed PAC-based NLP algorithm) to two evaluation cohorts. In Aim 3, we will characterize the subgroups of children with asthma identified under Aim 2 by assessing the association of NLP-ascertained asthma status with lung function and biomarkers for asthma. The expected outcomes of the proposed study are: (i) enhanced research capabilities for asthma by enabling more consistent, reproducible, and efficient large-scale asthma ascertainment, sampling frames, and timing estimations; (ii) a basis for improving timely asthma diagnosis and care through clinical decision support systems; and (iii) advancement of the use of NLP techniques for clinical studies. Successful completion of this project will provide an accurate, consistent, and efficient tool for addressing the significant burden of asthma in children, as well as a framework for extension to other chronic diseases and to adults.
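The general idea of augmenting structured-data criteria with evidence from unstructured text can be illustrated with a minimal Python sketch. It is not the NLP-API or NLP-PAC algorithm: the `Record` type, the term list, the negation cues, and the rule that combines the two evidence sources are all hypothetical and chosen only for illustration. The one external fact assumed is that ICD-9-CM asthma codes fall under category 493.

```python
import re
from dataclasses import dataclass, field
from typing import List

# ICD-9-CM asthma diagnoses are coded under category 493 (e.g., 493.90).
ASTHMA_ICD9_PREFIX = "493"

# Hypothetical, deliberately tiny term list and negation cues. A real
# criteria-based algorithm (API or PAC) would need far richer rules.
ASTHMA_TERMS = re.compile(r"\b(asthma|wheez\w*)\b", re.IGNORECASE)
NEGATION_CUES = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)


@dataclass
class Record:
    """Hypothetical patient record: billing codes plus free-text notes."""
    icd9_codes: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def has_structured_evidence(record: Record) -> bool:
    """True if any ICD-9 code belongs to the asthma category."""
    return any(code.startswith(ASTHMA_ICD9_PREFIX) for code in record.icd9_codes)


def has_text_evidence(record: Record) -> bool:
    """True if any sentence mentions an asthma term without a negation cue."""
    for note in record.notes:
        # Naive sentence split on terminal punctuation.
        for sentence in re.split(r"[.!?]\s*", note):
            if ASTHMA_TERMS.search(sentence) and not NEGATION_CUES.search(sentence):
                return True
    return False


def ascertain(record: Record) -> bool:
    """Flag a record if either evidence source supports asthma (assumed rule)."""
    return has_structured_evidence(record) or has_text_evidence(record)


coded = Record(icd9_codes=["493.90"])
text_only = Record(icd9_codes=["382.9"],
                   notes=["Mother reports recurrent wheezing at night. No fever."])
negated = Record(icd9_codes=["382.9"], notes=["Denies wheezing or cough."])

print(ascertain(coded), ascertain(text_only), ascertain(negated))
```

In this sketch the text rule picks up a record that carries no asthma code, which is the kind of case a code-only algorithm would miss, while the sentence-level negation check keeps an explicitly denied symptom from being counted.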