It is estimated that almost one-half of Americans suffer from chronic diseases, yet epidemiologic investigations are limited by the difficulty of ascertaining disease status at scale, even in the era of electronic medical records (EMRs). For example, algorithms based on structured data (e.g., ICD-9 codes) for asthma lack the sensitivity required for population-based studies, while manual review of EMRs is labor-intensive and thus inefficient for population-scale ascertainment of disease status. The lack of efficient ways to ascertain disease status has severely restricted the scope of investigation for chronic diseases such as asthma. Furthermore, a patient's true disease status progresses over time, and this progression may not be reflected in the clinical diagnosis of that disease. We previously reported that two-thirds of children with asthma had a delay in their diagnosis (median: 3.3 years), with subsequent events such as remission or relapse largely unreported. Such information about disease progression may be recorded during manual medical record review, but, again, manual review limits investigations and conclusions to small-scale studies. Our long-term goal is to accelerate epidemiological investigations of chronic diseases and their temporal progression by streamlining medical record review. The main goal of this proposal is to extend a preliminary system, based on natural language processing (NLP), for asthma status ascertainment by identifying time-situated classifications of asthma onset, remission, and relapse. We will validate this system in a population health setting and release it as an open-source tool. We hypothesize that NLP methods applied to EMRs will allow us to ascertain asthma status and to track asthma disease progression with greater accuracy and efficiency than conventional approaches (billing codes or manual medical record review). In Aim 1, we will extend our preliminary NLP system to ascertain the patient-level disease progression of asthma.
Most significantly, we will ascertain time-situated asthma remission and relapse, two important events in the natural history of asthma. We will also improve methods of aggregating events, employ temporal expression and relation extraction, include structured data sources, and implement automatic feature selection. In Aim 2, we will evaluate the NLP system for its accuracy in ascertaining asthma onset, relapse, and remission. We will also verify its epidemiological (construct) validity against existing studies, and disseminate the NLP system as an open-source project, Adept (Aggregation of Disease Evidence for Patient Timelines). Expected Outcomes: The proposed NLP system will: (i) orient clinical NLP techniques toward time-situated patient-level solutions; (ii) expand the scale of research capabilities for asthma; and (iii) provide a basis for decision support and other applications. Successful completion of this project would provide an open-source tool for ascertaining the disease progression of asthma, built on a general approach to aggregating evidence.
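
To make the idea of aggregating dated evidence into a patient timeline concrete, the following is a minimal sketch and not the Adept implementation. It assumes that asthma evidence has already been extracted from the record as a list of dates, and it uses a purely hypothetical rule: a gap of more than three years without evidence counts as remission, and the next piece of evidence after such a gap counts as relapse. The names `TimelineEvent`, `build_timeline`, and `REMISSION_GAP`, as well as the three-year threshold, are illustrative assumptions that do not come from the proposal.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical threshold for illustration only; the proposal does not
# define remission in terms of a fixed evidence-free interval.
REMISSION_GAP = timedelta(days=3 * 365)


@dataclass(frozen=True)
class TimelineEvent:
    kind: str  # "onset", "remission", or "relapse"
    when: date


def build_timeline(evidence_dates, end_of_followup, gap=REMISSION_GAP):
    """Aggregate dated asthma evidence into onset/remission/relapse events.

    Assumed rule: the first evidence date is onset; an evidence-free
    interval longer than `gap` is remission (dated at last evidence + gap);
    the first evidence after a remission is relapse.
    """
    dates = sorted(set(evidence_dates))
    if not dates:
        return []

    events = [TimelineEvent("onset", dates[0])]
    last = dates[0]
    for current in dates[1:]:
        if current - last > gap:
            events.append(TimelineEvent("remission", last + gap))
            events.append(TimelineEvent("relapse", current))
        last = current

    # A long evidence-free tail before follow-up ends is also remission.
    if end_of_followup - last > gap:
        events.append(TimelineEvent("remission", last + gap))
    return events


if __name__ == "__main__":
    # Made-up dates for a single hypothetical patient.
    evidence = [date(2001, 3, 1), date(2002, 5, 10), date(2008, 1, 15)]
    for event in build_timeline(evidence, end_of_followup=date(2009, 1, 1)):
        print(event.kind, event.when.isoformat())
```

In a real system each evidence date would come from temporal expression and relation extraction over clinical notes, possibly combined with structured data, and the aggregation rule would be learned or validated against manual review rather than fixed by hand.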