Project Summary
Our long-term goal is to optimize the design and conduct of human clinical research using informatics [1]. Eligibility criteria define the study population for every human study. Their clarity, accuracy and precision are crucial to the success of participant recruitment, results dissemination, and evidence synthesis. Our goal for this renewal is to build a data-driven and knowledge-based decision aid that helps clinical researchers optimize the definition of research eligibility criteria. The difference between the semantic representation of an eligibility criterion (e.g., having Type 2 diabetes mellitus) and its operationalization as a clinical variable (e.g., HbA1c ≥ 6.5% or ICD-9 code = "250.00") has been defined as the semantic gap [2], the closing of which is a grand challenge for biomedical informatics [2,3]. Our research has contributed to an in-depth understanding of this semantic gap and of how it limits computational reuse and effective communication of eligibility criteria to key stakeholders of clinical research [4-9]. We have developed informatics methods to help bridge this gap by transforming free-text eligibility criteria into semi-structured formats to aid in study cohort identification [10-13], analysis of the population representativeness of related clinical trials [14-19], text mining of common eligibility features and their trends [18,20-24], and identification of questionable exclusion criteria for mental disorder trials [25]. We used several of these methods to develop a visualization system called VITTA [17] that shows how eligibility criteria and the clinical features of clinical trial populations vary across related trials. More importantly, our research has revealed an understudied root cause of the semantic gap: eligibility criteria are often poorly defined, inaccurate, nonspecific, or imprecise, and are not easily translatable to the real-world electronic health record (EHR) data representations against which the criteria must be operationalized.
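The semantic gap described above can be made concrete with a minimal sketch, in which the single criterion "having Type 2 diabetes mellitus" is operationalized as the two clinical variables named in the example (HbA1c ≥ 6.5% or ICD-9 code "250.00"). The `Patient` record, its field names, and the sample patients are hypothetical illustrations, not an actual EHR schema or any system described in this proposal.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified patient record (assumed for illustration only).
@dataclass(frozen=True)
class Patient:
    patient_id: str
    hba1c_percent: Optional[float] = None   # most recent HbA1c, if measured
    icd9_codes: frozenset = frozenset()     # diagnosis codes on record

def meets_t2dm_criterion(p: Patient) -> bool:
    """One possible operationalization of 'having Type 2 diabetes mellitus':
    HbA1c >= 6.5% OR an ICD-9 code of 250.00."""
    lab_hit = p.hba1c_percent is not None and p.hba1c_percent >= 6.5
    code_hit = "250.00" in p.icd9_codes
    return lab_hit or code_hit

# Made-up sample patients, not real data.
patients = [
    Patient("A", hba1c_percent=7.1),
    Patient("B", hba1c_percent=5.4, icd9_codes=frozenset({"250.00"})),
    Patient("C", hba1c_percent=5.9, icd9_codes=frozenset({"401.9"})),
    Patient("D"),
]

eligible = [p.patient_id for p in patients if meets_t2dm_criterion(p)]
print(eligible)
```

Patients A and B satisfy the criterion through different variables (a laboratory value versus a diagnosis code), which shows why a criterion that reads as one concept in free text can select different cohorts depending on how it is operationalized.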
The advent of Big Patient Data offers an unprecedented opportunity to draw on the characteristics of real-world patients to inform the precise, data-driven definition of eligibility criteria [25]. By defining the characteristics of the intended study population, eligibility criteria critically influence the population representativeness of a clinical study, which in turn influences the tradeoff between patient safety and the replicability and generalizability of research results. We hypothesize that by integrating patient data, including clinical and genomic data, with public clinical trial information, we can proactively guide investigators to optimize the precision, recruitment feasibility and representativeness of eligibility criteria. This research will demonstrate a novel data-driven and knowledge-based system to assist researchers with optimizing eligibility criteria, through innovative informatics methods for integrating proprietary and public data for deep phenotyping, target population profiling, and quantification and visualization of population representativeness.