Our long-term objective is to enlarge the scope and efficiency of clinical research through enhanced use of clinical data to support clinical research decisions. This proposal aims to improve the use of electronic health records (EHR) to automate clinical trials eligibility screening by developing a new semantic alignment framework. Clinical trials research is an important step for translating breakthroughs in basic biomedical sciences into knowledge that will benefit clinical practice and human health. However, a significant obstacle is identifying eligible participants. Eighty-six percent of all clinical trials are delayed in patient recruitment for from one to six months and 13% are delayed by more than six months. Enrollment delay is expensive. In a recent large, multi-center trial, about 86.8 staff hours and more than $1000 was spent to enroll each participant. Ineffective enrollment also produces a big social cost in that up to 60% of patients can miss being identified. The broad deployment of EHR systems has created unprecedented opportunities to solve the problem because EHR systems contain a rich source of information about potential participants. However, it is often a knowledge-intensive, time-consuming, and inefficient manual procedure to match eligibility criteria such as "renal in- sufficiency" to clinical data such as "serum creatinine = 1.0 mg/dl for an 80-year old white female patient." This enduring challenge is partly caused by the disconnection between abstract and ambiguous eligibility criteria and highly specific clinical data manifestations;we call this a semantic gap. Despite earlier work on computer-based clinical guidelines and protocols, limited effort has been devoted to support automatic matching between concepts and their manifestations in patient phenotypes such as signs and symptoms. We hypothesize that we can characterize the semantic gap and design a knowledge-based, natural-language processing assisted semantic alignment framework to bridge the semantic gap. Therefore, our specific aims are: (1) to investigate the semantic gap between clinical trials eligibility criteria and clinical data;(2) to design a concept-based, computable knowledge representation for eligibility criteria;(3) to design a semantic alignment framework linking an eligibility criteria knowledge base and a clinical data warehouse to generate semantic queries for eligibility identification;and (4) to evaluate the utility of the semantic alignment framework. This research is novel and unique in that (1) there are no prior studies about the semantic gap between eligibility criteria and clinical data;and (2) for the first time, we design a semantic alignment framework to automatically match eligibility criteria to clinical data. The research team comprising expertise from the Department of Biomedical Informatics at Columbia University and the Division of General Medicine from UCSF are uniquely positioned to carry out this research, given the experience of the team (medical knowledge representation, natural language processing, controlled clinical terminology, ontology-based semantic reasoning, data mining, statistics, health data organization, semantic harmonization, and clinical trials), the availability of a repository of 13 years of data on 2 million patients, and the availability of a natural language processor called MedLEE to convert millions of narrative reports into richly coded clinical data.