Project Summary

As United States healthcare seeks to address inconsistent quality and overwhelming cost, data and technology have become central to all suggested approaches. With newly available electronic health data and massive growth in processing power, the hardest challenges in using clinical data are becoming clear. Big data holds the potential to enable personalized patient care, population health management, and value-based payment models. However, it also creates challenges in discriminating accurate data from inaccurate or incomplete information. One of the greatest areas of data inaccuracy is the patient phenotype, or clinical description of the patient. Every clinical decision support tool, population health management system, and payment reform product relies on accurate electronic patient descriptions as its source data. These descriptions, however, are not accurate, most notably in terms of completeness and granularity. Recall often falls below 50% in describing a patient's medical conditions, such as heart failure and cancer. Detailed descriptions needed for downstream analytics, such as low ejection fraction heart failure or stage III breast cancer, are lacking in the discrete record. Poor data puts care delivery, payment reform, and population health efforts in peril. The time is right for technology to proactively define the clinical phenotype from source data, without reliance on current manual approaches. This will necessitate overcoming challenges in harmonizing discrepant narrative and discrete data, inferring when a characteristic such as cough is a primary condition versus a symptom of another condition, and separating signal from noise in robust narrative text.

This Small Business Innovation Research (SBIR) Phase I project will include the following specific aims:

1. Create the components required to define an accurate and comprehensive clinical phenotype, including: (i) extracting problem, medication, procedure, and lab features from clinical data using natural language processing (NLP) and ontologic mapping, (ii) building a large knowledge database of associated clinical conditions, and (iii) assessing extracted features against the knowledge database to accurately distinguish symptoms from diseases and surface relevant active diseases in a candidate problem list.
2. Validate the clinical phenotyping components using de-identified longitudinal clinical data for 10,000 patients.

The goal, dependent on Phase I success, is to create an automated, accurate, and robust clinical phenotyping engine to enable personalized patient care, population health management, and value-based payment models.
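As a rough illustration of step (iii) of the first aim and of the recall measure cited above, the following minimal sketch assumes that features have already been extracted and normalized to plain strings. The knowledge base contents, the function names classify_features and recall, and the simple lookup rule are all hypothetical and chosen for illustration; they are not the project's actual method and carry no clinical validity.

```python
# Hypothetical knowledge base: disease -> findings it commonly explains.
# Entries are illustrative only, not clinical guidance.
KNOWLEDGE_BASE = {
    "heart failure": {"dyspnea", "edema", "cough"},
    "pneumonia": {"cough", "fever"},
}


def classify_features(features):
    """Split extracted features into a candidate problem list and
    symptoms explained by a disease that is also present.

    Assumed rule: a feature that is not a known disease counts as a
    symptom only if some present disease lists it; otherwise it is kept
    on the problem list as a possible primary condition.
    """
    present_diseases = sorted(f for f in features if f in KNOWLEDGE_BASE)
    problems = list(present_diseases)
    symptoms = {}
    for feature in sorted(set(features) - set(present_diseases)):
        explaining = [d for d in present_diseases
                      if feature in KNOWLEDGE_BASE[d]]
        if explaining:
            symptoms[feature] = explaining
        else:
            problems.append(feature)
    return sorted(problems), symptoms


def recall(predicted, reference):
    """Fraction of reference conditions that appear in the prediction."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(set(predicted) & reference) / len(reference)


if __name__ == "__main__":
    problems, symptoms = classify_features(
        {"heart failure", "cough", "edema", "headache"}
    )
    print(problems)
    print(symptoms)
    print(recall({"heart failure"}, {"heart failure", "breast cancer"}))
```

In this toy rule, cough is treated as a symptom when heart failure is also documented, while the same cough with no explaining disease would remain on the candidate problem list. A real engine would presumably need weighted associations and temporal context rather than a fixed lookup.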