Challenge Area and Specific Challenge Topic: This application addresses the broad Challenge Area (10), Information Technology for Processing Health Care Data, and the specific Challenge Topic 10-RR-101*: Information Technology Demonstration Projects Facilitating Secondary Use of Healthcare Data for Research.

Clinical data warehouses (CDWs) archive data from electronic medical records (EMRs). EMRs are designed to store and retrieve data by patient (e.g., all data about John Smith). CDWs, by contrast, support queries across patients (e.g., the percentage of patients on vs. off aspirin who develop unstable angina). CDWs are therefore critical components of an infrastructure that enables reuse of healthcare data for research, and they are important enablers of comparative effectiveness research (CER). However, simply transferring healthcare data from EMRs to a CDW is not sufficient. Unlike clinical trial data, healthcare data are not collected with a research question in mind. As a result, they may be poorly structured (e.g., a free-text list of diagnoses rather than a list of ICD-9 codes) and may contain protected health information (e.g., names and addresses) or identifying phrases such as "senator with lymphoma." Our unifying hypothesis is that concept-level approaches can be applied to CDWs to make vast amounts of healthcare data meaningful for research while protecting subject privacy. To test this hypothesis, we will 1) adapt and evaluate our novel indexing system, which is based on graph analysis using a modification of Google's PageRank algorithm, to improve concept extraction from clinical text; 2) evaluate the privacy afforded to "subjects" when clinical text is handled at the concept level; and 3) adapt and evaluate existing visualization techniques for displaying relationships among concept-level healthcare data, thereby facilitating exploratory data analysis by biomedical researchers. Although the aims build on one another, each can succeed even if the others fail.
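As background for the first aim, the sketch below illustrates the general idea of ranking concepts by graph centrality. It uses standard PageRank, not the modified algorithm the project proposes, whose details are not described here. It assumes a hypothetical graph in which nodes are candidate concepts from a single clinical note and undirected edges join concepts that co-occur in the note; concepts with higher scores are more central to that note. The function name `pagerank`, the co-occurrence construction, the example concepts, and the parameter values (damping factor 0.85, 50 iterations) are all illustrative assumptions.

```python
"""Standard PageRank over a hypothetical concept co-occurrence graph.

Illustration only: this is NOT the project's modified PageRank. Nodes are
candidate concepts from one note; undirected edges join co-occurring concepts.
"""
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    # Build an undirected adjacency list from (concept_a, concept_b) pairs.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)
    n = len(nodes)
    # Start from a uniform distribution over concepts.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleportation term: every concept receives (1 - damping) / n.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Each concept splits its current score equally among its neighbors.
        for node in nodes:
            share = rank[node] / len(neighbors[node])
            for neighbor in neighbors[node]:
                new_rank[neighbor] += damping * share
        rank = new_rank
    return rank


# Hypothetical co-occurrence pairs from one note (assumed, not real data).
edges = [
    ("aspirin", "unstable angina"),
    ("unstable angina", "chest pain"),
    ("unstable angina", "troponin"),
    ("chest pain", "troponin"),
    ("aspirin", "chest pain"),
]
ranks = pagerank(edges)
for concept, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {score:.3f}")
```

Plain power iteration is used for transparency. Because the graph is undirected and every node has at least one edge, the scores always sum to one and no special handling of dangling nodes is needed.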
At the conclusion of this project, we will have developed and evaluated novel concept-extraction algorithms for CER using healthcare data, determined the privacy implications of working with concept-level data, and developed interactive visualizations for concept-level browsing of large healthcare data sets.

Many organizations are building clinical data warehouses to enable comparative effectiveness research. However, simply loading data from electronic medical records into clinical data warehouses is not enough. To enable reuse of healthcare data for research, we will develop new ways to access and visualize clinical data within data warehouses. Specifically, we will develop new ways to extract concepts from unstructured text and to visualize large data sets so that patterns can be seen quickly, and we will determine the privacy implications of these methods.