The goal of cancer pharmacoepidemiology is to identify adverse and/or long-term effects of chemotherapeutic agents and determine the impact of drugs on cancer risk, prevention, and response to treatments. Pharmacoepidemiology studies exert strong influence on defining optimal treatments and accelerating translational research. Therefore, it is imperative for these to be done efficiently and leveraging real-world patient data such as electronic health records (EHR). Massive clinical data from EHRs are being tapped into for research in disease-gene associations, comparative effectiveness and clinical outcomes. There is however paucity in pharmacoepidemiological studies using comprehensive EHR data due to the inherent challenges that exist for data abstraction, handling and analysis. The hurdles include heterogeneity of reports, embedding of detailed clinical information in narrative text, differing EHR platforms across different sites and missing data to name a few. In this study, we propose to integrate and extend preexisting tools to build an informatics infrastructure for EHR data extraction, interpretation, management and analysis to advance cancer pharmacoepidemiology research. We will leverage existing tools of natural language processing (NLP), standardized ontologies and clinical data management systems to extract and manipulate EHR data for cancer pharmacoepidemiological research. To achieve our goal we propose four specific aims. In aim 1, we intend to develop a high-performance, user- centric information extraction framework with advanced features such as active learning (to reduce annotation cost), domain adaptation (to transfer data across multiple sites) and user-friendly interfaces (for non-technical end users). In aim 2, we plan to improve data harmonization across differing platforms, develop components for seamless data export as well as expand methodologies to address impediments inherent to EHR-based data (such as the missing data problem). In aim 3, we will conduct demonstration projects of cancer pharmacoepidemiology including pharmacovigilance and pharmacogenomics of chemotherapeutic agents to evaluate, refine and validate the broad uses of our tools. Finally in aim 4, we propose to disseminate the methods and tools developed in this project to the cancer research and pharmacoepidemiology communities.