Because of the profound effect of adverse drug events (ADEs) on patient safety, the FDA, AHRQ, and the Institute of Medicine have flagged post-marketing pharmacovigilance of emerging medications as a high national research priority. The FDA, the Foundation for the NIH, and PhRMA formed the Observational Medical Outcomes Partnership (OMOP) to develop and compare methods for identification of ADEs, and the FDA announced its Sentinel Initiative. Congress created the Reagan-Udall Foundation (RUF) for the FDA in response to the FDA's own report, FDA Science and Mission at Risk, and two years ago OMOP activities were incorporated into RUF. As the FDA moves forward with its development of Sentinel, including work on Mini-Sentinel, researchers around the country need to continue developing better methods, and better evaluation methodologies for those methods. A robust research community working on algorithms for pharmacosurveillance using electronic health records (EHRs) and claims databases will provide a substrate of ever-improving methods on which the nation's regulatory pharmacovigilance infrastructure can build. Indeed, an important motivation of OMOP and Mini-Sentinel was to spur the development of such a community. Machine learning has attracted widespread attention across a range of disciplines for its ability to construct accurate predictive models. It is therefore especially appropriate for the problems of ADE identification and prediction: identifying ADEs from observational data, and predicting which patients are most at risk of suffering an identified ADE. Our current award has demonstrated the ability of machine learning to address both of these tasks. It has also added to the existing evidence that considering the temporal ordering of events, such as drug exposures and diagnoses, is critical for accuracy in identification and prediction of ADEs.
The proposed work seeks to further improve upon these methods by building on recent advances in the field of machine learning, by our group and by others, in graphical model learning and in explicit modeling of irregularly sampled temporal data. The latter is especially important because observational health databases, such as EHRs and claims databases, are not simple time series. Patients typically do not come into the clinic at regular intervals, nor do they have the same labs, vitals, and other measurements taken in lock step with one another. Building better ADE detection and prediction algorithms cannot be accomplished by machine learning research alone, even if that research takes account of related work from relevant parts of computer science, statistics, biostatistics, epidemiology, pharmaco-epidemiology, and clinical research. Better methods are also needed for evaluation, that is, for estimating how well a new algorithm, or a new use of an existing algorithm, will perform at identifying ADEs associated with a new drug on the market, or at predicting which patients are most at risk of that ADE. More research and evaluation are also needed at the systems level: how can we best construct end-to-end pharmacovigilance systems that sit atop a large observational database and flag potential ADEs for human experts to investigate further? What kinds of information and statistics should such a system provide to the human experts?
This renewal will address the following aims: (1) improve upon machine learning methods for identification and prediction of ADEs, taking advantage of synergies between these two distinct tasks; (2) improve upon existing methods for evaluating ADE detection, building on advances in machine learning for information extraction from scientific literature; (3) improve upon existing methods for evaluating ADE prediction, building upon advances in machine learning for automated support of phenotyping and also building upon improved methods for efficiently obtaining expert labeling of borderline examples of a phenotype; and (4) use the methods developed in the first three aims to construct and evaluate an end-to-end pharmacosurveillance system integrated with the Marshfield Clinic EHR Data Warehouse. Machine learning plays a central and unifying role throughout all four aims. Our investigator team consists of machine learning researchers with experience in analysis of clinical, genomic, and natural language data (Page, Natarajan), a leading pharmaco-epidemiologist with expertise in building systems to efficiently obtain expert evaluation and labeling of phenotypes (Hansen), a leader in phenotyping from EHR data (Peissig), and an MD/PhD practicing physician with years of experience and leadership in the study of ADEs (Caldwell). In addition to building on results of the prior award, we will build on our experiences with OMOP, the International Warfarin Pharmacogenetics Consortium, the DARPA Machine Reading Program, and interactions with the FDA.