Epidemiologic analyses of health care data can provide critical evidence on the effectiveness and safety of therapeutics. This is particularly vital during the transition from regulatory approval through the early marketing of new drugs, a time when physicians, regulators, and payers are all struggling with incomplete data. Health plans pay for these drugs without knowing how their effectiveness and safety compare with established alternatives, because new compounds are tested against placebos rather than active agents, and only in selected patients. Non-randomized studies in large health care databases can provide faster and less costly evidence on drug effects. However, conventional adjustment methods that rely on a small number of investigator-specified confounders often fail and may produce biased results.

We propose, and have preliminary evidence to support, employing modern medical informatics algorithms that structure and search databases to empirically identify thousands of new covariates. These covariates will then enter established propensity score-based models, making far more effective use of the information contained in health care databases and electronic medical records (EMRs) and resulting in more valid causal interpretations of treatment effects.

We will:
- Develop algorithms that make greater use of information contained in longitudinal claims and EMR databases by empirically identifying thousands of potential confounders. The performance of these approaches will be evaluated in 6 example studies encompassing recent drug safety and comparative effectiveness problems. The approaches will be implemented in multiple large claims databases, supplemented in subgroups by data such as lab values and EMR information.
- Develop novel methods for confounding adjustment based on textual information found in EMRs.
- Expand the newly developed mining algorithms into a framework that integrates distributed database networks with uneven information content, similar to the Sentinel Network recently initiated by the FDA.

This project is likely to produce groundbreaking results at the interface of medicine, biomedical informatics, and epidemiologic methods. After completion of this project, a library of documented and validated algorithms will be available to significantly improve confounder control in a range of health care databases. The theoretical foundation and the ready-to-use algorithms will likely lead to a fundamental shift in how databases contribute to the fast and accurate assessment of newly marketed medications.