Project Summary/Abstract: Although long-term outcomes such as mortality or the development of dementia/Alzheimer's are often the main interest in studies of the effect of an intervention or treatment within the behavioral and social sciences, intermediate variables, such as biomarkers (e.g., cholesterol levels, blood pressure) or access to health care, are frequently studied instead. The short-term effects of the treatment on these intermediate variables are frequently extrapolated naively to long-term outcomes, which may lead to erroneous conclusions regarding the long-term effect. We aim to demonstrate that data fusion methods can be used to predict impacts on long-term outcomes more rigorously. Specifically, we propose to combine information from two datasets in a statistically valid manner: the first, an intervention dataset, is used to estimate the effect of the treatment on intermediate variables, and the second, an outcome dataset, is used to estimate the effect of the intermediate variables on long-term outcomes. We will begin by establishing the assumptions needed for data fusion approaches to be feasible in the proposed context, and we will then develop methods based on distributional theory, data imputation, and matching techniques. We will validate the proposed methods using simulated data and a cross-validation procedure based on real data. Our proposal is motivated by efforts to estimate the effect of having health insurance on long-term outcomes such as cognitive status, mortality, and wealth accumulation using short-term treatment data. These estimates will be obtained by fusing data from the Oregon Health Insurance Experiment with panel data from sources such as the Health and Retirement Study and the Panel Study of Income Dynamics.
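To make the two-dataset idea concrete, the following is a minimal sketch of one possible imputation-based fusion estimator on simulated data. It is an illustration, not the proposed method. It assumes that the treatment affects the long-term outcome only through the intermediate variable and that the relationship between the intermediate variable and the outcome is the same in both datasets. The linear model, the function name `fused_effect`, and all simulated values are hypothetical choices made for this example.

```python
import numpy as np

def fused_effect(t, s_int, s_out, y_out):
    """Imputation-based fused estimate of the treatment effect on Y.

    t, s_int : treatment indicator and intermediate variable (intervention dataset)
    s_out, y_out : intermediate variable and long-term outcome (outcome dataset)
    """
    # Step 1: fit Y ~ a + b * S by least squares in the outcome dataset.
    design = np.column_stack([np.ones_like(s_out), s_out])
    coef, *_ = np.linalg.lstsq(design, y_out, rcond=None)
    # Step 2: impute the unobserved long-term outcome in the intervention dataset.
    y_hat = coef[0] + coef[1] * s_int
    # Step 3: compare imputed outcomes between treated and control units.
    return y_hat[t == 1].mean() - y_hat[t == 0].mean()

rng = np.random.default_rng(0)
n = 5000

# Simulated intervention dataset: treatment shifts the intermediate variable by 1.0.
t = rng.integers(0, 2, size=n)
s_int = 1.0 * t + rng.normal(size=n)

# Simulated outcome dataset: each unit of S raises the long-term outcome by 0.5.
s_out = rng.normal(loc=0.5, size=n)
y_out = 2.0 + 0.5 * s_out + rng.normal(size=n)

# Under the stated assumptions the true long-term effect is 1.0 * 0.5 = 0.5,
# so the estimate should be close to that value in this simulation.
estimate = fused_effect(t, s_int, s_out, y_out)
print(f"fused estimate of long-term effect: {estimate:.3f}")
```

If either assumption fails, for example if the treatment also affects the outcome through a channel not captured by the intermediate variable, this simple estimator would be biased, which is why the conditions under which fusion is valid need to be established first.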