This application addresses the broad Challenge Area "Information Technology for Processing Health Care Data" (Area 10) and two specific Challenge topics: (10-RR-101) Information Technology Demonstration Projects Facilitating Secondary Use of Healthcare Data for Research, and (10-HL-101) Develop data sharing and analytic approaches to obtain from large-scale observational data, especially those derived from electronic health records, reliable estimates of comparative treatment effects and outcomes of cardiovascular, lung, and blood diseases. Several challenge areas in this request suggest that no single data set is sufficient for addressing the complex interplay of biological, social, psychological, economic and environmental determinants of health. Still, a deeper understanding of these facets is needed to make informed policy decisions. The principal investigator of this proposal has been involved in developing methods for combining information from multiple data sources. These methods would be more useful if a computational infrastructure implementing them were made available to substantive researchers. The goal of this proposal is to develop a statistical infrastructure that facilitates pooling of data from multiple sources, creates data sets for analytical purposes, and provides software for properly analyzing such combined data sets. Under this approach, two or more data sets will be concatenated, the variables common to the data sources will be aligned, and the nonaligned variables will be treated as missing data in the data sets that do not contain them. The multiple imputation methodology will be tailored to combine information from multiple sources using fairly general and semi-parametric models. The end product will be software with two modes of operation, built upon the existing infrastructure developed by the principal investigator.
In the first mode, the user will input two or more data sets, list all the variables to be analyzed from these data sets, and specify the statistical model for testing a research hypothesis; the software will then properly analyze the data and report the results. When data cannot be freely distributed to researchers because of confidentiality concerns, the second mode will add a web-based remote data analysis tool to the computer system holding the confidential data sets. Legitimate users will then be able to log in to the system through the web and specify the analytical request as before. Two versions of the software will be developed. The first will be an add-on to SAS, a popular software package used by many researchers. The second will be a stand-alone version that can be used by researchers who do not have access to SAS. The software will be developed to work on both Windows and Linux platforms. The purpose of this research is to build a statistical software system that allows biostatisticians, clinicians, epidemiologists and other public health researchers to fit statistical models that call for combining information from multiple data sources. The aim is to pool information from administrative, epidemiological and clinical study databases to address public health research questions. The software system can be used by Data Coordinating Centers, where analysis requests can be submitted remotely by network researchers. The system will also be useful for data repositories that cannot release data sets because of confidentiality concerns but that allow researchers to request analyses based on those data sets. The statistical software system will be distributed freely to the public via the web.
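
As a rough illustration of the pooling step described above, the following Python sketch stacks records from two sources, aligns the variables they share, and marks the variables a source did not collect as missing. It is not the proposed software: the function names (stack_sources, pool_estimates), the "_source" indicator column, and the example variables are hypothetical, and the imputation models themselves are omitted. The second function assumes that estimates from the completed data sets would be combined with the standard multiple imputation combining rules (Rubin's rules), which the abstract does not state explicitly.

```python
from statistics import mean, variance


def stack_sources(*sources):
    """Concatenate lists of records (dicts) from several sources.

    Variables shared across sources are aligned by name; a variable that a
    source did not collect is set to None (missing) for that source's rows.
    """
    # Collect every variable name seen in any source, in first-seen order.
    all_vars = []
    for records in sources:
        for rec in records:
            for name in rec:
                if name not in all_vars:
                    all_vars.append(name)

    stacked = []
    for source_id, records in enumerate(sources):
        for rec in records:
            row = {"_source": source_id}  # hypothetical source indicator
            row.update({name: rec.get(name) for name in all_vars})
            stacked.append(row)
    return stacked


def pool_estimates(estimates, variances):
    """Combine m completed-data estimates using Rubin's rules (assumed here).

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B is the between-imputation variance of the estimates.
    Requires at least two imputations.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical example: a survey records income, a clinical source records LDL.
survey = [{"age": 54, "sbp": 132, "income": 41000}]
clinic = [{"age": 61, "sbp": 145, "ldl": 3.4}]
combined = stack_sources(survey, clinic)
# combined[0]["ldl"] and combined[1]["income"] are None, to be imputed later.
```

In this sketch the missing entries would be filled in several times by an imputation model, the user's analysis would be run on each completed data set, and pool_estimates would then merge the resulting estimates and their variances into a single inference.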