PROJECT SUMMARY - TR&D1: DATA SCIENCE
Large-scale data aggregation has generated considerable interest within the neuroscience community, both for its potential to increase the statistical significance of research results and for the reuse of data that has already been collected. New federated approaches are needed to bring together research studies that operate independently of one another and to manage the complex needs of data access, aggregation, harmonization, and analysis. In Aim 1 we build upon our extensive experience in developing federated database systems and propose a one-time application process that simplifies data access by consolidating disparate applications across multiple research institutions. We also propose a single secure and unified pathway for downloading binary and tabular files from different research studies, which would significantly reduce the effort required to retrieve these files. In Aim 2 we introduce a new approach for harmonizing data collected by different research studies that applies transformations incrementally and provides immediate visual feedback through tabular updates and interactive summaries. In Aim 3 we propose to integrate recent Docker technologies into our framework and establish an archive of analyses that can be transferred to and executed on any Linux computer. Input and output data will be linked to their respective analyses and used as query criteria when searching the archive. In Aim 4 we propose a new mediator, the Analysis Assembler, that bridges all the components of our framework. The Assembler uses the unified pathway of Aim 1 to automatically download all files needed for an analysis. After retrieving the analysis itself from the archive of Aim 3, the Assembler executes the analysis on the data files. Once the analysis has completed, the Assembler records the provenance of all output data, which will be made accessible in visual queries of our federated search system.
In Aim 5 we propose to extend our quality control system to use machine learning to automatically assign "poor" and "good" quality ratings to neuroimaging MRI data. With the goal of locating hard-to-see artifacts, we also propose to implement interactive 3D visualizations to more accurately assess image quality. All five of our aims provide a framework upon which neuroscience can be conducted, shared, and replicated, comprising a foundation for reproducible science.