SUMMARY The major theme of this proposal is a tightly closed loop of experiment, theory, and data analysis. Sophisticated, scalable data science methods are a critical component of this loop. The Data Science Core serves two primary purposes. First, we will apply and refine sophisticated data analysis algorithms directly related to the project's scientific goals. This project will generate massive streams of data from multiple recording and simulation modalities: whole-cell electrophysiology and anatomy, large-scale calcium imaging, spatiotemporally complex optogenetic perturbations, and RNA sequencing images, in addition to massive simulations of networks of spiking neurons. A correspondingly major effort is needed to manage this data, to distill it into new scientific knowledge, and to design new experiments, theoretical analyses, and simulations to close the theory-experiment-analysis loop. This will entail the application and iterative refinement of algorithms for preprocessing the data (e.g., taking calcium imaging video and extracting demixed and denoised neural activity from each cell visible in the field of view); aligning, registering, and performing statistical inferences on data across multiple modalities (e.g., calcium imaging, optogenetic stimulation, and seqFISH); functionally characterizing the stimulus preferences and correlation structure of the activity in the observed cells; and developing closed-loop optimal experimental design methods to obtain richer, more informative data. Second, this Core will build a collaborative infrastructure allowing the multiple laboratories in this project to act as one: sharing data and analysis tools, and closely integrating theorists and experimentalists.
This infrastructure will: be completely open source; build on current efforts to standardize neuroscience data; be modular and extensible to allow for rapid iterative improvement of each stage of the algorithmic pipeline; enforce automatic archiving and recording of algorithmic metadata describing versioning and parameter choices for easy searchability and reproducibility; and allow for straightforward benchmarking. As we develop these practices and tools for data and analysis pipeline sharing, we will make them immediately available to the community. Thus, we will provide a model platform for vastly improving reproducibility, keeping analysis pipelines up to date as improved methods are developed, and, most importantly, saving researchers from redeveloping and reimplementing analysis software and data storage/sharing solutions. We aim to make it easy for groups of labs anywhere in the world to unite and crack large-scale neural circuits. This will transform the way neuroscience is done.