This is the second annual report for the Data Science and Sharing Team (DSST) and covers our first year with our full staff in place. 2018 also marked the arrival of Francisco Pereira, the director of our sister group, the Machine Learning Team. What follows is a list of our noteworthy projects.

Intramural Data Sharing

Over the course of 2018, our team worked with the authors of several projects to make their data publicly available. They include:

Gonzalez-Castillo et al., 2012, https://doi.org/10.15154/1464517
Gonzalez-Castillo et al., 2015, https://doi.org/10.15154/1464520
Shaw et al., 2018, https://doi.org/10.15154/1463004
Thurm et al., 2018, https://doi.org/10.15154/1464602
Power et al., 2018, https://openneuro.org/datasets/ds000258/

These datasets are or will be available through both the NIMH Data Archive (NDA) and our internal repository (NIDO) at http://nido.nimh.nih.gov, which is derived from the Stanford Center for Reproducible Neuroscience's (CRN) OpenNeuro project. In 2018 we were granted supplementary funding to expand NIDO's capabilities so that it can execute analyses directly on the NIH High Performance Computing (HPC) cluster. We are currently working closely with the CRN to implement this functionality for both NIDO and OpenNeuro.

Acquisition and Maintenance of Shared Datasets on the NIH HPC

Our team also makes it easy for intramural researchers to access and use data from other institutions by streamlining the authorization process. We download and organize each dataset on a shared disk accessible from the NIH HPC cluster for easy and efficient analysis. Datasets that we have downloaded and currently maintain include the Human Connectome Project, the OpenNeuro Archive, and the UK Biobank Imaging Extension.

Exploring Cross Scanner Harmonization

We were approached by several members of the NIMH extramural staff to consult with them on how effective current inter-scanner harmonization techniques are.
To test these techniques, we used the initial dataset recently released by the Adolescent Brain Cognitive Development (ABCD) project (https://abcdstudy.org/). This work is now a manuscript that is available as a preprint (https://doi.org/10.1101/309260) and is currently under peer review.

Aggregating Worldwide MRI Scanner Quality Metrics

MRIQC is a package for assessing the quality of structural and functional MRI data. We are supporting a centralized database that aggregates anonymized quality control statistics on over 100,000 scans collected around the world. This database allows investigators to compare their own data with a vast trove of similar data in order to assess the quality of their scans, guide changes in collection protocols, and decide which scans need manual review.

Research Volunteers Protocol

Our team continues to collaborate with Joyce Chung and the NIMH Clinical Director's Office on the Research Volunteers Protocol, which was approved in October 2017. More than 45 participants have been scanned to date using a standardized MRI protocol. The data from these participants will soon be publicly available on the NIMH Data Archive. These participants are also being actively referred to other NIMH protocols for which they qualify. For more details, see the annual report for the Clinical Director's Office.

Machine Learning from Distributed Datasets

Helping to make data widely available is part of our core mission, but sometimes privacy issues make it impossible for data to be shared. We are collaborating with the Machine Learning Team on a project to build a parallel weight consolidation tool. This technique will allow researchers to train a neural network and share anonymized data from that training with a central hub, which aggregates across many sites to improve the network's performance and predictions.
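
As a rough illustration of the aggregation step described above, the following Python sketch shows one simple way a central hub could combine per-site training results. It is not the tool under development or the method in the preprint: it assumes each site shares only a mean and a variance for every network weight, and it pools them with a precision-weighted average. The function name `consolidate_site_weights` and the input layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def consolidate_site_weights(
    site_means: Sequence[Sequence[float]],
    site_variances: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Pool per-site weight estimates with a precision-weighted average.

    Hypothetical layout: site_means[s][j] and site_variances[s][j] are the
    mean and variance that site s reports for network weight j.
    Only these summaries reach the hub; raw scans stay at each site.
    """
    if not site_means or len(site_means) != len(site_variances):
        raise ValueError("need matching, non-empty means and variances")

    n_weights = len(site_means[0])
    for means, variances in zip(site_means, site_variances):
        if len(means) != n_weights or len(variances) != n_weights:
            raise ValueError("every site must report the same number of weights")
        if any(v <= 0 for v in variances):
            raise ValueError("variances must be strictly positive")

    pooled_means: List[float] = []
    pooled_variances: List[float] = []
    for j in range(n_weights):
        # Precision (1 / variance) measures how certain a site is about weight j.
        precisions = [1.0 / variances[j] for variances in site_variances]
        total_precision = sum(precisions)
        weighted_sum = sum(p * means[j] for p, means in zip(precisions, site_means))
        pooled_means.append(weighted_sum / total_precision)
        pooled_variances.append(1.0 / total_precision)
    return pooled_means, pooled_variances
```

Under these assumptions, a site that is more certain about a weight (smaller variance) pulls the pooled value toward its own estimate, and the hub only ever sees the per-weight summaries.
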
See the Machine Learning Team's annual report and this preprint (https://arxiv.org/abs/1805.10863, currently under review) for more details on this project.

Exemplifying Open Science

Our team has served as an example of open science practices. All of our papers have been released to the public as preprints on bioRxiv prior to submission, along with code and data. We have pre-registered two large data analysis efforts with the Open Science Framework (http://doi.org/10.17605/OSF.IO/3YTQJ, http://doi.org/10.17605/OSF.IO/GCD7Z). The vast majority of the code for our own projects is available in open source repositories on GitHub, where we have committed over 400 code changes. We have additionally contributed over 75 code changes to 18 open source projects across the community via GitHub pull requests.

Crowdsourcing Imaging Science

Accurate volumetric measurements of the brain, derived from MR image segmentation, are fundamental for understanding structure-function relationships in the healthy brain and in psychiatric and neurological disorders. However, obtaining these measurements in large datasets remains a challenge, especially in many patient populations. We sought to integrate the expertise of neuroanatomists, citizen science through crowdsourcing, and recent advances in deep-learning algorithms for image processing. We worked with Drs. Anisha Keshavan, Ariel Rokem, and Jason Yeatman from the University of Washington to create Medulina, an open source, web-browser-based platform for biomedical image segmentation. Medulina is available online at medulina.com, and we have IRB approval from the University of Washington to begin collecting citizen-contributed medical image annotations. One of the available projects is to segment high-resolution hippocampal images, which will aid in the analysis of data recently collected and published in collaboration with the Child Psychiatry Branch (Zhou et al., 2018).

Measuring and Incentivizing Open Science

We are working with the non-profit group ImpactStory to create an online tool for tracking how many publications from the NIMH IRP are publicly available, have open code, and have open data. ImpactStory is uniquely suited to this project because they created and maintain a database of which publications are and are not publicly accessible, and they have built websites with similar functionality (impactstory.org). This tool will provide the foundation for other institutions to measure and incentivize open science.

Accessible, Reproducible Pipelines in AFNI for Standardized Data

The Data Science and Sharing Team works closely with the Scientific and Statistical Computing Core (SSCC). The SSCC is responsible for AFNI, the NIMH IRP's flagship neuroimaging analysis software, which is used in research on many conditions, including autism, anxiety disorders, drug addiction, major depression, and epilepsy. We collaborated with the AFNI team on a BRAIN Initiative grant to make it practicable to use AFNI on large data collections stored in the cloud or on local supercomputing clusters, accelerating research into diverse brain-based conditions in sizable groups. Our grant brings AFNI's widely acknowledged analytical and visualization capabilities to cloud-based environments in order to better support BRAIN Initiative projects. Once the proposed work is complete, a user will be able to run an entire analysis workflow in the cloud, including preprocessing, analysis, quality control, and visualization of results.