This is the third annual report for the Data Science and Sharing Team (DSST). Both the DSST and our sister group, the Machine Learning Team, have now been fully staffed for a full year, and this has resulted in a considerable acceleration in productivity and in demand for our services. Below is a summary of our activities for the past fiscal year.

Staff Transitions

After two years of excellent work on our team, our two original staff members, John Lee and Dylan Nielson, were offered and accepted higher-level positions within other intramural research groups. Dr. Lee accepted a position on the AFNI team under Dr. Robert Cox in November of 2018, and Dr. Nielson became the lead staff scientist for the Mood Brain & Development Unit (MBDU) under Dr. Argyris Stringaris in August of 2019. We see these transitions as a critical part of our core missions. Many NIMH intramural research groups struggle to recruit quantitatively oriented candidates with strong programming skills, as these candidates often choose to work in the for-profit technology sector, which can offer higher salaries. Our team is fortunate in that we receive many inquiries from these candidates. We find that they are eager to work on our team given that we lie at the intersection of impactful health research and scientific progress while offering ample opportunities for training at the cutting edge of data science. When our staff members advance their skill sets to the degree that they can compete for supervisory-level positions and still choose to remain within our institute, we see this as a significant step toward achieving the mission of our team and strengthening the NIMH overall.

Intramural Data Sharing

Helping intramural investigators share their research data remains one of our team's core missions.
One particularly large and impactful dataset that we have recently made publicly available comes from Armin Raznahan's section: 1,500 structural MRI scans collected longitudinally on 792 participants between the ages of 5 and 25. These data were collected at the NIMH between 1990 and 2010 and had proved very difficult to make publicly available. We are very happy to have helped finally make these data available to the community on the NIMH Data Archive:

http://dx.doi.org/10.15154/1504177
https://nda.nih.gov/edit_collection.html?id=3142

Our team also continues to collaborate with Joyce Chung and the NIMH Clinical Director's Office on the Healthy Research Volunteers Protocol that was approved in October of 2017 (NCT03304665). Over 90 participants have been scanned to date using a standardized MRI protocol that was designed in collaboration with Vinai Roopchansingh from the NIMH fMRI Core Facility. The data from these participants are now publicly available on the NIMH Data Archive:

https://nda.nih.gov/edit_collection.html?id=2843

These participants are also being actively referred to other NIMH protocols for which they qualify, which we hope will lead to a rich, longitudinal dataset of healthy control data that is publicly accessible.

Exploring Cross Scanner Harmonization - ABCD

Several members of the NIMH extramural staff approached our team to consult with them on the effectiveness of current techniques for harmonizing data collected on different MRI scanners. To test these techniques, we used a dataset recently released by the Adolescent Brain Cognitive Development (ABCD) project.
This work is now available as a preprint (https://doi.org/10.1101/309260). It was recently presented by Dylan Nielson at the Organization for Human Brain Mapping meeting in Rome and is available in poster format: http://doi.org/10.5281/zenodo.3366310

Aggregating Worldwide MRI Scanner Quality Metrics - MRIQC

MRIQC is a software tool for assessing the quality of structural and functional MRI data. We collaborated with the authors of MRIQC to build a database that receives scan quality information that is automatically uploaded every time MRIQC is run. That database, which we maintain, now contains anonymized quality control statistics on over 250,000 scans collected around the world. This database allows investigators to compare the data they have collected on their own scanners with a vast trove of similar data in order to assess the quality of their scans, guide changes in collection protocols, and decide which scans need to be manually reviewed. These data show interesting relationships between scan quality and scan metadata such as manufacturer, flip angle, and magnetic field strength. These data were recently presented at the Human Brain Mapping meeting in Rome and are available in poster format: http://doi.org/10.5281/zenodo.3366303

An Open Repository for Positron Emission Tomography (PET) Data in BIDS Format

In 2017, Robert Innis, a leader in the standardization of PET nomenclature, approached our group to collaborate on building the necessary tools and infrastructure to make PET data sharing easier and more common. In collaboration with Melanie Ganz, who authored the majority of the PET BIDS standard, Russ Poldrack of the Stanford Center for Reproducible Neuroscience, and Doug Greve of MGH, we are working to create a PET repository based on the OpenNeuro platform.

Quantifying Open Science and Data Sharing at NIMH

Funding bodies and journals are increasingly encouraging or requiring data sharing; however, it is unknown how successful these policies have been over time.
Even when data are shared publicly, it is difficult to determine whether the data are properly organized and contain sufficient information for independent scientists to use them successfully to ask novel questions. Using a text mining approach, we are working to estimate the proportion of publications funded by the NIMH (both extramural and intramural) that provide references or links to shared data or code. In the coming months, we will also measure the impact of data sharing by assessing the rates of secondary use. In collaboration with the non-profit group ImpactStory, we will publicize the results of this work with an interactive tool for tracking which NIMH-funded publications are publicly available, have open code, and have open data. This work will be presented at the MetaScience 2019 conference at Stanford in September. We hope these tools will provide the foundation for NIMH and other institutions to measure and incentivize effective data and code sharing that leads to reuse.
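
To illustrate the kind of text mining described above, the following is a minimal sketch, not the team's actual pipeline. It assumes that publication full text is already available as plain strings and that sharing can be approximated by matching a short list of repository hostnames and availability phrases. The pattern lists and the function names classify and sharing_rates are hypothetical and chosen only for illustration.

```python
import re

# Assumed, illustrative signals of shared data: a few repository hostnames
# and a common availability phrase. A real study would need a validated list.
DATA_PATTERN = re.compile(
    r"(openneuro|nda\.nih\.gov|zenodo\.org|osf\.io|figshare\.com"
    r"|data (?:are|is) (?:publicly )?available)",
    re.IGNORECASE,
)

# Assumed, illustrative signals of shared code.
CODE_PATTERN = re.compile(
    r"(github\.com|gitlab\.com|bitbucket\.org"
    r"|code (?:is|are) (?:publicly )?available)",
    re.IGNORECASE,
)


def classify(text):
    """Flag whether one publication's text mentions shared data or code."""
    return {
        "open_data": bool(DATA_PATTERN.search(text)),
        "open_code": bool(CODE_PATTERN.search(text)),
    }


def sharing_rates(papers):
    """Return the fraction of publications flagged for each kind of sharing."""
    if not papers:
        return {"open_data": 0.0, "open_code": 0.0}
    flags = [classify(text) for text in papers]
    total = len(flags)
    return {
        key: sum(flag[key] for flag in flags) / total
        for key in ("open_data", "open_code")
    }


# Made-up example texts, used only to exercise the functions.
example_papers = [
    "Data are available at https://nda.nih.gov/edit_collection.html?id=3142",
    "Analysis code is available at https://github.com/example/pipeline.",
    "All data is publicly available on OpenNeuro, with code at github.com/example/analysis",
    "The authors declare no competing interests.",
]
rates = sharing_rates(example_papers)
```

Keyword matching of this kind only shows that a publication mentions a repository or an availability statement. It does not show that the data or code are actually accessible or usable, so any estimate produced this way would need to be checked against a manually reviewed sample.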