This is the first annual report for the Machine Learning Team. We have only been fully operational since January 2018, when Francisco Pereira was hired. Charles Zheng and Patrick McClure were hired late in 2017 and worked within the Data Science and Sharing Team (DSST) until January. In 2018, the team hosted summer interns Nao Rho and Tim Barry.

In our first few months, we concentrated on setting up the hardware and computational infrastructure for the team, and on starting collaborations that were particularly well matched to our skills and interests. We also canvassed many individual researchers and trainees to understand their needs, as that is crucial for guiding our future activities.

Over the coming months, we plan to release the software packages we have been working on, once they are tested and documented. Beyond that, we plan to prioritize the development of training material and workshops, so that we can have an impact on the work of more labs through their trainees and staff scientists. In parallel, we will continue working on specific projects that lend themselves to the development of new methods, and on joint high-impact efforts with the DSST.

Development of novel methods

One of the obstacles to the use of machine learning methods on very large datasets is that many of these datasets cannot be shared outside their site of origin, due to IRB constraints, sheer scale, or other factors. We developed a new method that combines prediction models (Bayesian deep neural networks) trained on different datasets at different sites into a single prediction model, with performance comparable to what would be obtained by having all the data in one location.

In order to study the mental representation of concrete concepts, colleagues in the NIMH IRP are collecting vast datasets of behavioral judgments.
We developed a new method for creating models of mental representations that can predict the behavior of participants when faced with new stimuli not used in building the model, or on a variety of new tasks. Both of these projects have led to submitted publications (see below).

Development of software packages

We developed several software packages that were necessary for us or for the colleagues we collaborated with:

1) searchlight classification/regression models - the purpose is to produce maps of where certain information is present in brain imaging data, as measured by the ability to predict it from new images.
2) common latent variable models - the purpose is to wrap up several methods for identifying brain activation that is common across subjects exposed to the same stimuli, and to select parameter values automatically from the data, making the methods easier to use.
3) agglomerative clustering - the purpose is to generate a brain parcellation in a completely data-driven way, by grouping adjacent voxels with similar behavior and determining which parcels are reliable in different groups of subjects.

In all three cases, our goal is to have the methods run fast enough to make them feasible to use on very high-resolution functional MRI data from a 7 Tesla scanner.
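
As a rough illustration of the searchlight idea described above, the following minimal sketch scans a small 2D grid of "voxels", trains a classifier on each local neighborhood, and records the cross-validated accuracy at the neighborhood's center. It is not the team's package: the nearest-centroid classifier, the leave-one-out scheme, the square neighborhood, the function names, and the synthetic data (one strongly informative voxel at the grid center) are all assumptions chosen to keep the example short and dependency-free.

```python
import random

def neighborhood(center, shape, radius):
    """Grid coordinates within `radius` (Chebyshev distance) of `center`."""
    r0, c0 = center
    rows, cols = shape
    return [(r, c)
            for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1))
            for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1))]

def centroid(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed model)."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        # Hold out sample i and fit one centroid per class on the rest.
        train = [(f, l) for j, (f, l) in enumerate(zip(features, labels)) if j != i]
        centroids = {lab: centroid([f for f, l in train if l == lab])
                     for lab in set(l for _, l in train)}
        predicted = min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))
        correct += predicted == y
    return correct / len(labels)

def searchlight(images, labels, radius=1):
    """Return a map of cross-validated accuracy, one value per voxel."""
    rows, cols = len(images[0]), len(images[0][0])
    acc_map = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            coords = neighborhood((r, c), (rows, cols), radius)
            # Each sample is described only by the voxels in this neighborhood.
            features = [[img[i][j] for i, j in coords] for img in images]
            acc_map[r][c] = loo_accuracy(features, labels)
    return acc_map

# Synthetic data (an assumption): 8 images of 5x5 voxels, small noise everywhere,
# and a strong label-dependent signal at voxel (2, 2) only.
rng = random.Random(0)
labels = [i % 2 for i in range(8)]
images = []
for label in labels:
    img = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(5)]
    img[2][2] = 5.0 if label == 1 else -5.0
    images.append(img)

acc_map = searchlight(images, labels)
for row in acc_map:
    print(" ".join(f"{a:.2f}" for a in row))
```

In this toy setting, every neighborhood that contains the informative voxel classifies perfectly, so the resulting map highlights the 3x3 region around the grid center; accuracies elsewhere reflect noise only. A production implementation would operate on 3D volumes with spherical neighborhoods and a faster classifier, which is where the speed requirement for 7 Tesla data comes in.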
Collaboration with IRP researchers

We have been helping researchers at NIMH and NINDS with projects that require machine learning tools, for instance:

- automated prediction of brain segmentations (in seconds rather than hours)
- creation of a model of mental representations of objects that explains complex behavior
- identification of brain locations encoding a percept from high-resolution functional brain imaging data
- detection of locations and times in a video stimulus containing information about subject decisions
- comparison of group-level testing methods for information mapping in functional brain imaging
- comparison of the efficacy of different dynamic functional connectivity methods in capturing cognitively relevant information
- creation of a parcellation of the hypothalamus from very high-resolution functional brain imaging data
- prediction of the improvement of stroke patients after one year of follow-up, based on their condition at discharge and on demographics, lab tests, treatment, and several other factors
- generation of predictors for naturalistic stimuli using deep neural networks
- prediction of subject characteristics from functional brain imaging data acquired while subjects viewed naturalistic stimuli
- identification of groups of neurons driving different aspects of animal behavior, from calcium imaging data
- prediction of the temporal evolution trajectory of schizophrenia patients from SNP genetic data
- automatic annotation of video frames in movies to identify all body parts present, to allow quantification of eye tracking patterns in ASD patients and control subjects
- identification of aspects of functional connectivity that distinguish patients with motor disorders from control subjects
- prediction of patient HIV status from a variety of imaging and non-imaging measures

In addition to these projects, we provided informal guidance to researchers already using machine learning methods at NIMH, NINDS, NIDA, NIAAA, and NCBI.
Education and Training

We hosted the following talks as part of the Machine Learning in Brain Imaging and Neuroscience seminar:

- 1/30 - Yoshua Bengio - AI and Deep Learning
- 2/27 - Adam Marblestone - Towards integration of Deep Learning and Neuroscience
- 3/20 - Nikolaus Kriegeskorte - Modeling brain processing with Deep Learning + Representational Similarity Analysis
- 4/25 - Aude Oliva - Comparison of deep neuronal networks to spatio-temporal cortical dynamics of human visual object recognition
- 5/16 - Francisco Pereira - Toward universal decoders of semantic content from brain imaging
- 8/23 - Vince Calhoun - Deep Learning Approaches: Brief Introduction and Application to Neuroimaging

We gave additional presentations within and outside NIMH:

- 3/16 - Francisco Pereira - invited presentation at the Center for Brain, Behavior, and Cognition at Pennsylvania State University
- 2/9 - Patrick McClure - presentation and deep learning Q&A at the NCBI Deep Learning Journal Club
- 7/3 - Patrick McClure - presentation on deep unsupervised learning at the NIH Image Segmentation Journal Club
- 7/28 - Francisco Pereira - invited presentation at the Data-Intensive Brain Imaging: The State of the Art symposium at the Cognitive Science conference
- 9/20-21 - Francisco Pereira - invited presentations at the University of Porto, Portugal

Articles under review:

- Zheng, C.; Achanta, R.; Benjamini, Y. Extrapolating Expected Accuracies for Large Multi-Class Problems (in review at Journal of Machine Learning Research)
- McClure, P. and Kriegeskorte, N. Robustly representing uncertainty in deep neural networks through sampling (in review at Neural Computation)