Project Summary Scientific understanding of adaptive immune receptors (i.e. antibodies and T cell receptors) has the potential to revolutionize prophylaxis, diagnosis, and treatment of disease. High?throughput DNA sequencing and functional experiments have now brought the study of adaptive immune receptors into the big?data era. To realize this potential of these data they must be matched with appropriately powerful analytical techniques. Existing probabilistic and mechanistic models are insufficient to capture the complexities of these data, while a nave application of machine learning cannot leverage our profound existing knowledge of the immune system. The goal of this project is to blend deep learning with mechanistic modeling in order to predict and understand the evolution and function of adaptive immune receptors. Aim 1: Develop generative models of immune receptor sequences that capture the complexity of real adaptive immune receptor repertoires. These will combine deep learning along with our knowledge of VDJ recombination, and provide a rigorous platform for detailed repertoire comparison. Aim 2: Develop quantitative mechanistic models of antibody somatic hypermutation that incorporate the underlying biochemical processes. Estimate intractable likelihoods using deep learning to infer important latent variables, and validate models using knock?out experiments in cell lines. Aim 3: Develop hybrid deep learning models to predict binding properties from sequence data, combining large experimentally?derived binding data with even larger sets of immune sequences from human immune memory samples. Incorporate structural information via 3D convolution or distance?based penalties. These tools will reveal the full power of immune repertoire data for medical applications. We will obtain more rigorous comparisons of repertoires via their distribution in a relevant space. These will reveal the effects of immune perturbations such as vaccination and disease, allowing us to pick out sequences that are impacted by these perturbations. We will have a greater quantitative understanding of somatic hypermutation in vivo, and statistical models that appropriately capture long?range effects of collections of mutations. We will also have algorithms that will be able to combine repertoire data and sparse binding data to predict binding properties. Put together, these advances will enable rational vaccine design, treatment for autoimmune disease, and identification of T cells that are promising candidates for cancer immunotherapy.