Integrated pathogenicity assessment of clinically actionable genetic variants
Project Summary/Abstract
Large biobanks such as All of Us and the Million Veteran Program have now collected genetic data from millions of patients, and other population health studies are expanding rapidly. The interpretation of variants in clinically actionable disease genes is becoming increasingly common in such projects. The American College of Medical Genetics and Genomics has recommended that sequence interpretation include a minimum set of 59 genes regardless of the indication for sequencing (ACMG 59). These genes are responsible for a variety of clinical syndromes and have been extensively studied. However, even in well-studied disease genes, the majority of variants are observed in only one or two families, which makes it difficult to establish their role in causing disease. Further, even where evidence about a variant exists, it is often inadequate for interpretation: many variants in databases were originally identified in small, symptomatic cohorts without matched control groups, so their reported associations can suffer from incorrect estimates of significance or effect size, and a non-trivial fraction are likely to be spurious. For these reasons, a central challenge in clinical genomics is to interpret variants in clinically actionable genes that are identified during sequencing. Because the ACMG 59 genes have been studied intensively due to their clinical applicability, there is a unique abundance of functional and structural data that can be used to improve predictions. Here, we propose to develop new data that can be leveraged in the clinical assessment of variants, including novel predictions of structural consequences, regional and structurally informed selective constraint, and clinical risk estimated from clinical diagnostic and epidemiologic health data.
Using these data, we will develop a Bayesian statistical model to predict the effects of mutations that can complement existing assessments made by consortia and clinical laboratories. This will specifically include efforts to improve computational predictions of structural and functional impact using the extensive scientific and medical knowledge available for each of these genes. Next, we will combine that structural and functional insight with large-scale population data. We will measure statistical deviation from expected variation for related groups of missense variants, and also identify groups of variant sites that are enriched in recurrent somatic or germline variation associated with cancer. Finally, we will develop a Bayesian prediction framework that integrates the full set of variant observations and characteristics to improve predictions of clinical risk for individual variants, and prospectively measure its performance in a clinical diagnostic laboratory.