PROJECT SUMMARY

Exome and whole-genome sequencing are becoming increasingly routine approaches in cancer[1], common disease[2] and rare disease diagnosis[3]. Despite their success, our ability to fully interpret the clinical relevance of personal genome variation remains limited[4-6]. The most crucial need is therefore more genotype-phenotype data that link genetic variation with disease causation. The objective of this proposal is to improve the clinical interpretation of genetic variation, in particular by developing integrative approaches that predict the effect of genetic variation on clinical phenotype. This proposal addresses the hypothesis, supported by preliminary data, that combining patient transcriptomic data with genotypic and clinical data (as opposed to using each alone) offers a better mechanistic understanding of disease natural history, from initial presentation to progression. The specific aims are designed such that each independently adds substantial functional genomic information, over and above previously available patient genetic data, to further resolve the clinical phenotype. Aim 1 establishes a comprehensive and widely shared dataset of patient transcriptomic (and genetic) variation across multiple cancer, cardiovascular and thrombosis/bleeding phenotypes, in patients with somatically acquired myeloproliferative neoplasms (MPN) and select other rare heritable blood diseases (HBD). Aim 2 methodically determines differential RNA expression and processing between clinically relevant subgroups of MPN and HBD patients. Aim 3 brings these elements together and applies two integrative Bayesian and machine learning approaches, RIVER[24] (RNA-informed variant effect on regulation) and LASSO[25] (Least Absolute Shrinkage and Selection Operator), to resolve the functional and clinical relevance of rare variants and to identify signatures most predictive of disease risk or progression.
Completion of these aims will contribute new scientific knowledge on how integrating transcriptomic data improves clinical genomic analyses in other genetic (and rare) diseases. In addition, this project will enable the Principal Investigator to develop expertise in the informatics and data science aspects of genomic medicine that complement her current background in biophysics, biochemistry and translational hematology. Combined with additional informatics training at Stanford University through coursework, seminars, one-on-one advising from project mentors, and interactions with the wider statistics, bioinformatics and genomics communities, this project will prepare the Principal Investigator to launch an independent academic career in genomic medicine.