Summary

Alzheimer's disease (AD) and related dementias are a major health burden. The number of affected individuals in the US is expected to grow from 5.8 million in 2019 to 14 million by 2060, making the development of effective prevention and treatment strategies a critical public health priority. AD has a substantial genetic component, with heritability estimates around 70%. Genome-wide association studies (GWAS) have identified many loci robustly associated with the disease, and many of the implicated pathways seem promising. However, the translation of these discoveries into actionable targets has been slow, primarily due to the lack of mechanistic understanding. To address this problem, we have developed key approaches for assigning function to GWAS discoveries, implemented in the PrediXcan family of tools. As part of the GTEx consortium, we have trained and optimized prediction models for expression and splicing traits in 49 human tissues. Building on this work, we constructed PhenomeXcan, a knowledge base of the putative function of every heritable human gene, based on the associations between the genetically regulated component of gene expression and over 4000 human traits. We propose here, in Specific Aim 1, to expand PhenomeXcan using the genetic components of splicing ratios and protein levels. This multidimensional array will be interrogated with state-of-the-art statistical methods to improve our ability to identify causal genes, pathways, and latent subtypes of AD. The primary goal of precision medicine is to provide the right therapy for the right patient at the right time. Therefore, the ability to cluster patients into subtypes with potentially different responses to different treatments will be key to achieving precision medicine. Another source of large-scale health-related data is the MarketScan database, which includes electronic health records of 250 million patients in the US.
In Specific Aim 2, we propose to use this massive dataset as an orthogonal source to identify AD subtypes. To investigate the pathogenesis and progression of AD, we will leverage the longitudinal aspect of the electronic health records (EHR). The EHR will be analyzed to find clusters of phenotypes associated with AD diagnoses, and these clusters will be matched with the functional gene clusters found in SA1. The clusters of phenotypes and genes will be used to inform the modeling of disease progression. In Specific Aim 3, we will use the UK Biobank, a resource of self-reported disease and EHR data as well as genotype information, and BioVU, a separate biobank with genotype and EHR data, to validate the results from SA1 and SA2 and to synthesize the two sets of results into a second iteration of disease clustering.