ABSTRACT Venous thromboembolism (VTE) is a heritable condition that includes deep vein thrombosis and pulmonary embolism and is associated with marked morbidity and mortality. Genomic discovery can be a first step toward better understanding human biology, and it is well established that VTE has important genetic determinants. Evidence supporting new associations has grown with advances in technology and sample size. Initial exploration of rare and uncommon variation across the 3.1 billion nucleotides that constitute the human genome using whole-genome sequencing (WGS) is providing interesting results for rare, potentially damaging single nucleotide variants (SNVs) in known VTE risk loci. Further, WGS can be used to fully characterize structural variants (SVs) genome-wide. Our preliminary data from the Atherosclerosis Risk in Communities study using TOPMed WGS data reveal rare SV deletions in multiple genes known to be associated with risk for VTE, including a deletion affecting a fibrinogen structural gene (FGB) that is associated with a 2-fold increase in VTE risk. A systematic and agnostic genome-wide search for SVs associated with VTE has not been conducted and will not be possible without dedicated resources, such as those proposed in this application. This proposal is a collaboration among leadership within the TOPMed VTE Working Group, the TOPMed Structural Variation Working Group, and the International Network Against Venous Thrombosis (INVENT) Consortium. The following specific aims seek to uncover novel genomic variation (SVs and SNVs) contributing to VTE risk using data from 17 studies that include 46,200 VTE cases and over 380,000 controls. Aim 1: In TOPMed and INVENT participants, use novel methodology to generate SV calls, from both WGS and existing array data, that are of higher quality and more accurate than those produced by currently available SV calling algorithms. The new calls will be filtered, curated, and made available for Aim 2. 
The SV calls will also be shared within TOPMed and applied by working groups to other cardiovascular, lung, and sleep phenotypes. Aim 2: Conduct association analyses of SVs with VTE. Discovery: Using SV calls derived in Aim 1, conduct gene-based, segmental association analyses in up to 32,545 VTE cases and 170,000 controls. Validation and Replication: SVs from statistically significant gene-based association findings will be validated using quantitative PCR and replicated in up to 13,655 VTE cases and 210,000 controls from populations/studies not used in the discovery phase. Aim 3: Conduct association analyses of SNV genotypes with VTE. Discovery: Using SNV data derived from WGS and WGS-imputed data (including insertions and deletions), conduct aggregate-variant analyses of genes and regulatory regions for SNVs with minor allele frequency (MAF) <0.05 for association with VTE in up to 32,545 cases and 170,000 controls using burden tests and the Sequence Kernel Association Test (SKAT). We also propose analyses that combine SV and SNV data to analyze compound heterozygotes for loss-of-function mutations and their association with VTE. Replication: Significant associations will be replicated as in Aim 2.
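As an illustration of the aggregate-variant idea named in Aim 3, the following is a minimal sketch of a burden test, not the analysis pipeline proposed here. It assumes genotypes are coded as minor-allele counts (0, 1, 2), collapses variants with MAF <0.05 within one gene into a per-person burden score, and applies an unadjusted score test against case status. The function names and the toy data are hypothetical; a real analysis would adjust for covariates and relatedness, and SKAT is not shown.

```python
from math import erfc, sqrt


def minor_allele_freq(genotypes):
    """MAF for one variant; genotypes are minor-allele counts (0/1/2) per person."""
    return sum(genotypes) / (2 * len(genotypes))


def burden_scores(genotype_matrix, maf_threshold=0.05):
    """Per-person count of minor alleles across variants with 0 < MAF < threshold.

    genotype_matrix is a list of variants, each a list of per-person counts.
    """
    rare = [g for g in genotype_matrix if 0 < minor_allele_freq(g) < maf_threshold]
    n_people = len(genotype_matrix[0])
    return [sum(g[i] for g in rare) for i in range(n_people)]


def burden_score_test(scores, is_case):
    """Unadjusted score test of burden vs. case status (1 = case, 0 = control).

    Returns (z, two_sided_p). Assumes both groups and some burden variation exist.
    """
    n = len(scores)
    y_bar = sum(is_case) / n
    b_bar = sum(scores) / n
    # Score statistic and its null variance for a logistic model with intercept only.
    u = sum(b * (y - y_bar) for b, y in zip(scores, is_case))
    var = y_bar * (1 - y_bar) * sum((b - b_bar) ** 2 for b in scores)
    z = u / sqrt(var)
    p = erfc(abs(z) / sqrt(2))  # two-sided normal p-value
    return z, p


# Toy data (hypothetical): 40 people, the first 20 are cases.
n_people = 40
is_case = [1] * 20 + [0] * 20
variant_a = [1 if i in (0, 1) else 0 for i in range(n_people)]   # rare, seen in cases
variant_b = [1 if i == 2 else 0 for i in range(n_people)]        # rare, seen in a case
variant_c = [1 if i % 2 == 0 else 0 for i in range(n_people)]    # common, filtered out

scores = burden_scores([variant_a, variant_b, variant_c])
z, p = burden_score_test(scores, is_case)
print(f"burden alleles: {sum(scores)}, z = {z:.3f}, p = {p:.3f}")
```

Collapsing to a single score gives power when most rare variants in a gene shift risk in the same direction; kernel methods such as SKAT are generally preferred when effects may differ in direction, which is one reason proposals like this one name both.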