ABSTRACT
Despite decades of research, much of the heritability of human disease remains unmapped to susceptibility loci, and many gene-phenotype effects do not fit the patterns of heterogeneity required for well-powered analysis by either GWAS or family-based methods. Some genetic factors that contribute to disease lie on a detectable, shared haplotypic background and, because their effects on disease risk are modest, reach an appreciable population frequency. In such cases, analyses that exploit patterns of segmental sharing among distant relatives, such as identity-by-descent (IBD) mapping, are optimal for disease-gene discovery. Compared with GWAS, this approach tolerates lower allele frequencies of causal factors and greater allelic heterogeneity; compared with linkage analysis, it tolerates lower penetrance, more modest effect sizes, and greater genetic heterogeneity. In addition, large repositories of shared segments make it possible to identify people who carry haplotypes known to harbor rare risk variants, enabling efficient use of targeted sequencing to evaluate the effects of those variants. Building on tools developed by our group and others, we propose the following aims to leverage genetic relatedness estimation and shared segments in big-data environments: 1) Create a resource of shared segments in two large DNA biobanks. We will employ an efficient, highly scalable software architecture to automate relatedness analyses from genetic data, including deep and accurate relationship estimation and pedigree-aware shared segment detection across heterogeneous genetic data types. We will apply existing and novel approaches in BioVU and BioME, two large EHR-linked DNA databanks, to create shared segment repositories for use by the scientific community. Our analytic framework will improve scalability and support a variety of standard output formats for integration with downstream analyses. 2) IBD mapping phenome-wide.
Shared segments offer an opportunity to recover the power needed to detect a tranche of disease-causing variants that contribute to the missing heritability of traits. We will also establish the effects of genetic dysregulation of genes located in regions significantly enriched for shared segments across the phenome. 3) Demonstrate the utility of shared segments for identifying likely carriers of causal variants in cancer predisposition genes. We will identify individuals in BioVU and BioME likely to harbor pathogenic variants in known cancer predisposition genes by matching IBD segments shared between biorepository participants and cancer cases sequenced at MD Anderson (N>10,000), then perform follow-up genotyping of these loci to directly assess the clinical significance of the variants using the full EHR. Each aim represents an innovative approach to data use in large EHR-linked DNA databanks and will create shared resources to fuel future research. Collectively, our aims chart a path toward efficient and affordable novel disease-gene discovery using shared segments.
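The core comparison behind phenome-wide IBD mapping can be illustrated with a minimal Python sketch. It assumes a hypothetical table of pairwise segments (two participant IDs, a chromosome, and base-pair start and end) and asks, at a single position, whether case-case pairs share a segment covering that position more often than control-control pairs do, scored with a one-sided label-permutation test. This is a generic illustration, not the specific statistic or software proposed here; a real analysis would typically work in genetic (cM) units, scan many positions, and correct for multiple testing.

```python
import random
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Segment:
    """One IBD segment shared by two individuals (hypothetical schema)."""
    id1: str
    id2: str
    chrom: str
    start: int  # base-pair start, inclusive
    end: int    # base-pair end, inclusive


def sharing_pairs(segments, chrom, pos):
    """Return the set of unordered pairs sharing a segment that covers chrom:pos."""
    return {
        frozenset((s.id1, s.id2))
        for s in segments
        if s.chrom == chrom and s.start <= pos <= s.end
    }


def case_excess_sharing(shared, cases, controls):
    """Case-case sharing rate minus control-control sharing rate."""
    def rate(group):
        pairs = [frozenset(p) for p in combinations(sorted(group), 2)]
        if not pairs:
            return 0.0
        return sum(p in shared for p in pairs) / len(pairs)
    return rate(cases) - rate(controls)


def ibd_mapping_test(segments, cases, controls, chrom, pos, n_perm=1000, seed=0):
    """One-sided permutation test for excess case-case sharing at one position.

    Assumes cases and controls are disjoint. Returns (observed, p_value).
    """
    shared = sharing_pairs(segments, chrom, pos)
    observed = case_excess_sharing(shared, cases, controls)
    everyone = sorted(set(cases) | set(controls))
    n_cases = len(cases)
    rng = random.Random(seed)  # seeded so results are reproducible
    hits = 0
    for _ in range(n_perm):
        # Reassign case/control labels at random and recompute the statistic.
        rng.shuffle(everyone)
        stat = case_excess_sharing(shared, everyone[:n_cases], everyone[n_cases:])
        if stat >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value
```

For example, if every pair among six cases shares a segment over a position and no control pair does, the observed excess is 1.0 and the permutation p-value is small; with no sharing at all, the excess is 0.0 and the p-value is 1.0.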
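The carrier-matching step in Aim 3 can be sketched as an interval lookup: a biobank participant is flagged as a likely carrier when they share an IBD segment with a sequenced carrier and that segment spans the variant's position. The Python below is a minimal sketch under stated assumptions; the `Segment` and `Variant` schemas, the participant IDs, and the base-pair `min_length` filter are hypothetical, and a production pipeline would more likely filter on genetic length (cM) and use an interval index rather than a nested loop. A flag marks only a candidate; the follow-up genotyping described above is what would confirm carrier status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One IBD segment shared by two individuals (hypothetical schema)."""
    id1: str
    id2: str
    chrom: str
    start: int  # base-pair start, inclusive
    end: int    # base-pair end, inclusive


@dataclass(frozen=True)
class Variant:
    """A variant observed in a sequenced individual (hypothetical schema)."""
    carrier: str  # ID of the sequenced individual known to carry the variant
    chrom: str
    pos: int
    label: str    # free-text identifier for the variant


def likely_carriers(segments, variants, biobank_ids, min_length=0):
    """Map each biobank participant to the variants they may carry via IBD.

    Only direct sharing with a known carrier counts; sharing is not
    propagated transitively through other participants.
    """
    biobank = set(biobank_ids)
    hits = {}
    for v in variants:
        for s in segments:
            # Skip segments on another chromosome or not spanning the variant.
            if s.chrom != v.chrom or not (s.start <= v.pos <= s.end):
                continue
            # Hypothetical length filter, in base pairs.
            if s.end - s.start + 1 < min_length:
                continue
            # Identify the known carrier's partner in this segment, if any.
            if s.id1 == v.carrier:
                partner = s.id2
            elif s.id2 == v.carrier:
                partner = s.id1
            else:
                continue
            if partner in biobank:
                hits.setdefault(partner, set()).add(v.label)
    return hits
```

Scanning every segment for every variant is quadratic but keeps the logic transparent; indexing segments by chromosome and position would be the natural next step at biobank scale.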