Project Summary/Abstract
Somatic mutations, that is, DNA alterations within cells in the body, are responsible for genetic heterogeneity within an individual (termed mosaicism) and can drive progression to disease. Examples include mutations acquired early in development that lead to neurological disorders as well as oncogenic mutations acquired in adulthood that progress to cancer. The increasing availability of genomic data has enabled groundbreaking discoveries in both basic and translational work on mosaicism; in particular, recent studies have demonstrated that clonal mosaicism of blood cells is surprisingly common (a near-inevitable result of aging) and that solid tumor DNA can frequently be detected in the bloodstream, inspiring the new field of liquid biopsy. However, existing approaches for detecting mosaicism either require specialized assays or have limited sensitivity to events present at low cell fractions (<5%), hampering scientific progress. Additionally, the high rates of mosaicism in healthy individuals, and correspondingly low rates of progression to cancer, imply that large epidemiological studies will be necessary to identify clinically relevant mutations and avoid false positives. To accelerate continued biological insights into mosaicism and achieve the promise of liquid biopsy, new detection methods must be developed and applied to very large cohorts. This proposal will develop and apply a novel approach to somatic structural variant (SV) detection, based on statistical haplotype phasing rather than new experimental techniques, that will allow highly sensitive detection of mosaic SVs (at fractions as low as 0.1%) in all forms of genomic data. Crucially, this methodology will allow harnessing the power of huge biobank studies by computationally analyzing data already being produced. The key idea of the approach is to harness latent information about haplotype phase (i.e., the maternal vs.
paternal inheritance of alleles) that is lost by standard genomic assays but can be statistically recovered using methods from population genetics. Inferred phase information can then be used to reveal subtle imbalances between maternal vs. paternal allelic fractions over long chromosomal segments, which are the hallmark of mosaic SVs. This technology will be used to investigate mosaic structural variation in blood (to understand the etiology and effects of clonal expansions of blood cells), neurons (to understand mutation rates and effects in brain development), and cancer (to develop methods for blood-based diagnosis and monitoring and to improve detection of tumor heterogeneity).
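
The intuition behind phase-based detection can be illustrated with a toy simulation. The sketch below is not the proposed method: it assumes perfectly known phase (no switch errors), Gaussian measurement noise, and a hypothetical per-SNP measurement called `baf` (the fraction of signal coming from an arbitrarily labeled "B" allele at each heterozygous site). The function names and all parameter values are illustrative assumptions. It shows that aligning each deviation from 0.5 by its phase sign lets a small imbalance accumulate across a long segment, whereas the same data without phase average out to roughly zero.

```python
import math
import random
import statistics


def phased_z_score(baf, phase):
    """z-score of the mean phase-aligned deviation of BAF from 0.5.

    phase[i] is +1 if the B allele at SNP i lies on the maternal
    haplotype and -1 if it lies on the paternal haplotype.
    """
    signed = [p * (b - 0.5) for b, p in zip(baf, phase)]
    mean = statistics.fmean(signed)
    sem = statistics.stdev(signed) / math.sqrt(len(signed))
    return mean / sem


def simulate_segment(n_snps, shift, noise_sd, rng):
    """Simulate heterozygous SNPs whose maternal allele fraction is 0.5 + shift."""
    phase = [rng.choice((-1, 1)) for _ in range(n_snps)]
    baf = [0.5 + p * shift + rng.gauss(0.0, noise_sd) for p in phase]
    return baf, phase


rng = random.Random(0)
# Illustrative values only: a 0.5% allelic imbalance, 10x smaller than the noise.
baf, phase = simulate_segment(n_snps=20000, shift=0.005, noise_sd=0.05, rng=rng)

z_phased = phased_z_score(baf, phase)               # uses inferred phase
z_unphased = phased_z_score(baf, [1] * len(baf))    # ignores phase
print(f"phased z = {z_phased:.1f}, unphased z = {z_unphased:.1f}")
```

In this toy setting the per-SNP imbalance is far below the noise level, yet the phased statistic grows with the square root of the number of SNPs in the segment, which is why long chromosomal segments and accurate long-range phasing matter for sensitivity.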