Sequence enrichment of long DNA to enable new modes of genomic analysis
SBIR grant #192336

Project Summary/Abstract

The assembly of phased maternal and paternal haplotypes from human genomic DNA is compromised when short DNA fragments are used for sequencing. The analysis of DNA methylation is likewise severely compromised when short DNA is sequenced. A technology recently developed by PetaOmics, Inc. enables DNA sequence enrichment that yields long DNA fragments of approximately 20,000 bp. With future improvements, the capture of 30,000 bp fragments should be possible. The technology is novel and disruptive, and has strong potential for successful commercialization through sales of cost-effective DNA target enrichment kits. Any region of the genome can be targeted for sequence enrichment, including regions with very high sequence variability. The enriched DNA is double-stranded, and both strands retain all biologically relevant DNA methylation information. Widespread use of the technology will advance the field of genomics, especially in population studies of complex human diseases, where it can help to unravel the functional impact of alternative diplotypes (specific combinations of maternal and paternal genetic information). This Phase I research project aims to refine the technology by optimizing probe chemistry and DNA capture protocols. The utility of the technology will be demonstrated by enriching a 0.62-megabase segment of the MHC locus on human chromosome 6, a region characterized by complex structural variation that includes variable numbers of sequence repeats. The sequence-enriched material will consist of long, overlapping, double-stranded DNA fragments derived from the MHC region of interest.
The Pacific Biosciences (PacBio) long-read DNA sequencing platform will be used to sequence the enriched DNA and to perform a de novo sequence assembly of the targeted region of the human genome. The sequence-enriched material will additionally be used to perform DNA methylation sequencing on the PacBio platform. Long sequencing reads will be informative with regard to a subset of 5-hydroxymethylcytosine modifications, and will be used to assemble very long DNA methylation contigs. Completion of these aims will set the stage for a Phase II project directed at demonstrating genome-wide applicability and the potential for rapid commercialization of sequence enrichment kits, as well as at deploying novel DNA methylation sequencing applications of the technology that will have a dramatic impact on human epigenetics, population studies, and medical research in general.