Abstract Single-molecule technologies can be combined to achieve de novo genome sequence assembly with phasing and genome-wide identification of structural variation. A de novo assembled whole-genome sequence fully describes the diploid human genome except for a small number of long, highly repetitive sequences such as the centromeres, telomeres, and near-identical segmental duplications. Key to the success of phased genome sequence assembly is the single-molecule mapping approach developed by our group. It starts with sequence-specific fluorescent labeling of long (180 kb to >1 Mb), double-stranded genomic DNA fragments, followed by high-throughput, automated imaging and analysis of the linearized fluorescent DNA molecules in nanochannel arrays on a commercially available instrument. During the next phase of this project, we propose to produce phased genome sequence assemblies of two individuals from each of the 26 ethnic groups of the 1000 Genomes Project to serve as general references for the community. In addition, we will further develop the technology to map repetitive elements that are difficult to interrogate genome-wide and to precisely phase target regions over a long range. Our approach to constructing de novo phased and assembled genomes will produce "near reference grade" genomes with high efficiency and at low cost for many ethnic groups around the world. These reference sequences will substantially increase the value of the whole-genome sequences already obtained and provide further insight into structural variation patterns across human populations. The technology development aims of this proposal will address some of the most difficult questions facing genome analysis today. By the end of this three-year project, a robust method for phased genome assembly, repetitive sequence mapping, and long-range phasing will have been developed and will be ready for application in many areas of genome research.