We are entering a new era of personal genomics where an individual's genome sequence will be used to identify disease susceptibility, improve diagnosis and better treat illnesses as well as be combined across cohorts and populations to identify new biomarkers and causal mutations underlying any phenotype. Despite the tremendous success of mapping short read next-generation sequencing (NGS) data onto a reference genome (resequencing) in identifying genetic variation in a new genome, the inherent lack of long range connectivity together with reference-induced biases make obtaining complete haplotype-phased genomes exceedingly difficult. Emerging long read technologies are beginning to address this critical shortcoming by direct de novo assembly of an individual's genome. However, initial de novo assemblies typically consist of many thousands of unordered contigs that require extensive post-assembly processing to produce finished sequences that can be effectively mined for genetic content and variation. Thus, there is an urgent need for integrated, scalable post-assembly software that 1) automatically organizes, joins and phases the initial contigs into complete haplotype sequences, 2) supports optional NGS and/or manual polishing and 3) provides initial automated annotation of those sequences. Currently, such software does not exist and instead users must cobble together a confusing array of difficult-to-use, task-specific pieces of open source programs. DNASTAR's post-assembly editing program, SeqMan Pro (SMP), has a proven history in finishing bacterial sized genomes although it currently lacks the scalability and all the needed functionality to tackle human genome sized problems. The primary goal of this Fast Track proposal is to create a fully scalable version of SMP for the automated finishing and annotation of de novo assembled large eukaryotic genomes while also providing a manual editing platform when needed. During Phase I, we will develop two key prototypes: 1) a new assembly file format, eBAM, which is interconvertible with the BAM format, but also is editable like our SQD files and 2) a rapid reference-assisted contig scaffolding tool adapted from our proprietary Disk Sort Alignment (DSA) algorithm. With that foundation, we will complete the transformation of SMP in Phase II by: 1) refining the eBAM format for optimal editing performance, 2) building a new 64-bit version of the SMP editing engine that incorporates the additional functionality necessary for post-assembly finishing of large eukaryotic genomes including automated DSA-based scaffolding and phase-aware gap filling, contig joining and haplotype refinement, 3) creating a new DSA-based genome aligner for rapidly aligning a finished sequence to an annotated reference genome which together with 4) a new feature transfer and analysis module, will permit initial annotation of the finished genome along with a cataloging of variants and their impact in both native and reference coordinates. Inclusion of the reference coordinates allows variants in the new genome to be easily associated with the wealth of information available through the numerous online knowledgebase resources.