Hepatitis infection is a widespread and persistent crisis, both in the U.S. and worldwide. Chronic infection with the hepatitis B virus (HBV) is a major cause of end-stage liver disease, with staggering long-term healthcare costs. Thus, HBV research and innovation have been deemed national priorities in the U.S., as evidenced by calls to action from the Institute of Medicine and the Department of Health and Human Services. The intra-host HBV infection comprises a genetically diverse population of variants (quasispecies) that is an important determinant of pathogenesis and treatment outcome. Mapping the quasispecies is required; however, map construction is difficult owing to HBV's complex genome structure, variant divergence from reference genomes and a lack of accurate tools. Current de novo assembly algorithms intended for viral genome assembly produce a single linear representation of a viral population, which is inadequate. Algorithms meant for diploid genome assembly are ill-suited to virology data, produce unnecessarily complex output and are computationally expensive. For this Phase I project, GATACA, LLC proposes to develop the Assembly Tool to accurately map intra-host HBV strains from short-read data. Using novel steps, the tool will assemble the reads into multiple interconnected consensus sequences (contigs) as a map of global haplotypes. The contig sets will provide valuable reference backbones for subsequent analyses. The tool will improve inter-host comparisons, which depend on accurate HBV quasispecies parameters. The Assembly Tool will be integrated into existing software developed by GATACA. Specific Aims for Phase I are: (1) Develop, test and prototype a de novo algorithm based on novel iterative clustering and priority merging steps, and represent global HBV variation as interconnected graphs.
(2) Develop a validation algorithm for generating simulated HBV data, incorporating patient-derived HBV data and benchmarking the performance of the Assembly Tool against that of other viral genome assemblers. In Phase II, GATACA will develop HepBbase, a commercial web-based platform that will provide data management and allow users to plug-and-play familiar analysis tools alongside HBV-specific functions. An assembly pipeline will be developed in Phase II to automate the labor-intensive steps of developing HBV draft genomes. GATACA will begin HepBbase commercialization efforts in Q2 2017 during Phase II development. Potential customers are HBV virologists in all research-based disciplines, who lack adequate, centralized and user-friendly HBV data management and analysis software. Discoveries made with the tool, based on assembled patient reference genomes, will also inform clinicians. As the first virus-specific, large-scale bioinformatics platform, HepBbase will eliminate bottlenecks and facilitate collaboration.