Recent studies show that the proteomes of large double-stranded DNA (dsDNA) viruses are much more complex than previously appreciated. Ribosome profiling and mass spectrometry analyses have identified hundreds of novel coding regions in multiple dsDNA viruses, raising the fundamental question of how dsDNA viruses encode and regulate such expansive proteomes. We have a limited understanding of the mechanisms by which dsDNA viruses generate such diverse proteomes, in large part due to our poor understanding of viral transcript structure. Defining the sequence of full-length viral transcripts (the full-length transcriptome) identifies the mRNAs that encode viral proteins, and potentially novel transcripts encoding currently unrecognized viral proteins. Defining full-length transcriptomes also provides a global view of viral promoters and of the 5' untranslated regions (UTRs) that regulate mRNA stability and translation efficiency. Standard RNA-Seq approaches are insufficient to identify full-length dsDNA virus transcriptomes because widespread transcription of the viral genomes creates overlapping transcriptional units that can prevent the definition of transcript termini and complicate the assignment of splice junctions to specific viral transcripts. As a result, the full-length transcriptome has not been defined for any large dsDNA virus. New, high-throughput approaches are needed to define full-length dsDNA virus transcriptomes, and thereby understand the regulation of viral gene expression and the mechanisms driving viral proteome complexity. Using human cytomegalovirus (HCMV) as a model large dsDNA virus, we propose an innovative combination of techniques and analyses to define full-length dsDNA virus transcriptomes. 
Our multidisciplinary approach uses novel bioinformatics methods to merge long-read and short-read sequencing data, a strategy known as hybrid sequencing, with high-definition transcription start site analysis to provide an end-to-end view of full-length viral transcripts on a genome-wide scale. Our preliminary results show that our approach identifies novel HCMV coding regions resulting from alternative transcription start site usage and splicing, as well as new promoters controlling the expression of viral transcripts. Successful completion of this project will provide a new paradigm for defining full-length dsDNA virus transcriptomes that will be useful for determining the mechanisms driving dsDNA virus proteome complexity and the regulatory mechanisms controlling viral gene expression.