Mycobacterial disease, primarily tuberculosis, kills nearly two million people annually. Ineffective vaccines, as well as multidrug-resistant and extensively drug-resistant strains of M. tuberculosis, exacerbate this chronic global crisis. The application of genome-scale approaches to M. tuberculosis (Mtb) provides powerful new tools for biological insight. The foundation of any genomic approach is an accurately annotated genome, in particular knowledge of the precise boundaries of active genes and of the proteins they encode. Annotation pipelines struggle with genomes that have a strong nucleotide bias and atypical gene structures; as a result, the annotations that basic and clinical researchers rely on to navigate the Mtb genome are often inaccurate. We have integrated RNA-seq with ribosome profiling (Ribo-seq) to empirically determine transcription and translation initiation sites on a genome scale, reducing reliance on computational gene predictions. Our survey of the model mycobacterium M. smegmatis showed that about one-third of transcription start sites were also translation initiation sites, indicating a large group of genes without a 5' UTR. These leaderless genes lack a Shine-Dalgarno sequence, the traditional landmark used to predict translation initiation sites, and this absence frequently leads to their misannotation. Our empirical data also identified >300 unannotated peptides encoded upstream of annotated genes. As a class, these upstream peptides have not been well studied, yet there are precedents for cis-regulatory or other functional roles. These data allowed a much more accurate re-annotation of the M. smegmatis genome, which will improve the precision and confidence of all subsequent work that depends on genome annotations. Here, we propose to re-annotate the Mtb genome using the same integrative approach.
Mtb will be cultured under standard laboratory conditions and under conditions that simulate in vivo environments, to maximize the number of genes expressed, especially those most relevant to pathogenesis. Our Ribo-seq analyses will be particularly informative for mapping protein N-termini and upstream peptides with unprecedented precision and sensitivity. We expect to identify many hundreds of new gene starts, novel peptides (both leadered and leaderless), and non-coding RNAs, providing vital information to the mycobacterial community. While our data will be instrumental in defining gene boundaries in the Mtb genome, they will also offer new biological insights into the fledgling research areas of leaderless translation initiation, peptidomics, translational regulation, and the modulation of RNA polymerase or ribosome processivity by sequence context. Each of these understudied topics will expand our knowledge of gene architecture and regulation and of Mtb biology, providing new insights into potential therapeutic targets. This proposed R21 applies cutting-edge tools to generate data that will provide an empirically supported re-annotation of the Mtb genome, with an immediate and lasting impact on the field, while yielding new biological insights that will seed multiple emerging fields of study.