PROJECT SUMMARY The microbiome, which plays an important role in human health and disease, is generally characterized using high-throughput genome sequencing. However, the laboratory processes required for microbial metagenomic sequencing can introduce spurious measurement noise due to, for example, DNA extraction, amplification, sequencing depth, GC bias, batch effects, laboratory protocols, and bioinformatics processing. Without correction, the magnitude of sample- and study-specific variation can easily exceed the magnitude of variation due to treatment or disease status. Therefore, diagnosis and treatment of diseases and infections based on microbial sequencing are impeded by spurious noise that masks true biological signal. The overall goal of this research is to develop new statistical methods for the analysis of microbiome data, including taxonomic, functional, and metabolic data. Our statistical models will explicitly model batch and technical variation, allowing us to distinguish, rather than conflate, biological signal and non-biological noise. Our new models will leverage commonly collected sequence data, such as positive controls and technical replicates, which researchers do not typically use in their statistical analysis of microbiome data. By designing statistical methods that use existing data sources, we will reduce the amount and cost of sequencing required to detect true biological signals. Our models will allow us to perform hypothesis testing for differential abundance of microbial genes, strains, and metabolites, as well as for shifts in the diversity of microbial communities, without discarding biological signal or detecting spurious technical noise due to imperfect laboratory protocols and instrumentation.
The methods are applicable to a broad range of experimental designs (including observational and longitudinal), biomedical research methods (including model systems and clinical trials), and sequencing platforms (including marker gene and whole-genome sequencing as well as spectrometric methods for metabolic and proteomic profiling). Our statistical methods will be distributed as freely available, open-source software, which will include extensive tutorials and forums for user questions. By avoiding detection of signals due to sample- and study-specific artefacts, our methods will increase the reproducibility of microbiome research and facilitate the identification of therapeutic and diagnostic opportunities in microbiome science.