Investigating the microbial basis of early childhood caries via metagenomics and metatranscriptomics analyses

Abstract

The increasing availability and scale of omics data have revolutionized our ability to understand complex biological processes underlying health and disease. Such biologically informed insights are aligned with the notion of precision medicine and have the potential to improve diagnosis, prevention and treatment. Multiple omics data layers (e.g., genomics, metagenomics, transcriptomics, metabolomics), intended to capture aspects of otherwise unobservable biology, are increasingly being collected in oral health studies. However, methods for powerful and informative integration of the information gained from these multiple data layers remain elusive. The focus of this proposal, early childhood caries (ECC), is the most common chronic childhood disease. ECC is defined as dental decay among children under the age of 6; it persists as a clinical and dental public health problem, and confers substantial and multi-level human and economic impacts. The advent of precision oral health care, based upon a new, microbially informed understanding of ECC, is expected to shed light on mechanistic aspects of the disease process and reveal new ways to prevent it. To this end, we will analyze existing clinical (i.e., ECC case status) and matched metagenomics (whole-genome shotgun sequencing; WGS) and metatranscriptomics (RNA-seq) data from supragingival plaque samples of 170 preschool-age children, mainly ages 3 and 4, enrolled in a community-based oral health study in NC. The goal of the proposed study is to identify ECC-associated bacteria, bacterial genes and pathways via metagenomics and metatranscriptomics analyses, conducted separately and jointly. Aside from the unique characteristics (e.g., matched WGS and RNA-seq
data from the same biofilm sample in each participant), quality and size of the dataset, the proposal's novelty is amplified by the testing, development and dissemination of appropriate statistical methods and optimized analytical pipelines. Seven models will be evaluated via rigorous simulations that account for over-dispersion, zero-inflation, more than two phenotype groups and batch effects, and will be optimized before being applied to the real study data. Upon completion, we anticipate that the study will provide novel insights into the microbial basis of ECC. The integrative data analysis framework will also be able to accommodate additional metabolomics data as they become available, further increasing the potential for mechanistic insights.