Abstract
The broad, long-term objective of this project is to develop novel statistical methods and computational tools for statistical and probabilistic modeling of human microbiome and shotgun metagenomic data, motivated by important biological questions and experiments. The specific aim of the current project is to develop new statistical models, novel inference procedures, and fast computational algorithms for the analysis of 16S rRNA and shotgun metagenomic sequencing data in large-scale human microbiome studies. The project focuses on developing model-based multi-sample approaches for quantifying microbiome compositions, and on developing methods of compositional mediation analysis to quantify the role of the microbiome in mediating the effect of a treatment or risk factor on outcomes. In addition, this project will develop novel methods for statistical inference, including large-scale multiple testing procedures on sparse discrete Markov random field (MRF) models, for microbial interaction network construction and for differential network analysis. These problems are all motivated by the PI's close collaborations with Penn investigators on metagenomic studies of Crohn disease, childhood obesity, and disease progression among patients with chronic kidney disease (CKD). The methods hinge on a novel integration of biological insights with methods for modeling sparse count data, high-dimensional compositional data analysis, and network-based analysis, including nuclear-norm penalized maximum likelihood estimation for taxa abundance estimation, a compositional mediation model, and Markov random field based microbial network and differential network analysis. The new methods can be applied to both 16S rRNA and shotgun metagenomic sequencing data and are intended to facilitate the identification of microbial compositions, subcompositions, and microbial networks underlying various complex human diseases and biological processes.
The project will also investigate the robustness, power, and efficiency of these methods and compare them with existing methods. In addition, this project will develop practical computer programs that implement the proposed methods, and will evaluate the performance of these methods through extensive simulations and through analysis of various ongoing microbiome studies conducted in the PI's collaborations with Penn physicians and biologists. The work proposed here will contribute statistical methodology for modeling metagenomic sequencing data and high-dimensional compositional data, contribute theoretical inference methods for the MRF models, and offer insights into each of the biological areas represented by the various data sets. All programs developed under this grant, together with detailed documentation, will be made available free of charge to interested researchers.