Project Summary

The advance of the Human Microbiome Project provides unprecedented opportunities to explore the critical roles that commensal microbiota play in human health, immune maintenance, and disease. This rapidly growing research area has produced massive metagenomic sequencing data. Unique features of these data, including extremely high dimensionality, complex correlation, zero inflation, and compositional structure, pose major methodological and computational challenges for analysis and render many existing statistical approaches inapplicable. Ignoring or mishandling these features is likely to lead to distorted medical conclusions. Unfortunately, few formal analysis tools address these challenges. Those available are mainly data transformation and dimension reduction methods (typically PCA) for mediation analysis, whose results lack direct interpretability, and penalized variable selection methods, which cannot handle longitudinal response variables or high-dimensional functional and compositional covariates. This proposal is devoted to developing a new set of statistically systematic and computationally efficient methods that use complex, high-throughput microbial taxa measurements to explore associations with treatment-related infection in disease. The specific aims are: (1) developing a clustering mediation model system to study the mediating effects of microbiota on the association between chemotherapy and infection in acute myeloid leukemia (AML); (2) performing variable selection and covariance estimation for longitudinal microbial alpha-diversity in varying-coefficient models with high-dimensional, compositional taxa measurements as covariates in AML; and (3) identifying important microbial taxa, which have two unique features, functional (measured over time) and compositional (relative abundance), that are associated with and predictive of chemotherapy-related infection in AML.
Testing and validating the proposed analytical tools and developing software are two accompanying secondary aims. The major statistical components of the proposal include mediation analysis, clustering, functional varying-coefficient models, functional regression models, data-adaptive regularization for model selection, and standard and non-standard theory for statistical tests.