The broad, long-term objective of this project is the development of novel statistical methods and computational tools for statistical and probabilistic modeling of multiple types of large-scale genomic data, motivated by important biological questions and experiments. New high-throughput technologies, including next-generation sequencing, are generating many types of very high-dimensional genomic and proteomic data, along with metadata such as network and pathway databases, which researchers use to pursue a systems-level understanding of complex phenotypes. As the amount and complexity of the data increase and the questions being addressed become more sophisticated, valid statistical and biological inference requires analysis methods that can integrate these genomic data while incorporating information about gene function and pathways into the analysis of numerical vector and matrix data. The specific aims of the current project are to develop new statistical models and methods for integrative analysis of genomic data in the context of pathways and networks. Motivated by the analysis of genetical genomics data and diverse cancer genomic data, the first aim is to develop novel statistical methods for estimating a genotype-adjusted precision matrix for a set of genes at the transcriptional level. The resulting regression coefficient matrix and sparse precision matrix provide important information on gene regulation after adjusting for cis- and trans-genetic effects on gene expression. The second aim is to develop high-dimensional instrumental variable regression for eQTL data analysis in order to identify potential causal genes for a phenotype, using genome-wide genotypes as instrumental variables.
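To illustrate the instrumental variable idea behind the second aim, the sketch below uses ridge-regularized two-stage least squares on simulated data: genotypes predict gene expression in a first stage, and the phenotype is regressed on the predicted expression in a second stage. This is a textbook stand-in, not the high-dimensional IV method the project proposes. The function names, the ridge penalty, and the simulation settings (an unmeasured confounder U, a single causal gene) are all hypothetical.

```python
import numpy as np


def two_stage_iv(Z, X, y, ridge=1.0):
    """Ridge-regularized two-stage least squares (illustrative sketch).

    Z: (n, p) genotype matrix used as instruments (hypothetical input).
    X: (n, q) gene-expression matrix (endogenous regressors).
    y: (n,) phenotype vector.
    ridge: first-stage ridge penalty (assumed tuning value).
    Returns estimated causal effects of the q genes on the phenotype.
    """
    # Center all variables so intercepts can be dropped.
    Zc = Z - Z.mean(axis=0)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Zc.shape[1]
    # Stage 1: ridge regression of each gene's expression on all genotypes.
    gamma_hat = np.linalg.solve(Zc.T @ Zc + ridge * np.eye(p), Zc.T @ Xc)
    X_hat = Zc @ gamma_hat  # genetically predicted expression
    # Stage 2: least squares of the phenotype on predicted expression.
    beta_hat, *_ = np.linalg.lstsq(X_hat, yc, rcond=None)
    return beta_hat


def simulate(n=5000, p=50, q=5, seed=0):
    """Hypothetical eQTL data with an unmeasured confounder U (needs p >= 10*q)."""
    rng = np.random.default_rng(seed)
    Z = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # SNP dosages 0/1/2
    gamma = np.zeros((p, q))
    for j in range(q):
        # Each gene is driven by its own block of 10 SNPs.
        gamma[10 * j:10 * (j + 1), j] = rng.normal(0.5, 0.1, size=10)
    U = rng.normal(size=n)  # confounder affecting expression and phenotype
    X = Z @ gamma + U[:, None] + rng.normal(size=(n, q))
    beta = np.zeros(q)
    beta[0] = 1.0  # only gene 0 has a causal effect
    y = X @ beta + 2.0 * U + rng.normal(size=n)
    return Z, X, y, beta


Z, X, y, beta = simulate()
beta_iv = two_stage_iv(Z, X, y)
# Naive least squares for comparison; U biases it because U enters X and y.
beta_ols, *_ = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)
print("2SLS:", np.round(beta_iv, 2))
print("OLS: ", np.round(beta_ols, 2))
```

Because U enters both expression and phenotype, naive least squares overstates the effect of gene 0 and assigns spurious effects to the other genes, whereas the two-stage estimate should sit near the simulated values. With genome-wide genotypes in place of 50 simulated SNPs, the first stage would need much stronger regularization or variable selection, which is where a genuinely high-dimensional method becomes necessary.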
Aims 3 and 4 propose new methods for gene set enrichment analysis, including gene-set tests based on the homogeneity of covariance matrices and a class of multivariate random-set methods for integrative analysis of diverse genomic data. These methods hinge on a novel integration of high-dimensional regression and high-dimensional covariance matrix estimation, together with a novel incorporation of prior functional gene sets and pathways. The new methods can be applied to different types of genomic data and should facilitate the identification of genes and their complex interactions, as well as the biological pathways underlying various complex human diseases. The proposed work will contribute statistical methodology for modeling high-dimensional genomic data and for studying complex phenotypes and biological systems, and it will offer insights into each of the biological areas represented by the data sets. All programs developed under this grant, with detailed documentation, will be made available free of charge to interested researchers.

PUBLIC HEALTH RELEVANCE: This project aims to develop powerful statistical and computational methods for integrative analysis of diverse genomic data. The novel statistical methods are expected to provide insight into how genomic perturbations and pathway dysfunction can lead to the development of complex diseases such as neuroblastoma and human heart failure.
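One simple way to make the covariance-homogeneity idea in Aim 3 concrete is a permutation test: compare the sample covariance matrices of a gene set under two conditions using the squared Frobenius norm of their difference, and calibrate that statistic by shuffling condition labels. This is an assumed, generic construction for illustration, not the test developed in this project; the function names and the simulated data are hypothetical.

```python
import numpy as np


def cov_diff_stat(A, B):
    """Squared Frobenius norm of the difference of two sample covariances."""
    D = np.cov(A, rowvar=False) - np.cov(B, rowvar=False)
    return float(np.sum(D ** 2))


def perm_cov_test(A, B, n_perm=500, seed=0):
    """Permutation test of equal covariance for one gene set (sketch).

    A: (n1, g) expression of the g genes in the set under condition 1.
    B: (n2, g) expression of the same genes under condition 2.
    Rows are centered within each group, then labels are shuffled; the test
    is approximate (exact only if the centered rows are exchangeable).
    """
    rng = np.random.default_rng(seed)
    # Remove group means so that mean shifts do not drive the statistic.
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    observed = cov_diff_stat(A, B)
    pooled = np.vstack([A, B])
    n1 = A.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        if cov_diff_stat(pooled[idx[:n1]], pooled[idx[n1:]]) >= observed:
            exceed += 1
    # Add-one correction keeps the permutation p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical example: same means, but genes are co-regulated only in B.
rng = np.random.default_rng(1)
g = 10
A = rng.normal(size=(100, g))  # independent genes
Sigma = 0.8 * np.ones((g, g)) + 0.2 * np.eye(g)  # strongly correlated genes
B = rng.multivariate_normal(np.zeros(g), Sigma, size=100)
stat, p = perm_cov_test(A, B, n_perm=200)
print(f"statistic = {stat:.2f}, permutation p-value = {p:.3f}")
```

In this simulation both groups have zero mean expression, so a mean-based enrichment test would have little to detect, while the covariance comparison picks up the change in co-regulation.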