ABSTRACT Rapid progress in biomedical informatics has generated massive high-dimensional data sets (?big data?), ranging from clinical information and medical imaging to genomic sequence data. The scale and complexity of these data sets hold great promise, yet present substantial challenges. To fully exploit the potential informativeness of big data, there is an urgent need to find effective ways to integrate diverse data from different levels of informatics technologies. Existing approaches and methods for data integration to date have several important limitations. In this project, we propose novel statistical methods and strategies to integrate neuroimaging, multi-omics, and clinical/behavioral data sets. To increase power for association analysis compared to existing methods, we propose a novel multi-phenotype multi-variant association method that can evaluate the cumulative effect of common and rare variants in genes or regions of interest, incorporate prior biological knowledge on the multiple phenotype structure, identify associated phenotypes among multiple phenotypes, and be computationally efficient for high-dimensional phenotypes. To improve the prediction of clinical outcomes, we propose a novel machine learning strategy that can integrate multimodal neuroimaging and multi-omics data into a mathematical model and can incorporate prior biological knowledge to identify genomic interactions associated with clinical outcomes. The ongoing Alzheimer's Disease Neuroimaging Initiative (ADNI) and Indiana Memory and Aging Study (IMAS) projects as a test bed provide a unique opportunity to evaluate/validate the proposed methods. Specific Aims: Aim 1: to develop powerful statistical methods for multivariate tests of associations between multiple phenotypes and a single genetic variant or set of variants (common and rare) in regions of interest, and to develop methods for mediation analysis to integrate neuroimaging, genetic, and clinical data to test for direct and indirect genetic effects mediated through neuroimaging phenotypes on clinical outcomes; Aim 2: to develop a novel multivariate model that combines multi-omics and neuroimaging data using a machine learning strategy to predict individuals with disease or those at high-risk for developing disease, and to develop a novel multivariate model incorporating prior biological knowledge to identify genomic interactions associated with clinical outcomes; Aim 3: to evaluate and validate the proposed methods using real data from the ADNI and IMAS cohorts; and Aim 4: to disseminate and support publicly available user-friendly software that efficiently implements the proposed methods. RELEVANCE TO PUBLIC HEALTH: Alzheimer's disease (AD) as an exemplar is an increasingly common progressive neurodegenerative condition with no validated disease modifying treatment. The proposed multivariate methods are likely to help identify novel diagnostic biomarkers and therapeutic targets for AD. Identifying new susceptibility loci/biomarkers for AD has important implications for gaining greater insight into the molecular mechanisms underlying AD.