The development of computational methods for interpreting sequence variants in the non-protein-coding regions of the human genome has lagged behind the ability to generate large volumes of genome-wide association study (GWAS) and whole-genome sequencing (WGS) data. In this project, we will develop innovative computational methods, based on rigorous statistical modeling, that integrate a large number of heterogeneous genomic data sets from diverse sources to identify non-coding variants that are candidates for affecting organismal function and leading to disease risk or other traits. Because of their genomic prevalence and functional importance, we will focus the proposed research on the specific class of genomic sites known as enhancers. By focusing on enhancers, we are able to develop rigorous statistical methodologies that can be extensively validated experimentally. The long-term goal is to accurately predict the sequence variants that confer a phenotypic effect. The objective of this application is to develop computational methods that analyze genomic data to identify a set of such candidate non-coding variants. While our methods are intended to handle non-coding variants in different classes of sites identified in human genomes, in this application we will focus on the phenotypic effects of variants in enhancers. Our central hypotheses are that i) the majority of functionally important, disease- and trait-associated variants in non-coding regions occur within enhancer regions, and ii) these variants not only alter enhancer actions on adjacent coding target genes, but also disrupt regulatory networks of enhancer interactions, leading to changes in broader programs of transcriptional regulation.
These hypotheses have been formulated on the basis of our own preliminary data from the 9p21 gene desert, which is linked to specific types of cancer, cardiovascular disease, and type 2 diabetes, and is a locus where we have already made contributions linking GWAS data to a mechanistic understanding of specific enhancer functions. Guided by strong preliminary data, these hypotheses will be tested by pursuing two specific aims: 1) to predict causal enhancer variants by statistical modeling with biological networks; and 2) to experimentally validate the computational predictions. The approach is innovative because, unlike other software tools for analyzing sequence variants (e.g., RegulomeDB and FunSeq), it integrates a large number of heterogeneous genomic data sets from diverse sources and incorporates rigorous statistical modeling of biological networks. The proposed research is significant because, by incorporating both genotypic and phenotypic information on genetic diseases and traits, our methods will be able to identify potential functional connections between non-coding variants and phenotypes and facilitate a targeted analysis of whole-genome sequence data for disease risk assessment.