Type 2 diabetes (T2D) is one of the major causes of morbidity and mortality in the developed world. While environmental factors such as diet play a significant role, familial clustering indicates that significant genetic susceptibility factors must also be at work. For more than two decades we have been engaged in a large collaborative study entitled FUSION (Finland - United States Investigation of NIDDM), in which more than 30,000 individuals with diabetes (and suitable controls) from Finland are being studied, using careful phenotyping of diabetes and diabetes-associated quantitative traits, and genome-wide genetic linkage and association. We have collaborated with several groups around the world to increase the study sample size and our ability to detect these genetic susceptibility factors. We have developed and applied new high-throughput genotyping approaches in the laboratory, which have allowed the collection of a massive amount of data from these Finnish diabetics and their families. Using the genome-wide association study (GWAS) approach, we have contributed to the identification of more than 80 well-validated loci for T2D, and have identified >400 additional loci harboring variants that have important effects on obesity, fasting glucose, LDL and HDL cholesterol, triglycerides, proinsulin levels, blood pressure, and adult height. We are now investigating the functional basis of the disease risk that arises from several of these variants. This analysis includes high-throughput sequencing of these loci to identify common and rare alleles that may be driving the association, analysis of the relationship between gene expression and risk haplotypes, cell culture and biochemical assays, and mouse models. We have also performed large-scale whole-exome and whole-genome sequencing of more than 2657 diabetics and controls, to look for rare variants of large effect that contribute to disease risk.
However, we have not found evidence of rare coding variants that would explain the missing heritability of risk for the common form of diabetes. We are continuing our efforts to identify the causes of rare Mendelian forms of the disease such as neonatal diabetes (NDM), congenital hyperinsulinemia (CHI), and unmapped loci for Maturity Onset Diabetes of the Young (MODY). Confirmation of the effectiveness of this approach includes the identification in our laboratory of an autosomal dominant form of diabetes arising from a mutation in the Wolfram syndrome 1 gene. We are currently engaged in experiments to determine the functional relevance of several candidate variants identified in individuals with these rare Mendelian disorders. We are using (1) standard transient or stable transfections of cell lines with recombinant wild-type or mutant candidate genes and (2) CRISPR-Cas9 technology to generate knock-ins and knock-outs, in order to assess the impact of candidate variants on glucose-stimulated insulin secretion. Since >90% of T2D risk variants identified by GWAS are in non-coding regions, a major effort has been devoted to defining the epigenome of the human pancreatic islet and other diabetes-relevant tissues, by mapping a variety of chromatin marks across the entire genome. This has enabled identification of enhancers and insulators, some of which harbor variants that influence the risk of T2D. Detailed investigation has led to the discovery of large regions of regulatory enhancers greater than 3 kb in length, which we term stretch enhancers. Stretch enhancers have been demonstrated to correlate with gene expression in a tissue-specific manner and are enriched in disease-associated variants. We have developed both bioinformatic and experimental approaches to identify variants in islet stretch enhancers that may be associated with T2D and T2D-related traits.
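The length criterion for stretch enhancers described above can be illustrated with a minimal sketch. It assumes enhancer-state segments are already available as BED-style intervals (chromosome, 0-based start, end), that adjacent or overlapping segments are merged before the length test, and that "greater than 3 kb" means strictly more than 3000 bp. The function names and the merging rule are illustrative assumptions, not a description of the project's actual pipeline.

```python
from typing import List, Tuple

# (chromosome, start, end), 0-based half-open coordinates as in BED files
Interval = Tuple[str, int, int]

STRETCH_MIN_BP = 3000  # the "greater than 3 kb" threshold stated in the text


def merge_adjacent(intervals: List[Interval]) -> List[Interval]:
    """Merge enhancer segments that overlap or touch on the same chromosome.

    Assumption: contiguous enhancer-state segments count as one region.
    """
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def stretch_enhancers(intervals: List[Interval],
                      min_bp: int = STRETCH_MIN_BP) -> List[Interval]:
    """Return merged enhancer regions strictly longer than min_bp."""
    return [iv for iv in merge_adjacent(intervals) if iv[2] - iv[1] > min_bp]


if __name__ == "__main__":
    # Hypothetical segments, for illustration only
    segments = [
        ("chr1", 0, 2000),
        ("chr1", 2000, 3400),     # touches the previous segment
        ("chr1", 10000, 10800),   # short, isolated enhancer
        ("chr2", 500, 3400),      # 2900 bp, just under the threshold
    ]
    print(stretch_enhancers(segments))
```

In this toy input only the merged chr1 region of 3400 bp passes the threshold; the 800 bp and 2900 bp regions would be treated as typical enhancers.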
To further define the human islet epigenome, we have performed genome-wide genotyping and RNA-sequencing on 112 anonymous deceased donor samples from islet distribution centers, and integrated chromatin accessibility profiles (ATAC-seq) from two islet samples. Our results show that T2D-associated genetic variants are enriched in islet-specific regulatory regions, and we have identified the transcription factor RFX6 as a potential major regulator of genes involved in diabetes risk. We are also investigating the similarities and differences between the islet epigenomes of humans, mice, and rats. The human pancreatic islet is composed of several cell types, with the insulin-secreting beta cells representing only 40% of the total cell population. We are currently using single-cell RNA sequencing strategies to interrogate differential gene expression in the various islet cell types and also to tease out low-level beta-cell-specific gene expression that may go undetected in the bulk-islet RNA-seq data. Another significant component of the project involves the collection of skin, muscle, and adipose biopsies from more than 300 individuals with normal glucose tolerance, impaired glucose tolerance, or early-onset T2D. The genetic material from these tissue samples is being analyzed for genotype and gene expression to identify correlates with disease. The skin biopsies are being used to generate induced pluripotent stem (iPS) cell lines, which in turn can be differentiated into tissues relevant to diabetes (including insulin-producing cells), in order to study the relationships of disease risk alleles to cellular phenotype. We are in the process of generating iPS cell lines from the skin cells of 60 of these individuals. We have completed sequencing of the RNA from the muscle and adipose samples.
Analyses of the largest-ever human muscle RNA-seq data set have identified many expression quantitative trait loci (eQTLs), including some that link T2D GWAS variants to their target genes. A particularly instructive example is the muscle-specific isoform of ankyrin, ANK1, which harbors both an eQTL and a splicing QTL associated with T2D risk. We have also been collecting samples of liver, another diabetes-relevant tissue. Additionally, we are investigating whether exosomal RNA from diabetic or non-diabetic patients might provide clues to etiology, or to a novel means of cell-to-cell signaling. Detailed analysis of DNA methylation in these samples is also being undertaken using the EPIC array, and a smaller subset is being more intensively analyzed using whole-genome bisulfite sequencing. Bisulfite sequencing is widely employed to study the role of DNA methylation in disease; however, the data suffer from biases due to variability in depth of coverage. Thus, we have developed and implemented machine learning and deep learning algorithms to predict methylation values at CpG sites with low-quality read data. These algorithms have the potential to enable epigenome-wide association studies (EWAS) with data from sparser and more cost-effective targeted assays, as well as to identify interesting biological features associated with methylation in different tissues. Our ultimate goal is to assess gene expression (via RNA-seq) and open chromatin structure (via ATAC-seq) in all tissues relevant to T2D, and to determine their correlation with GWAS risk alleles and DNA methylation patterns, in order to gain further insight into diabetes risk and possible novel avenues for prevention and treatment.
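To make the coverage issue noted above for bisulfite sequencing concrete, the following sketch shows how the uncertainty of a per-CpG methylation estimate grows as read depth falls. It assumes the usual estimate of methylation level as methylated reads divided by total reads, and a simple binomial model for its standard error. The depth cutoff and the function names are hypothetical and chosen only for illustration; the sketch does not represent the prediction algorithms developed in this project, only the reason low-coverage sites are candidates for prediction.

```python
import math
from typing import Optional


def methylation_level(meth_reads: int, total_reads: int) -> Optional[float]:
    """Fraction of reads supporting methylation at one CpG; None if uncovered."""
    if total_reads == 0:
        return None
    return meth_reads / total_reads


def binomial_se(level: float, total_reads: int) -> float:
    """Standard error of the methylation level under a simple binomial model."""
    return math.sqrt(level * (1.0 - level) / total_reads)


def needs_prediction(total_reads: int, min_depth: int = 10) -> bool:
    """Flag a CpG whose depth is below a cutoff (min_depth is a hypothetical value)."""
    return total_reads < min_depth


if __name__ == "__main__":
    # Same underlying level (0.5) observed at two hypothetical depths
    for meth, total in [(2, 4), (50, 100)]:
        level = methylation_level(meth, total)
        print(total, level, binomial_se(level, total), needs_prediction(total))
```

Under this model, a site with a true level of 0.5 has a standard error of 0.25 at 4 reads but 0.05 at 100 reads, which is why sparsely covered sites contribute disproportionately noisy values to downstream association analyses.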