Type 2 diabetes (T2D) is one of the major causes of morbidity and mortality in the developed world. While environmental factors such as diet play a significant role, familial clustering indicates that substantial genetic susceptibility factors must also be at work. For more than two decades we have been engaged in a large collaborative study entitled FUSION (Finland - United States Investigation of NIDDM), in which more than 30,000 individuals with diabetes (and suitable controls) from Finland are being studied, using careful phenotyping of diabetes and diabetes-associated quantitative traits, and genome-wide genetic linkage and association. We have collaborated with several groups around the world to increase the study sample size and our ability to detect these genetic susceptibility factors. We have developed and applied new high-throughput genotyping approaches in the laboratory, which have allowed the collection of a massive amount of data from these Finnish diabetics and their families. Using the genome-wide association study (GWAS) approach, we have contributed to the identification of more than 80 well-validated loci for T2D and have identified >400 additional loci harboring variants that have important effects on obesity, fasting glucose, LDL and HDL cholesterol, triglycerides, proinsulin levels, blood pressure, and adult height. We are now investigating the functional basis of disease risk that arises from several of these variants. This analysis includes high-throughput sequencing of these loci to identify common and rare alleles that may be driving the association, analysis of the relationship between gene expression and risk haplotypes, cell culture and biochemical assays, and mouse models. We have also performed large-scale whole-exome and whole-genome sequencing of more than 2657 diabetics and controls to look for rare variants of large effect that contribute to disease risk.
However, we have not found evidence of rare coding variants that would explain the missing heritability of risk for the common form of diabetes. We are continuing our efforts to identify the cause of rare Mendelian forms of the disease such as neonatal diabetes (NDM), congenital hyperinsulinemia (CHI), and unmapped loci for Maturity Onset Diabetes of the Young (MODY). Since >90% of T2D risk variants identified by GWAS are in non-coding regions, a major effort has been devoted to defining the epigenome of the human pancreatic islet and other diabetes-relevant tissues by mapping a variety of chromatin marks across the entire genome. This has enabled identification of enhancers and insulators, some of which harbor variants that influence the risk of T2D. Detailed investigation has led to the discovery of large enhancer regions greater than 3 kb in length, which we term stretch enhancers. We have shown that stretch enhancers correlate with gene expression in a tissue-specific manner and are enriched in disease-associated GWAS variants. We have developed both bioinformatic and experimental approaches to identify variants in islet stretch enhancers that may be associated with T2D and T2D-related traits. To further define the human islet epigenome, we have performed genome-wide genotyping and RNA sequencing on 112 anonymous deceased donor samples from islet distribution centers, and integrated chromatin accessibility profiles (ATAC-seq) from two islet samples. Our results show that T2D-associated genetic variants are enriched in islet-specific regulatory regions, and we have identified the transcription factor RFX6 as a potential major regulator of genes involved in diabetes risk. We have contributed significantly to the international Integrated Network for Systematic analysis of Pancreatic Islet RNA Expression (InsPIRE) consortium, which studies the genetic regulation of gene expression in pancreatic islets.
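The length criterion for stretch enhancers described above (enhancer regions greater than 3 kb) can be illustrated with a minimal sketch. It assumes that enhancer-state segments are available as (chromosome, start, end) tuples in half-open coordinates and that touching or overlapping segments are merged before the length filter is applied. The merging rule, the function names, and the toy coordinates are illustrative assumptions, not the procedure used in the study.

```python
from typing import List, Tuple

# (chromosome, start, end) in half-open coordinates; an assumed input format
Interval = Tuple[str, int, int]

STRETCH_MIN_BP = 3000  # "greater than 3 kb", as stated in the text


def merge_contiguous(segments: List[Interval]) -> List[Interval]:
    """Merge enhancer segments that touch or overlap on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def stretch_enhancers(segments: List[Interval],
                      min_bp: int = STRETCH_MIN_BP) -> List[Interval]:
    """Keep merged enhancer regions strictly longer than min_bp."""
    return [iv for iv in merge_contiguous(segments) if iv[2] - iv[1] > min_bp]


# Toy, made-up coordinates for illustration only
example_segments = [
    ("chr1", 1000, 2500),
    ("chr1", 2500, 4600),   # touches the previous segment: merged to 3600 bp
    ("chr1", 9000, 9800),   # short, isolated enhancer
    ("chr2", 100, 3100),    # exactly 3000 bp, so not "greater than" 3 kb
]
example_result = stretch_enhancers(example_segments)
```

With these toy inputs only the merged chr1 region spanning 1000 to 4600 passes the filter. Whether nearby but non-touching segments should also be merged is a design choice this sketch leaves out.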
We are also conducting a comparative analysis of islet epigenomes and gene expression in humans, mice, and rats. The human pancreatic islet is composed of several cell types, with the insulin-secreting beta cells representing only 40% of the total cell population. We are currently using single-cell RNA sequencing strategies to interrogate differential gene expression in the various islet cell types, and also to tease out low-level, beta-cell-specific gene expression that may be undetected in the bulk-islet RNA-seq data. A major component of the FUSION study is a project focused on tissue biopsies from individuals with extensive genotype and phenotype information. We have collected skin, muscle, and adipose biopsies from more than 300 individuals with normal glucose tolerance, impaired glucose tolerance, or early onset T2D. The skin biopsies are used to generate induced pluripotent stem cell (iPSC) lines, which in turn are being differentiated into tissues relevant to diabetes (including insulin-producing cells) in order to study the relationships of disease risk alleles to cellular phenotype. We have generated iPSC lines from the skin cells of 60 of these individuals and are pursuing the differentiation of normal and T2D iPSCs into mature insulin-secreting beta cells, to compare the developmental and functional effects of the subjects' genetic backgrounds. Analyses of the largest ever human muscle RNA-seq data set have identified many expression quantitative trait loci (eQTLs), including some that link T2D GWAS variants to their target genes. A particularly instructive example is the muscle-specific isoform of ankyrin, ANK1, which harbors both an eQTL and a splicing QTL that is associated with T2D risk. We have completed sequencing of the RNA from the adipose samples and are currently analyzing these gene expression data together with genotypes (from DNA) to identify correlates with disease.
To further expand our understanding of T2D traits, we are collecting global metabolomics data on muscle and adipose biopsy samples, as well as global metabolomics plus complex lipid analysis of plasma taken during the oral glucose tolerance test (OGTT) used to diagnose the T2D status of our biopsy subjects. We aim to integrate metabolomics data with genetic architecture, T2D-related traits, and eQTLs to investigate potential dynamic interactions as a result of T2D status. We have also been collecting samples of liver, another diabetes-relevant tissue, and have completed the sequencing of the RNAs. As in our pancreatic islet analyses, we are performing ATAC-seq to integrate chromatin structure with gene expression in liver tissue. Detailed analysis of DNA methylation in a subset of these tissue samples is being performed using the Illumina EPIC array, and a smaller subset was more intensively analyzed using whole-genome bisulfite sequencing. We have gone on to develop and implement machine learning and deep learning algorithms to predict methylation values of low-quality CpG reads. These algorithms have enabled us to perform epigenome-wide association studies (EWAS) with data from sparser and more cost-effective targeted assays, as well as to identify interesting biological features associated with methylation in different tissues. Furthermore, we have undertaken novel approaches that use Mendelian randomization to ascertain whether DNA methylation drives variation in gene expression, or the other way around. Our ultimate goal is to measure gene expression (via RNA-seq) and open chromatin structure (via ATAC-seq) in all tissues relevant to T2D and determine their correlation with GWAS risk alleles and DNA methylation patterns, to gain further insight into diabetes risk and possible novel avenues for prevention and treatment.
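As a rough illustration of the Mendelian randomization question raised above (does DNA methylation drive gene expression, or the reverse), the sketch below applies the textbook single-instrument Wald ratio in both directions. It is not the novel approach mentioned in the text. The class and function names and all effect sizes are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the instrument-exposure estimate.

```python
import math
from dataclasses import dataclass


@dataclass
class Assoc:
    """Per-allele effect of a genetic instrument on a trait (hypothetical inputs)."""
    beta: float
    se: float


def wald_ratio(instr_on_exposure: Assoc, instr_on_outcome: Assoc):
    """Wald ratio estimate of the exposure -> outcome effect and its approximate SE."""
    if instr_on_exposure.beta == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = instr_on_outcome.beta / instr_on_exposure.beta
    # First-order approximation: treats the instrument-exposure effect as known
    se = instr_on_outcome.se / abs(instr_on_exposure.beta)
    return estimate, se


def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided p-value under a normal approximation."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up numbers for illustration only.
# Forward direction: a methylation QTL instruments methylation; outcome is expression.
fwd_est, fwd_se = wald_ratio(Assoc(0.5, 0.05), Assoc(0.2, 0.04))
# Reverse direction: an eQTL instruments expression; outcome is methylation.
rev_est, rev_se = wald_ratio(Assoc(0.6, 0.05), Assoc(0.01, 0.05))

fwd_p = two_sided_p(fwd_est, fwd_se)
rev_p = two_sided_p(rev_est, rev_se)
```

In this toy setup, a clear signal in the forward direction and none in the reverse would be read as consistent with methylation influencing expression. Real analyses also need independent instruments for each direction and checks for pleiotropy, which this sketch omits.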