Type 2 diabetes (T2D) is one of the major causes of morbidity and mortality in the developed world. While environmental factors such as diet play a significant role, familial clustering indicates a significant role for genetic factors. For more than two decades we have been engaged in a large collaborative study entitled FUSION (Finland - United States Investigation of NIDDM), in which more than 30,000 individuals with diabetes (and suitable controls) from Finland are being studied. Using the genome-wide association study (GWAS) approach and in collaboration with other groups around the world, we have contributed to the identification of more than 240 well-validated loci for T2D, and have identified >1000 additional loci harboring variants that have important effects on obesity, fasting glucose, LDL and HDL cholesterol, triglycerides, proinsulin levels, blood pressure, and adult height. We have also performed large-scale whole-exome and whole-genome sequencing of more than 2657 individuals with diabetes and controls, to look for rare variants of large effect that contribute to disease risk. We are continuing our efforts to identify the causes of rare Mendelian forms of the disease such as neonatal diabetes (NDM), congenital hyperinsulinemia (CHI), and unmapped loci for Maturity Onset Diabetes of the Young (MODY). Since >90% of T2D risk variants identified by GWAS are in non-coding regions, a major effort has been devoted to defining the epigenome of the human pancreatic islet and other diabetes-relevant tissues. This has enabled identification of enhancers and insulators, some of which harbor variants that influence the risk of T2D. Detailed investigation has led to the discovery of large regions of regulatory enhancers greater than 3 kb in length, which we term stretch enhancers. We have shown that stretch enhancers correlate with gene expression in a tissue-specific manner, and are enriched in disease-associated GWAS variants.
To further define the human islet epigenome, we have performed genome-wide genotyping and RNA-sequencing on 185 anonymous deceased donor samples, and integrated chromatin accessibility profiles (ATAC-seq) from two islet samples. Our results show that T2D-associated genetic variants are enriched in islet-specific regulatory regions, and we have identified the transcription factor RFX6 as a potential major regulator of genes involved in diabetes risk. We also contributed our pancreatic islet reference chromatin analyses and gene expression data to the international InsPIRE consortium for the analysis of the co-localization of epigenome annotation, cis-eQTL signals, and variants influencing T2D predisposition and related glycemic traits. The integrated meta-analysis of 420 pancreatic islet samples led to the identification of candidate effector transcripts at 23 T2D loci. The human pancreatic islet is composed of several cell types, with the insulin-secreting beta cells representing only 40% of the total cell population. We have optimized single-cell RNA sequencing (scRNA-seq) strategies and protocols to assess the transcriptomes of individual cells of the pancreatic islet. We have recently completed a detailed comparison of four different sample storage methods and developed an experimental protocol to process human pancreatic islets for single-cell analysis. We are currently assessing the single-cell transcriptome of human islets under either low or high glucose environments to determine the influence of genetic background on the response to glucose stimulation. We have collected skin, muscle, and adipose biopsies from more than 300 well-phenotyped and genotyped individuals with normal glucose tolerance, impaired glucose tolerance, or early-onset T2D. The skin biopsies are used to generate induced pluripotent stem cell (iPSC) lines, which in turn are being differentiated into tissues relevant to diabetes.
To date, 52 iPSC lines have been generated from T2D or normal glucose tolerant (NGT) subjects. In collaboration with the New York Stem Cell Foundation, we are developing an automated protocol for differentiating iPSCs into mature insulin-producing beta cells. We are performing transcriptome analysis by bulk RNA-seq and scRNA-seq, and profiling open chromatin structure by bulk ATAC-seq and scATAC-seq, at critical stages of the differentiation process. This information will allow for the comparative analysis of the effects of genetic background on beta cell development and function. The muscle and adipose biopsies have also yielded important insights. Analyses of the largest ever human muscle RNA-seq data set have identified many expression quantitative trait loci (eQTLs), including some that link T2D GWAS variants to their target genes. We have also completed RNA-seq from adipose samples and are currently analyzing these gene expression data together with genotypes (from DNA) to identify correlates with disease. To further expand our understanding of T2D traits, we have collected global metabolomics data on 318 muscle and 309 adipose biopsy samples as well as global metabolomics plus complex lipid analysis of plasma taken during an oral glucose tolerance test (OGTT). For the majority of these same samples, we also have RNA-seq data, and aim to integrate metabolomics data with genetic architecture, T2D-related traits, and eQTLs to investigate potential dynamic interactions as a result of T2D status. We have also been collecting samples of liver, another diabetes-relevant tissue, and have completed RNA sequencing of these samples. As in the pancreatic islet analyses, we are performing ATAC-seq to integrate chromatin structure with gene expression in liver tissue. We have performed detailed analysis of DNA methylation in a subset of these tissue samples. We have developed and implemented machine learning and deep learning algorithms to predict methylation values of low-quality CpG reads.
These algorithms have enabled us to perform epigenome-wide association studies (EWAS) with data from sparser and more cost-effective targeted assays, as well as to identify interesting biological features associated with methylation in different tissues. Furthermore, to address the challenges of making genotype-phenotype correlations, we have integrated genomic sequence (>7 million genetic variants), gene expression, and methylation data from 265 skeletal muscle biopsies with their corresponding phenotypes for eight physiological traits (height, waist, weight, waist-hip ratio, body mass index, fasting serum insulin, fasting plasma glucose, and T2D). We utilized a novel approach, Mendelian randomization, to ascertain whether DNA methylation drives variation in gene expression, or the other way around. We identified gene and DNA methylation site relationships that may underlie 534 disease/quantitative traits. We have recently embarked on a project to explore the role of T2D genetic risk variants in isogenic lines. In collaboration with Dr. Shuibing Chen (Weill Cornell Medicine), we are using CRISPR base-editing to generate iPSC lines harboring T2D genetic risk or non-risk variants. The edited isogenic lines will then be differentiated into beta cell lines for functional in vitro analyses, including glucose-stimulated insulin secretion assays and potential high-throughput drug screens under various exposures/treatments. Our ultimate goal is to measure gene expression (via RNA-seq) and open chromatin structure (via ATAC-seq) from multiple tissues relevant to T2D and determine their correlation with GWAS risk alleles and DNA methylation patterns, to gain further insight into diabetes risk and possible novel avenues for prevention and treatment.
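
As a rough illustration of the Mendelian randomization idea mentioned above, the simplest single-variant form is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The sketch below is not the analysis pipeline used in this project; the function name, the dataclass, and all numbers are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the variant-exposure effect.

```python
from dataclasses import dataclass


@dataclass
class WaldEstimate:
    effect: float  # estimated causal effect of exposure on outcome
    se: float      # approximate standard error of that estimate
    z: float       # effect / se


def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float) -> WaldEstimate:
    """Single-instrument Wald ratio (hypothetical helper, for illustration only).

    beta_exposure: effect of the genetic variant on the exposure
                   (e.g. methylation level at a CpG site).
    beta_outcome:  effect of the same variant on the outcome
                   (e.g. expression of a nearby gene).
    se_outcome:    standard error of beta_outcome.
    """
    if beta_exposure == 0:
        raise ValueError("variant has no effect on the exposure; it is not a usable instrument")
    effect = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error.
    se = se_outcome / abs(beta_exposure)
    return WaldEstimate(effect=effect, se=se, z=effect / se)


# Made-up summary statistics for one variant, purely for demonstration.
estimate = wald_ratio(beta_exposure=0.5, beta_outcome=0.2, se_outcome=0.05)
print(estimate)
```

To ask which direction is better supported, one would run the estimate both ways: once with a methylation-associated variant as the instrument and expression as the outcome, and once with an expression-associated variant as the instrument and methylation as the outcome. Real analyses typically combine many variants and test for pleiotropy, which this sketch omits.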