In one project, we have investigated genetic factors involved in the etiology of diabetic peripheral neuropathy (DPN), conducting a systematic search for genetic variants that influence DPN risk in two well-characterized cohorts. A genome-wide association study (GWAS) testing 6.8 million single nucleotide polymorphisms (SNPs) was conducted among participants of the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial. It included 4,384 white case subjects with type 2 diabetes (T2D) and prevalent or incident DPN (defined as a Michigan Neuropathy Screening Instrument clinical examination score >2.0) and 784 white control subjects with T2D and no evidence of DPN at baseline or during follow-up. Replication of significant loci was sought among white subjects with T2D (791 DPN-positive case subjects and 158 DPN-negative control subjects) from the Bypass Angioplasty Revascularization Investigation in Type 2 Diabetes (BARI 2D) trial. Associations between significant variants and gene expression in peripheral nerve were evaluated in the Genotype-Tissue Expression (GTEx) database. A cluster of 28 SNPs on chromosome 2q24 reached genome-wide significance (P < 5 × 10⁻⁸) in ACCORD. The minor allele of the lead SNP (rs13417783, minor allele frequency = 0.14) decreased the odds of DPN by 36% (odds ratio [OR] 0.64, 95% CI 0.55-0.74, P = 1.9 × 10⁻⁹). This effect was not modified by ACCORD treatment assignment (P for interaction = 0.6), nor was it mediated by an association with known DPN risk factors. The locus was successfully validated in BARI 2D (OR 0.57, 95% CI 0.42-0.80, P = 9 × 10⁻⁴; summary P = 7.9 × 10⁻¹²). In GTEx, the minor, protective allele at this locus was associated with higher tibial nerve expression of an adjacent gene, SCN2A, which encodes the voltage-gated sodium channel NaV1.2 (P = 9 × 10⁻⁴). In summary, we identified and validated a previously unknown locus with a strong protective effect on the development of DPN in T2D.
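Because each odds ratio above is reported with a 95% confidence interval, the log-odds estimate and its standard error can be recovered and the two cohorts pooled. The sketch below is a minimal Python illustration that assumes a fixed-effect, inverse-variance meta-analysis on the log-odds scale with a normal approximation for the P value; the statement does not say which method produced the summary P of 7.9 × 10⁻¹², so the pooled value here should be of the same order but need not match it. The helper names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return beta, se


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of (beta, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return beta, se, two_sided_p(beta / se)


# Reported estimates for the lead SNP rs13417783 (minor allele vs DPN)
accord = log_or_and_se(0.64, 0.55, 0.74)
bari2d = log_or_and_se(0.57, 0.42, 0.80)

beta, se, p = fixed_effect_meta([accord, bari2d])
lo, hi = math.exp(beta - Z95 * se), math.exp(beta + Z95 * se)
print(f"Pooled OR {math.exp(beta):.2f} (95% CI {lo:.2f}-{hi:.2f}), P = {p:.1e}")
```

Because the published ORs and CIs are rounded to two decimals, the recovered standard errors, and therefore the pooled P value, are only approximate.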
Identifying this locus may provide novel insights into DPN pathogenesis and point to a potential target for new interventions.
With my PhD student Jon Lierer, we are developing methods to help understand disease heterogeneity. Type 2 diabetes is a highly heterogeneous disease, which suggests that differences in latent underlying features may be driving it. A more refined understanding of these features could lead to important insights into disease etiology, more personalized treatment approaches, and better clinical outcomes. We have refined the use of topological data analysis (TDA) to find latent subgroups in such heterogeneous diseases. We applied our TDA workflow to a diverse cohort of patients with T2D from the ACCORD clinical trial to (1) identify clustered subgroups of patients based on a patient topology network derived from clinical characteristics, (2) test the subgroups for enrichment in clinical outcomes, (3) identify the features driving subgroup membership, and (4) predict clinical outcomes based on the identified features. We developed a robust workflow, with corresponding code, that implements TDA, minimizes overfitting through internal model validation, and introduces a new metric for parameter optimization. Applying this workflow to the ACCORD data, we identified eight subgroup clusters of T2D patients, three of which contained patients who were significantly over- or under-represented for at least one clinical outcome. Patients in Cluster 2 showed increased risk for the ACCORD primary outcome and major coronary events. Patients in Cluster 5 showed increased risk for the primary outcome, congestive heart failure, and the expanded macrovascular outcome. Patients in Cluster 6 showed decreased risk for the primary outcome, cardiovascular death, congestive heart failure, the expanded macrovascular outcome, and major coronary events.
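The enrichment step, testing whether an outcome is over- or under-represented among a cluster's members, can be framed as a 2×2 comparison of cluster members against all other patients. The sketch below is a minimal, dependency-free two-sided Fisher's exact test in Python. It is an illustrative assumption, since the statement does not name the enrichment test used, and the counts in the usage lines are hypothetical, not ACCORD data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on a 2x2 table:

                      event   no event
        in cluster      a        b
        other patients  c        d

    Returns (odds_ratio, p_value). An odds ratio above 1 means the outcome
    is over-represented in the cluster; below 1, under-represented."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    # Hypergeometric probability of every feasible table with the same margins
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}

    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance keeps exact ties from being lost to rounding
    p_obs = probs[a]
    p_value = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-7))

    # Simplification: reported as infinite whenever b or c is zero
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts for illustration only (not ACCORD data):
# 18 of 40 cluster members had the outcome vs 200 of 960 other patients
or_, p = fisher_exact_two_sided(18, 22, 200, 760)
print(f"OR = {or_:.2f}, two-sided P = {p:.2g}")
```

With eight clusters and several outcomes tested, the resulting P values would also need a multiple-testing correction (for example, Bonferroni) before a cluster is called significantly enriched.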
For the three significant clusters, we identified the clinical variables driving cluster membership. This study yields both methodological and applied insights. Methodologically, we extend a modified TDA method with data-driven parameter tuning and holdout data for testing and validation, an important development for a broad range of applications. From an applied perspective, we identified eight clusters of patients with T2D in the ACCORD trial, three of which are significantly enriched for clinical outcomes, along with several variables driving cluster membership. We also compared our approach with LASSO regression and established that cluster-driven variable selection performs comparably while offering a number of unique advantages. These findings shed light on the structure behind the heterogeneity of T2D and demonstrate a valid data-driven method for extracting insight from the underlying topological characteristics of data.
We also have other ongoing projects using the ACCORD data. Jon Lierer is conducting association analyses of response to rosiglitazone, as well as of the adverse outcome of weight gain, and has identified interesting potential associations that we are currently seeking to replicate. We are also working on the genetics of the hemoglobin glycation index (HGI). HGI quantifies interindividual variation in the propensity for glycation and is a predictor of diabetes complications and of adverse effects of intensive glucose lowering. Here too, we have found promising results that we are actively working to replicate. We also participate in several consortia related to drug response and help others in the field replicate their results using the ACCORD cohort.
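The hemoglobin glycation index (HGI) is commonly computed as a residual: observed HbA1c minus the HbA1c predicted from a population linear regression of HbA1c on blood glucose. The sketch below assumes that definition, with fasting plasma glucose (FPG) as the predictor and an ordinary least-squares fit across the cohort; the function name and the input values are hypothetical, and the glucose measure and covariates used in the ACCORD analyses may differ.

```python
def hemoglobin_glycation_index(hba1c, fpg):
    """HGI = observed HbA1c minus HbA1c predicted from a least-squares
    regression of HbA1c on fasting plasma glucose (FPG) fit to the cohort."""
    n = len(hba1c)
    if n != len(fpg) or n < 2:
        raise ValueError("need paired HbA1c and FPG values for >= 2 patients")
    mean_x = sum(fpg) / n
    mean_y = sum(hba1c) / n
    sxx = sum((x - mean_x) ** 2 for x in fpg)
    if sxx == 0:
        raise ValueError("FPG values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fpg, hba1c))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive HGI: more glycated than glucose alone predicts ("high glycator")
    return [y - (intercept + slope * x) for x, y in zip(fpg, hba1c)]


# Hypothetical values: HbA1c in %, FPG in mg/dL
hba1c = [7.1, 8.4, 6.9, 9.2, 7.8]
fpg = [140, 165, 150, 190, 150]
hgi = hemoglobin_glycation_index(hba1c, fpg)
print([round(h, 2) for h in hgi])
```

Because HGI is a regression residual, it averages to zero across the cohort and is uncorrelated with FPG by construction, so it isolates the part of HbA1c that glucose does not explain.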