CREATION AND APPLICATION OF A DIABETES KNOWLEDGE BASE

The applicant is an Instructor in Pediatrics at Harvard Medical School and an associate in bioinformatics and pediatric endocrinology at Children's Hospital, Boston. The applicant completed an NLM-funded fellowship in informatics and received a Master's degree in Medical Informatics from MIT. Since completing his fellowship less than two years ago, he has first-authored six publications, co-authored eight, senior-authored two, and co-authored a book on microarray analysis. The applicant plans to pursue a career in basic research in diabetes genomics and bioinformatics, with a joint appointment in an academic pediatric endocrinology department and a medical informatics program. The mentor is Dr. Isaac Kohane, director of the Children's Hospital Informatics Program, which has a staff of 20, including 10 faculty, and extensive computational resources, and is funded through several NIH grants.

The past 10 years have produced a variety of measurement tools in molecular biology that are near-comprehensive in nature. For example, RNA expression detection microarrays can provide systematic quantitative information on the expression of over 40,000 unique RNAs within cells. Yet microarrays are just one of at least 30 large-scale measurement or experimental modalities available to investigators in molecular biology. We see scientific value in being able to integrate multiple large-scale data sets from all biological modalities to address biomedical questions that could not otherwise be answered. We recognize that the full agenda of working out the details of all possible inferential processes between all near-comprehensive modalities is too large. The goal of this project is to serve as a model automated system for gathering data related to a particular experimental characteristic and applying inferential operators to these data. For this application, we are focusing on a pragmatic subset.
Specifically, we propose intersecting near-comprehensive data sets by phenotype and intersecting lists of significant and related genes within these data sets in an automated manner. The central hypothesis for this application is that integrating large-scale data sets across measurement modalities is a synergistic process that creates new knowledge and testable hypotheses in the area of diabetes, and that inferential processes involving intersection across genes can be automated.
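
As an illustration only, the intersection step described above might be sketched as follows. This is a minimal sketch and not the proposed system: the function name `intersect_by_phenotype`, the (phenotype, modality, genes) record layout, and the placeholder gene names are all assumptions made for the example. It also assumes that each data set has already been reduced to a list of significant genes.

```python
from collections import defaultdict


def intersect_by_phenotype(datasets):
    """Group data sets by phenotype and intersect their significant-gene lists.

    `datasets` is an iterable of (phenotype, modality, genes) tuples.
    This record layout is hypothetical, chosen only for the sketch.
    Returns {phenotype: genes significant in every data set for that phenotype}.
    """
    grouped = defaultdict(list)
    for phenotype, _modality, genes in datasets:
        grouped[phenotype].append(set(genes))

    shared = {}
    for phenotype, gene_sets in grouped.items():
        # A phenotype seen in only one data set has nothing to intersect with.
        if len(gene_sets) >= 2:
            shared[phenotype] = set.intersection(*gene_sets)
    return shared


# Placeholder records; the gene names are made up, not real findings.
example = [
    ("type 2 diabetes", "expression microarray", ["GENE_A", "GENE_B", "GENE_C"]),
    ("type 2 diabetes", "linkage study", ["GENE_B", "GENE_C", "GENE_D"]),
    ("obesity", "expression microarray", ["GENE_A", "GENE_D"]),
]

result = intersect_by_phenotype(example)
print(sorted(result["type 2 diabetes"]))  # ['GENE_B', 'GENE_C']
```

In this sketch a gene is kept only if it appears in every data set sharing a phenotype. A real system would likely need a looser rule, such as requiring support from some minimum number of modalities, and would need to map gene identifiers between platforms first.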