PROJECT SUMMARY
This proposal, submitted to the special funding opportunity for Administrative Supplements for Integrative Data Analysis to Extend Research in Cancer Control and Population Sciences (NOT-CA-18-087), leverages the rich data resource of the Michigan Genomics Initiative (MGI), which comprises >55,000 patients recruited at Michigan Medicine with complete array-based whole-genome genotyping data linked to longitudinal data retrieved from their electronic health records. In addition, we have access to behavioural data collected through a detailed questionnaire and to geocoded residence information. These diverse data sources allow us to create novel and more accurate measures of exposure from questionnaire responses, from the structured and unstructured components of the electronic health record (including clinical notes), and from the geocoded residential information. A key aspect of MGI is its enrichment for cancer outcomes: because participants are recruited in a perioperative setting, about 50% of those recruited so far have at least one neoplasm diagnosis in their health record. This cancer enrichment sets MGI apart from other biobanks and presents a unique opportunity at the University of Michigan Rogel Cancer Center for cancer control and population sciences research. Our proposal aims to establish an analytical framework that will distil summary statistics from published genome-wide association studies of cancer into polygenic risk scores, construct known and novel cancer-related exposures from clinical free text and from data mining of geocoded residential data, and use both in a screen for associations across the catalogued medical phenome, i.e., the ensemble of clinical diagnoses and of biomarkers derived from routinely ordered lab tests in MGI. 
Correlated co-morbidities could, if pre-symptomatic, be used to improve cancer risk prediction, while associated exposures could inform targeted strategies for cancer prevention and risk stratification. Novel findings will be replicated, when feasible, in the publicly accessible data of the UK Biobank, a population-based study of up to half a million participants. To differentiate between pre-symptomatic and post-treatment features, we will probe the existing time-stamped data of MGI, which permit retrospective insights of up to ten years into a participant's electronic medical records. There is an urgent need to better communicate results from such big data exploration to a broad, multi-disciplinary audience, e.g., to inspire follow-up replication and new hypothesis generation in cancer epidemiology and to further our understanding of susceptibility mechanisms for cancer. We will therefore develop a webpage that allows intuitive, interactive browsing and open sharing of results. The proposal showcases the advantage of harmonizing and integrating large and complex databases to enable impactful research and powerful discovery tools in cancer control and population sciences.