PROJECT SUMMARY The places people reside throughout their lives play an important role in their health and in their propensity to develop diseases such as cancer. However, the longitudinal spatiotemporal contexts of where people live are not commonly incorporated into cancer studies. Recent advances in information technology and "big data" and associated analytic approaches have made it possible for cancer registries and researchers to capture residential histories at the population level. We propose to develop a large multi-dimensional database for cancer patients using multiple data sources to reconstruct their longitudinal residential and exposure histories, and to identify potential patient exposure profiles using data mining techniques guided by scientific evidence from the cancer epidemiology and environmental health literature. We will demonstrate the feasibility and identify advantages and challenges of such an approach by using mesothelioma as an example. We hypothesize that there are distinct spatiotemporal environmental exposure trajectories and exposure profiles among mesothelioma patients that can be identified using residential histories.
Our specific aims are: Aim 1: Develop an optimal algorithm to streamline the process of compiling, cleaning, verifying, and constructing the residential histories of mesothelioma patients diagnosed between 2011 and 2015 in New York, as reported to the New York State Cancer Registry (NYSCR), utilizing multiple commercial and governmental data sources; Aim 2: Develop an optimal algorithm to streamline the process of compiling, cleaning, verifying, and constructing the exposure history associated with each mesothelioma patient's residential history by leveraging exposure proxies at the individual residence level and area-level information associated with patients' residential addresses, utilizing multiple commercial and governmental data sources; and Aim 3: Visualize the spatiotemporal dynamics of patients' residential and exposure histories, and identify predictors of their exposure profiles, using advanced data mining techniques such as cluster analysis, latent class analysis, and network analysis. The proposal is innovative in both the methods for constructing the database and the analytical methods for uncovering important exposure profiles, such as critical exposure windows, environmental clusters/hotspots, and the relative contributions of exposures across space and time. To our knowledge, no similar database exists at present. The residential data compiled in this project will be permanently stored within the NYSCR to allow future use, the first such example by any cancer registry. The identified exposure phenotypes will contribute to a better understanding of the role environmental exposure plays in mesothelioma disease development. The methods developed can be tested, scaled up, replicated by other states, and adapted to other cancers and non-cancer-related conditions. This life-course perspective approach holds great potential for advancing cancer research as well as for routine cancer registry surveillance.