PROJECT SUMMARY
While ambient air pollution is a well-recognized risk factor for adverse infant and childhood health outcomes, research on the health impacts of air emissions from emerging industries such as shale gas development (SGD) remains inadequate. The SGD industry expanded rapidly from under 30,000 sites in 2000 to over 300,000 sites in 2016, and approximately 17.6 million Americans now live within one mile of a drilling site. Given this substantial population exposure, there is an immediate need to determine the health risks associated with SGD air emissions and to develop effective methods to evaluate and reduce exposure to emerging hazards. This study will use data science and big data techniques to integrate environmental data with health information to assess the impact of SGD on infants and children exposed to the SGD industry in utero. Health data will be drawn from a large retrospective birth cohort (n=5,275,799) with full maternal addresses and linkages to birth defect and childhood cancer registries, covering 1996 through 2009, a period that corresponds to the rapid increase in Texas SGD activity. Texas is the largest shale gas producer in the country, and 16% of its population (4.5 million people) lives within 1 mile of drilling, making this an ideal cohort in which to study this exposure. Aim 1 builds novel spatial-temporal exposure metrics from administrative and proprietary data sources to capture multiple pathways by which SGD may affect local populations, including specific SGD processes (e.g., production, flaring), traffic from the industry, and wind direction between homes and drilling sites. To date, these sources have not been used in large-scale data integration projects. By capturing policy-relevant SGD exposures, these metrics represent a substantial advance over the exposure assessments previously used in epidemiology and risk assessment studies, and they can be applied both to SGD and to future environmental threats.
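As an illustration only, a wind-direction exposure metric of the kind described in Aim 1 might be sketched as below. This is a minimal, hypothetical Python sketch and not the study's method. It assumes that home and well locations are available as latitude/longitude pairs, that wind direction follows the meteorological convention (the direction the wind blows from), and that a well counts as upwind of a home when the bearing from the well to the home lies within 45 degrees of the downwind direction. The function names, the 1-mile default radius, and the 45-degree sector are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def initial_bearing(lat1, lon1, lat2, lon2):
    """Compass bearing in degrees (0 = north, clockwise) from point 1 toward point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    x = math.sin(dlam) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return (math.degrees(math.atan2(x, y)) + 360) % 360


def angular_difference(a, b):
    """Smallest absolute difference between two compass directions, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def exposure_metrics(home, wells, wind_from_deg, radius_miles=1.0, sector_half_width=45.0):
    """Hypothetical metric: (wells within radius_miles of home, how many of those are upwind).

    home and each well are (lat, lon) tuples in degrees; wind_from_deg uses the
    meteorological convention, so the wind blows toward wind_from_deg + 180.
    """
    wind_to = (wind_from_deg + 180) % 360
    nearby = 0
    upwind = 0
    for well in wells:
        if haversine_miles(*home, *well) <= radius_miles:
            nearby += 1
            # Emissions reach the home if the well-to-home direction matches the wind.
            bearing = initial_bearing(*well, *home)
            if angular_difference(bearing, wind_to) <= sector_half_width:
                upwind += 1
    return nearby, upwind
```

For example, for a home at (31.0, -102.0) with a southerly wind (`wind_from_deg=180`), a well about 0.48 miles due south at (30.993, -102.0) is counted as nearby and upwind, a well the same distance due north at (31.007, -102.0) is nearby but not upwind, and a well at (32.0, -102.0), roughly 69 miles away, is excluded, so `exposure_metrics` returns `(2, 1)`. A real metric would likely also weight wells by distance, production volume, or time-varying wind records; those choices are left open here.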
Aim 2 applies these spatial-temporal metrics to the geocoded birth cohort to quantify the impact of specific SGD processes and related exposures on adverse birth outcomes, birth defects, and childhood cancers. This analysis uses a unique causal-inference framework that draws on cross-disciplinary epidemiological, economic, and ontological methods. The results of the health analyses will show which SGD exposures influence perinatal health outcomes and will inform policy guidelines that can help reduce risks for local communities. The proposed research will synthesize spatial exposure assessment methods, advance environmental health data science techniques, and develop causal-inference models to produce robust risk estimates for SGD exposures. Findings from the proposed study will provide foundational evidence on the effects of SGD exposure on infant and child health, improving our understanding of how SGD affects local communities. Beyond the risks associated with SGD, this project will establish novel methods to assess other local environmental hazards and help bridge gaps among epidemiology, exposure assessment, data science, and economics by demonstrating a causal inference framework not often applied in public health studies.