Large-Scale Nationally Representative Patient-generated Health Data for Development of Generalizable Data Science Methodologies for Precision Public Health

Racial and ethnic minorities, socioeconomically disadvantaged groups, and other underserved populations experience disproportionately adverse health outcomes, despite decades of research linking social determinants (SDs) to variation in health outcomes. Many public health approaches use population averages to design "one-size-fits-all" interventions that maximize the probability of the best outcome for the average person. These approaches are limited by population heterogeneity in the number, magnitude, interplay, and amplification of SDs. Precision public health (PPH) emerged to use digital technologies (DTs) to develop interventions that target the unique needs of specific populations, improving health and reducing disparities. Smartphones, Internet of Things devices, and wearable sensors are becoming ubiquitous and generate data on environment, transportation, geolocation, diet, exercise, social interactions, and daily activities. Analysis of these voluminous, precise, continuous, and longitudinal data therefore holds great promise for PPH. These person-generated health data (PGHD) have unprecedented potential to add rich insight into everyday human behaviors to traditional health research. Clinical applications of PGHD are still at an early stage, but digital indicators of health are being developed rapidly and have broad potential. PGHD are typically captured outside controlled research settings, so they share the challenges of other non-traditional data that impede their acceptance and use across the healthcare ecosystem. First, PGHD are vulnerable to input bias because users of consumer DTs are a self-selected group.
Second, PGHD often have poor internal data quality. Completeness varies widely, and the causes of missingness (e.g., connectivity problems, battery depletion, user forgetfulness, user error) are not equally distributed across individuals. Together, input bias and poor data quality lead to poor external validity: analytics derived from PGHD do not generalize to the broader population. The objective of this partnership between the RAND Corporation and Evidation Health is to improve the generalizability of data science methods for PGHD so that all population groups, including the historically underserved, are represented. We will accomplish this goal through three aims: (i) generate PGHD from a nationally representative probability sample of Americans to understand the social distribution of user engagement with health DTs and of poor sleep health; (ii) develop a methodology that characterizes missing data within PGHD and selects appropriate imputation strategies (existing and novel) optimized to reduce model bias and sociodemographic input disparities; and (iii) create a propensity-score-based statistical weighting methodology to improve the effectiveness and applicability, in underserved populations, of methods derived from non-random, self-selected, and/or already-collected PGHD. This work will enable future identification and application of digital indicators for health interventions that account for all populations, a critical first step for digital PPH.
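The following illustrates aim (iii) with one common form of propensity-score weighting for generalizability: inverse odds of selection weighting. It is a minimal sketch under stated assumptions and not the project's actual methodology. It assumes a self-selected PGHD sample and a nationally representative reference sample measured on the same covariates. A logistic model is fit for membership in the PGHD sample. Each PGHD participant is then weighted by the inverse odds (1 - p)/p, so that the weighted PGHD sample resembles the reference sample on those covariates. The function names (fit_logistic, inverse_odds_weights), the Newton-Raphson fitting routine, and the simulated data are all hypothetical. For simplicity, the sketch also ignores the reference sample's own survey weights, which a real analysis would incorporate.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; returns [intercept, coefficients...]."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))    # current predicted probabilities
        W = p * (1.0 - p)                       # variance of each Bernoulli outcome
        grad = Xd.T @ (y - p)                   # score vector
        hess = Xd.T @ (Xd * W[:, None])         # observed information matrix
        beta += np.linalg.solve(hess, grad)     # Newton step
    return beta

def inverse_odds_weights(X_pghd, X_ref):
    """Hypothetical helper: weight a self-selected PGHD sample toward a reference sample."""
    X = np.vstack([X_pghd, X_ref])
    s = np.concatenate([np.ones(len(X_pghd)), np.zeros(len(X_ref))])  # 1 = in PGHD sample
    beta = fit_logistic(X, s)
    Xd = np.column_stack([np.ones(len(X_pghd)), X_pghd])
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))  # propensity of being in the PGHD sample
    w = (1.0 - p) / p                     # inverse odds of selection
    return w * len(w) / w.sum()           # normalize so weights average to 1

# Simulated, hypothetical data: one standardized covariate (e.g., age).
rng = np.random.default_rng(0)
X_ref = rng.normal(0.0, 1.0, size=(5000, 1))    # representative reference sample
X_pghd = rng.normal(-1.0, 1.0, size=(5000, 1))  # self-selected sample, skewed younger
w = inverse_odds_weights(X_pghd, X_ref)
print("unweighted PGHD mean:", X_pghd[:, 0].mean())
print("weighted PGHD mean:  ", np.average(X_pghd[:, 0], weights=w))
print("reference mean:      ", X_ref[:, 0].mean())
```

In this simulation, the covariate is normally distributed with equal variance in both samples, so the linear logistic model is correctly specified. The weighted PGHD mean should therefore land close to the reference mean. With real PGHD, the choice of selection model, covariate balance diagnostics, and the handling of extreme weights would all need careful attention.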