This project will develop the record linkage methods needed to create an unprecedented data resource covering the American population over seven decades. Specifically, it will develop new strategies for placing unique protected identification keys (PIKs) on twentieth-century census records, evaluate the results, and optimize the data for population and health research. These strategies will facilitate linking census, survey, and administrative records to create an integrated database that supports life-course and intergenerational analysis of health and wellbeing. Within a secure data environment, the Census Bureau assigns PIKs to many recent census and survey records, which allows it to uniquely identify and link individuals across data sources in order to improve data quality and program efficiency while maintaining confidentiality. This project proposes research to obtain PIK rates on 1940 census data that approach the Bureau's success on recent data. If successful, by matching 1940 cross-sectional data with recent cross-sectional and panel data, this work will allow the research community to construct (1) longitudinal data on individuals over long periods of time; (2) longitudinal data on related individuals (siblings, and parents and children) over long periods of time; and (3) data on multiple generations of families (dynasties). Such data will be used to study fundamental issues in American society, including the effects of early-life living conditions on later-life health outcomes and the intergenerational transfer of wealth, health, and human capital. The 1940 Census is an excellent test bed for developing algorithms for assigning PIKs to earlier census data. It is the most recent decennial census for which the original manuscripts are available under the Census Bureau's 72-year rule for data release. 
Names and addresses, as well as a host of demographic information on individuals and their household members, are easily accessible through IPUMS data, providing information that can potentially be used to uniquely identify individuals in other administrative data sources. This pilot project will evaluate: (1) the overall PIK rate of the 1940 Census using algorithms developed for recent census data, including how the PIK rate varies with demographic characteristics, especially age, sex, and race; (2) how additional data and new methods can be used to improve the PIK rate on pre-2000 data, including military enlistment records and the Social Security data used to administer the OASDI program; (3) the tradeoff between bias and completeness introduced by various matching methods; and (4) econometric methods for using records that are matched not uniquely but to a small number of people. Findings from this study will inform future efforts to develop a data infrastructure program linking a range of data sources on individuals and families over long periods of time to study life-cycle and intergenerational issues.