Researchers will soon have access to the world's largest individual-level population database, comprising two billion individual records describing the characteristics of Americans enumerated in the U.S. censuses taken between 1790 and 2010. Most of these data are already in digital form; with support from NICHD, NIA, and NSF, they are being processed and will soon be available in an integrated format to the scientific community. The data series covers the entire enumerated population with full geographic detail. The series provides the most comprehensive view of long-run population dynamics available for any place in the world, and it has the potential to transform our understanding of processes of demographic and economic change. This proposal seeks funding to realize this potential by filling a major gap in the series. The midsection of the data series, 1900 to 1930, is missing key information on socioeconomic and demographic characteristics, including variables describing fertility, mortality, marriage, economic activities, education, immigration, and housing. Through collaboration with the world's largest genealogical firm, it is now feasible to fill this gap and complete the data series. The data expansion will make a permanent and substantial addition to the nation's statistical infrastructure and will have far-reaching implications for health-related research across the social and behavioral sciences.
The project involves (1) transcription of 5.8 billion keystrokes of data describing demographic and economic characteristics of all individuals enumerated in the United States censuses taken between 1900 and 1930; (2) evaluation of data quality through random blind verification and comparison with published census returns; (3) data cleaning, including editing and imputation of inconsistent and missing data values; (4) development of data dictionaries to classify twenty million different open-ended descriptions of occupations, industries, languages, and institutions into numeric classifications compatible with previous and subsequent census data; (5) preparation of documentation, including full descriptions of data processing methods, detailed analysis of comparability issues, and comprehensive machine-processable metadata; (6) incorporation of the additional variables into the Integrated Public Use Microdata Series (IPUMS) data access system for free dissemination to the scientific community; and (7) implementation of secure data protection and preservation policies. The project will be executed by a team of highly experienced researchers with exceptional proficiency in large-scale data curation, integration, and dissemination. The collaboration of the Minnesota Population Center with the nation's largest producer of genealogical data allows cost-effective use of scarce resources to develop shared infrastructure for population and health research.