There is a consensus in the field of language development that the language children hear from parents and caregivers is central to the development of early language skills. However, there is still a great deal we do not know about exactly which aspects of children's early language environments contribute to which language skills and outcomes. This knowledge will have implications both for theories of language development and for potential remediation for language learners who lag behind their peers. One key to gaining a better understanding of exactly how aspects of the language environment promote early language skills is the proper measurement and analysis of children's language input. However, proper measurement is complicated by the fact that datasets consisting of the words in early language environments are larger than, and unlike, the sorts of datasets that psychologists are used to analyzing. These challenges are becoming more evident as new methods that capture language learning environments at scale (Gilkerson & Richards, 2008; Roy et al., 2006; VanDam et al., 2016) outpace the analytic and inferential methods available for understanding the distributions of words in the talk that we record (Greenwood et al., 2011). In Montag, Jones and Smith (2017), through a series of simulations of a large database of child-directed speech, we explored various dimensions along which early language environments may hypothetically vary, and we suggested new ways in which researchers might analyze and interpret these large datasets. The goal of the proposed project is to collect a preliminary dataset that will allow us to test the predictions made in Montag et al. (2017). The first aim of this project is to bring LENA day-long audio recording technology to UCR, to develop a set of methods for efficient and accurate transcription and coding of these audio data, and to establish the feasibility of larger-scale recordings with larger sample sizes.
The second aim is to understand the range of individual differences that exist in young children's early language environments. We want to better understand how the methodological and analytical choices that researchers make may influence the conclusions that can be drawn about the nature of early language environments. This will help us understand which aspects of the variability in early language environments predict learning outcomes, which in turn will inform our understanding of the mechanisms that underlie early language learning and of how we can help children who struggle to learn language. To accomplish this, we will record three days of language input to eight children between the ages of 22 and 28 months. We will transcribe all the speech that each child hears, as well as the child's own speech, and use these data to test theoretical predictions regarding best practices for measuring, analyzing and interpreting linguistic input. This will help us understand how our sampling and analysis techniques may constrain or illuminate important sources of variability in early language input, and identify the best ways to analyze large, naturalistic language datasets.