This application addresses broad Challenge Area (06) Enabling Technologies and specific Challenge Topic 06-HG-103: Methods to sequence highly variable, repeat-rich regions of complex genomes. A remarkable 45% of our genome consists of repetitive elements - more than 40 times the mass contributed by protein-coding sequences. Retrotransposons termed long interspersed elements (LINEs) are among the most abundant and dynamic of these. More than 500,000 LINEs, both intact 6 kb elements and fragments, make up 17% of the human genome. Their presence is intriguing because LINEs are major forces in the evolution of mammalian genomes, with the potential to significantly alter the expression levels and mRNA structure of neighboring genes. The youngest LINEs, known as T(a)LINEs (transcriptionally active LINEs), retain retrotransposition activity. Through germ-line integration they create significant genetic differences across human populations and cause inherited disease; they also produce somatic transforming mutations in cancer. Each of our genomes harbors about 500 T(a)LINEs, approximately 100 of which are intact, autonomous transposons capable of "copy-and-paste" retrotransposition. Finally, their insertion into both coding and noncoding regions has been associated with a wide variety of functional effects, implicating them as a potentially major source of human phenotypic diversity. The role of retrotransposons in human disease remains poorly understood, largely because of the massive number of LINEs in our genomes and their large size. Of necessity, LINEs have been excluded from array-based copy number variation studies and next-generation whole-genome sequencing efforts. My laboratory recently published a method to map repetitive elements in S. cerevisiae using a coupled vectorette PCR-microarray approach that we have termed transposon insertion profiling (TIP-Chip).
We have demonstrated that this technology enables mapping of human T(a)LINEs, and we propose the first major comprehensive survey of T(a)LINEs in reference DNA samples to begin to characterize this underexplored aspect of our genomes.
Much of our DNA is derived from LINE retrotransposons. The youngest family of these elements, T(a)LINEs, remains mobile and is a major but poorly characterized source of genomic structural variation across human populations. Moreover, their insertion into both coding and noncoding regions has been associated with a wide variety of functional effects, implicating them as a potentially major source of human phenotypic diversity. A method recently developed in this laboratory for identifying the locations of T(a)LINEs will be used to comprehensively map these sequences in 120 reference DNA samples and to generate a public database of insertion sites and frequencies. This effort will add a new dimension to our understanding of the human genome and provide the basis for future biomedical research investigating the impact of transposable elements on human health and disease.