INTRODUCTION - Mammalian L1 elements (long interspersed repeated DNA elements, also called LINE-1) belong to the non-LTR class of retrotransposons. L1 elements replicate (retrotranspose) by copying their RNA transcripts into DNA, which is then integrated into the genome. The ~6 kb human L1 element has four regions: a 5' untranslated region (UTR) with regulatory function; open reading frame 1 (ORF1), which encodes an RNA-binding protein; ORF2, which encodes an endonuclease and reverse transcriptase; and a 3' UTR that contains a conserved G-rich polypurine motif. L1 elements replicate by using the 3' OH of the nicked target DNA to prime the synthesis of a cDNA copy of their transcript. L1 can also replicate other elements (e.g., SINEs such as the Alu family) by this process. As L1 activity has persisted in mammals since their emergence ~100 million years (MY) ago and the products of L1 replication are largely retained, it is not surprising that at least 30% (860 Mb) of the human genome database (hereafter called the database) was generated by L1 activity: 460 Mb is L1 DNA and 360 Mb is SINE DNA. In addition to their sheer mass, L1 elements and their SINE offspring can cause genetic rearrangements and inactivate or alter gene activity. We have used the tools of molecular biology, evolution, and population genetics to examine the interaction between L1 and its host and how L1 activity might affect modern humans.
RECENT FINDINGS: ANALYSIS OF AN ONGOING L1 AMPLIFICATION EVENT IN HUMANS - Our earlier studies showed that the human Ta L1 family (L1PA1), shown by others to cause in utero insertional mutations, arose ~5 MY ago and subsequently differentiated into two major subfamilies, Ta0 and Ta1. Ta1, which had not been described earlier, is younger than Ta0 and now accounts for at least 50% of the Ta family. As the Ta1 subfamily emerged only recently, the database likely lacks a significant number of the Ta1 inserts.
Such inserts could be very useful as robust genetic markers for mapping medically relevant traits and for examining the history and structure of human populations. Also, a complete census of Ta1 inserts will provide a "real time" view of an L1 amplification event before it has been remodeled by selection or obscured by the accumulation of elements. We devised an unbiased method to collect Ta1 inserts from four individuals of different ethnic groups: African pygmy, Caucasian Druze, Chinese, and Melanesian. This work is now completed, and our findings include the following. Our cloning strategy recovered 90% of the Ta1 population in the four individuals. A full 40% of these inserts are not present in the database, and 93% of those are polymorphic, as compared to 51% of those in the database. Thus, the database seriously underrepresents both the Ta1 census in humans and its contribution to human genetic diversity. Ta1 elements are as enriched in the GC-poor regions of the genome as are ancestral L1 families. Thus, contrary to the view held by some, L1 insertions are not random with respect to GC content but have an insertional bias toward GC-poor regions. Ta1 elements are also not randomly distributed between or within chromosomes. For example, chromosome 4 is particularly hospitable to Ta1 insertions; it contains twice the number expected for its length and gene density. In contrast, ancestral families did not insert preferentially on chromosome 4. In addition, the distribution of Ta1 elements within individual chromosomes is not uniform. Ta1 elements show a strong tendency to cluster, and the maximal gaps between Ta1 inserts are larger, on average, than would be expected by chance. Neither the clustering nor the chromosomal bias can be explained on the basis of GC content. In a separate study we determined the distribution of 120 of the polymorphic Ta1 inserts in 141 individuals and are now analyzing these data.
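The comparison of maximal gaps with chance expectation can be illustrated with a small Monte Carlo sketch. This is not the analysis used in the study. It assumes a null model in which inserts fall uniformly and independently along a chromosome, and the function names (max_gap, max_gap_p_value) and the example coordinates are hypothetical.

```python
import random


def max_gap(positions, chrom_length):
    """Largest stretch without an insert, counting both chromosome ends."""
    pts = [0] + sorted(positions) + [chrom_length]
    return max(b - a for a, b in zip(pts, pts[1:]))


def max_gap_p_value(positions, chrom_length, n_sim=2000, seed=0):
    """One-sided Monte Carlo p-value for the observed maximal gap.

    Null model (an assumption of this sketch): the same number of inserts
    placed uniformly at random along the chromosome.
    """
    rng = random.Random(seed)
    observed = max_gap(positions, chrom_length)
    n = len(positions)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.randrange(chrom_length) for _ in range(n)]
        if max_gap(simulated, chrom_length) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_sim + 1)


# Hypothetical data: 20 inserts packed into the first 10% of a 1 Mb region.
clustered = [i * 5_000 for i in range(1, 21)]
p = max_gap_p_value(clustered, 1_000_000)
```

A small p-value indicates a maximal gap larger than expected under uniform placement. A fuller analysis would also need to account for GC content and gene density, which this sketch ignores.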
EVOLUTIONARY DYNAMICS OF L1 IN NON-HUMAN PRIMATES - We carried out the first investigation of L1 evolution in New World monkeys (NWM). Two regions of ORF2 were analyzed by two different methods in three NWM species: the squirrel monkey (Saimiri sciureus), the tamarin (Saguinus oedipus), and the spider monkey (Ateles paniscus). Since these three species diverged, L1 has amplified in the Saimiri and Saguinus lineages, but L1 activity seems to have been strongly reduced in the Ateles lineage. In addition, the active L1 lineage has evolved rapidly in Saimiri and Saguinus, generating species-specific subfamilies. In contrast, we found no evidence for a species-specific subfamily in Ateles, a result consistent with the low L1 activity in this species for the last ~25 MY. We also found that two L1 lineages coexisted in the common ancestor of these three species for at least 6 MY, but only one of them has persisted until the present time. The coexistence of more than one L1 lineage for such a long time is very unusual in mammals, where a single dominant lineage is the rule. One possible explanation for the existence of a single L1 lineage is competition between active L1 elements for one or more limiting host factors essential for L1 replication. As this competition would presumably be reduced during periods of low L1 activity, such periods could favor the coexistence of multiple active L1 lineages. In fact, multiple L1 lineages seem to typify non-mammalian species (e.g., Drosophila, fish), which do not support the high level of L1 activity (or other non-LTR retrotransposon activity) possible at times in mammals.
INTERACTION BETWEEN L1 AND ITS HOST - We had earlier found that the coiled-coil motif of L1 ORF1 had undergone episodes of adaptive evolution early in hominid evolution and that this ceased during the evolution of L1 in the African apes (human, chimpanzee, gorilla).
As coiled-coil domains often mediate protein-protein interactions, evolutionary change in this motif could reflect interaction of L1 with host factors. We showed that a mosaic ORF1 containing an ancestral coiled-coil domain can function in retrotransposition in the context of an otherwise modern element. This suggests that the adaptive evolution was not a response to changes elsewhere in the L1 element but possibly to the host. We used the cytoplasmic yeast two-hybrid assay to show that ORF1 proteins (including those containing an ancestral coiled-coil motif) strongly self-interact, and we are now determining the regions of ORF1 involved in this interaction. We are also using the two-hybrid assay to identify clones in various expression libraries (e.g., HeLa cells, human and mouse testes, human endothelial cells) that encode tissue-specific factors that may interact with ORF1. Comparative screens with ancestral and modern ORF1 proteins will be used to identify the non-specific interactions that often plague two-hybrid assays. We are also developing a rapid retrotransposition assay to examine the effect of putative host factors on this process.