Diseases such as AIDS and leukemia caused by retroviruses have intensified the need to understand the mechanisms of retrovirus replication. One of our objectives is to understand how retroviral cDNAs are integrated into the genome of infected cells. Because of their similarities to retroviruses, long terminal repeat (LTR)-retrotransposons are important models for retrovirus replication. The retrotransposon under study in our laboratory is the Tf1 element of the fission yeast Schizosaccharomyces pombe. We are particularly interested in Tf1 because its integration exhibits a strong preference for pol II promoters. This choice of target sites is similar to the strong integration preferences human immunodeficiency virus 1 (HIV-1) and murine leukemia virus (MLV) have for pol II transcription units. Currently, it is not clear how these viruses recognize their target sites. We therefore study the integration of Tf1 as a model system with which we hope to uncover mechanisms general to the selection of integration sites. An understanding of the mechanisms responsible for targeted integration could lead to new approaches for antiviral therapies. A specific goal of our research is to identify the mechanism that directs integration to regions upstream of ORFs. To study insertion patterns in specific genes, a target plasmid assay was developed. Integration into the promoter of fbp1 clustered within 10 bp of a transcription enhancer called upstream activating sequence 1 (UAS1). Integration into the promoter of fbp1 depended on the UAS1 sequence and on Atf1p, a transcription activator that binds UAS1. To identify the key determinants responsible for targeting integration in the fbp1 promoter, we conducted an extensive study of the promoter sequences. Here we report that two discrete target windows close to UAS1 were the only sequences in the promoter required for the pattern of integration. 
These two target windows functioned independently of each other, and each one was found to be sufficient to function as an efficient target of integration. Although Atf1p is necessary for directing integration to UAS1, it may be that by activating transcription, Atf1p induces subsequent steps of transcription that are more directly responsible for directing integration. If the role of Atf1p in integration were indirect, other factors that promote fbp1 transcription would also influence integration at this promoter. However, other known transcription factors that mediate fbp1 transcription, Pcr1p, Rst2p, and Tup11p/Tup12p, were found not to contribute to integration at UAS1. UAS2 is an independent enhancer in the promoter of fbp1 and was not a target of integration. Nevertheless, we found UAS2 did promote efficient transcription of fbp1. In addition, we found a synthetic promoter induced by lexA fused to an activator, VP16, was not a target of Tf1 integration. These data indicate that transcription activity of a promoter is not sufficient to mediate integration. Instead, the data indicate Atf1p plays a direct and specific role in targeting integration to UAS1 of the fbp1 promoter. The result that integration of Tf1 is directed to the promoters of genes raises several key questions about the biology of Tf1 integration. Are all promoters recognized equally, or is integration directed to specific sets of promoters? If specific sets of promoters are preferred targets, what distinguishes the preferred promoters from those not recognized by Tf1? To address these questions, we sequenced large numbers of insertions throughout the genome of S. pombe using the new methods for ultra-high-throughput sequencing. To create a genome-wide profile of integration sites, we induced Tf1 from our expression plasmid and amplified libraries of insertions with ligation-mediated PCR. The libraries were sequenced by 454 Life Sciences. 
We performed four independent transposition experiments using haploids and diploids and two different restriction enzymes (Mse I or Hpy CH4 IV) to digest the genomic DNA. Altogether, BLAST results identified 73,125 independent Tf1 integration events. These were known to result from independent events because they were located at different positions or had different orientations. We discarded the high number of duplicate sequence reads because we did not know whether they resulted from independent integration events, cell division, or PCR amplification. The vast majority of insertions clustered within 500 bp upstream of ORFs in haploids and diploids, confirming the preference for promoter sequences observed in the target plasmid assays and the small numbers of genomic insertions previously characterized. To determine what distinguishes promoters that had high levels of insertions from the promoters that did not, we asked whether the genes associated with the targeted promoters contributed to specific classes of biological function. The results of the gene ontology analysis revealed that genes regulated by environmental stress were favored targets of integration. The strongest correlations were seen with promoters induced by osmotic and oxidative stress. This targeting of stress response genes, coupled with the ability of Tf1 to regulate the expression of adjacent genes, suggests the intriguing possibility that Tf1 may improve the survival of S. pombe when cells are exposed to environmental stress. With the introduction of new deep sequencing technology, it is now possible to sequence many millions of transposon insertions in a single experiment. We tested whether Illumina sequencing could be used to generate a dense profile of transposon insertions that would reveal which genes are required for cell growth. For this experiment we used a haploid strain of S. pombe and Hermes, a DNA transposon from the housefly. 
In previous work we found that the Hermes transposon was highly active in S. pombe and that the insertions did not discriminate against ORFs. We predicted that in actively growing cultures, Hermes insertions would not be tolerated in essential ORFs. This year we induced Hermes transposition in a large culture of S. pombe that was grown for 80 generations. With ligation-mediated PCR and Illumina sequencing we were able to sequence 360,513 independent insertion events. On average, this represented one insertion for every 29 bp of the S. pombe genome. An analysis of integration density revealed that the ORFs largely separated into two classes, one with high numbers of insertions and another with much lower numbers. In collaboration with a group that deleted each of the genes of S. pombe, we found the ORFs with low numbers of Hermes insertions corresponded to the essential genes. The ORFs with higher integration densities were in genes classified as nonessential. These results validated integration profiling as a new method for identifying genes with essential function. Importantly, by applying specific conditions of selection during growth, this method can be adapted to identify genes that contribute to a wide variety of functions.
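
The counting logic behind integration profiling can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the record types `Read` and `Orf`, the function names, the toy coordinates, and the density cutoff are all hypothetical and chosen only for illustration. The sketch assumes that mapped reads have already been reduced to a chromosome, a junction position, and an orientation. Reads sharing all three are collapsed into one independent event, following the criterion described above, and each ORF is then scored by insertions per kb. The text does not state how the two density classes were separated, so the single threshold here is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Read(NamedTuple):
    """One mapped sequence read (hypothetical record layout)."""
    chrom: str
    pos: int      # coordinate of the transposon-genome junction
    strand: str   # '+' or '-', orientation of the inserted element


class Orf(NamedTuple):
    """One annotated ORF with 1-based inclusive coordinates (hypothetical)."""
    name: str
    chrom: str
    start: int
    end: int


def independent_events(reads: Iterable[Read]) -> Set[Read]:
    """Keep one record per (chrom, pos, strand).

    Duplicate reads are discarded because they cannot be attributed
    to separate integration events.
    """
    return set(reads)


def insertions_per_kb(events: Iterable[Read], orfs: Iterable[Orf]) -> Dict[str, float]:
    """Count events falling inside each ORF and normalize by ORF length."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for ev in events:
        positions[ev.chrom].append(ev.pos)

    density = {}
    for orf in orfs:
        hits = sum(orf.start <= p <= orf.end for p in positions[orf.chrom])
        length = orf.end - orf.start + 1
        density[orf.name] = 1000.0 * hits / length
    return density


def classify(density: Dict[str, float], threshold: float) -> Dict[str, str]:
    """Split ORFs into 'low' and 'high' density classes.

    The threshold is an assumed, illustrative cutoff, not a value from the study.
    """
    return {name: ("low" if d < threshold else "high") for name, d in density.items()}


# Toy data: two reads are exact duplicates; one shares a position with them
# but has the opposite orientation, so it counts as a separate event.
reads = [
    Read("I", 100, "+"),
    Read("I", 100, "+"),
    Read("I", 100, "-"),
    Read("I", 150, "+"),
    Read("I", 1200, "+"),
]
orfs = [Orf("geneA", "I", 50, 249), Orf("geneB", "I", 1000, 1999)]

events = independent_events(reads)
density = insertions_per_kb(events, orfs)
classes = classify(density, threshold=5.0)
print(len(events), density, classes)
```

In this toy example the five reads collapse to four independent events. Under the assumed cutoff, the ORF with few insertions per kb falls into the low-density class, which is the class that corresponded to essential genes in the experiment described above.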