The purpose of this project is to determine the role that repetitive elements (REs) play in the biological outcome of environmental exposures. While it is known that the expression of REs changes in response to environmental exposures, the mechanisms by which REs affect the biology of cells and organisms have not been explored in depth. We are specifically interested in the extent to which REs alter the expression of adjacent genes by forming fusion transcripts with those genes. We chose RNA-seq to study this problem. The first task was to develop a set of bioinformatics methods for the genome-wide analysis of RE expression, and this has been a major focus of the group over the past year. We initiated a collaboration with Dr. Eric Nestler from Mount Sinai in New York because his group had previously demonstrated that the expression of REs is altered in the brains of mice treated with cocaine. Dr. Nestler's group had also generated genome-wide RNA-seq transcriptome data that we could use to test our new bioinformatics methods. Dr. Wang developed several versions of a bioinformatics pipeline based on TopHat-Fusion, software originally developed to identify fusion transcripts resulting from chromosomal translocations in cancer, and adapted and modified this tool to suit the analysis of repetitive sequences. After several months of work and the adoption of very stringent criteria, the pipeline identified approximately 175,000 unique RE loci that were expressed in the data set, about 1,500 of which were differentially expressed after cocaine exposure. The pipeline also included tools that identified fusion RNA-seq reads, which we defined as reads containing part of an RE connected to a protein-coding exon, in 490 genes.
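The definition of a fusion read given above can be illustrated with a minimal sketch. It is not the TopHat-Fusion pipeline itself. It assumes that each read has already been aligned as one or more genomic segments and that RE and protein-coding exon annotations are available as simple intervals; the function names, the interval representation, and the rule that the RE and the exon must be hit by different segments are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def is_re_fusion_read(
    segments: List[Interval],
    re_loci: List[Interval],
    coding_exons: List[Interval],
) -> bool:
    """Classify a read as an RE-fusion read under the assumed rule:
    one aligned segment overlaps an annotated RE and a *different*
    segment overlaps a protein-coding exon."""
    hits_re = [any(overlaps(s, r) for r in re_loci) for s in segments]
    hits_exon = [any(overlaps(s, e) for e in coding_exons) for s in segments]
    n = len(segments)
    return any(
        hits_re[i] and hits_exon[j]
        for i in range(n)
        for j in range(n)
        if i != j
    )
```

A production implementation would use an interval tree or a sorted index rather than the linear scans shown here, since the annotation contains a very large number of RE loci.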
Using real-time PCR, we confirmed that the fusion RNA-seq reads detected for 13 randomly selected genes (out of the 490) were indeed derived from transcripts in which an RE sequence was fused with an exon from a protein-coding gene (herein called an RE-fusion transcript). Using quantitative RT-PCR, we confirmed that one of these genes, a Rho guanine nucleotide exchange factor called Arhgef10, was differentially expressed after cocaine exposure. Most notably, when engineered genes expressing either the RE-fusion transcript or the corresponding wild-type transcript were individually overexpressed ectopically in the brains of mice, only the RE-fusion transcript significantly altered cocaine reward behavior. This result provides evidence that expression of an RE-fusion transcript within the brain causes a biological response within the cell. A manuscript reporting these findings was recently submitted for publication. These experiments indicate that REs have a role in the biological response of cells to environmental exposures and suggest a mechanism by which many other environmental agents could exert their effects on the cell. While the TopHat-Fusion pipeline identified RE-fusion transcripts effectively, it became clear early in the year that many types of RE-fusion transcripts were not being detected with this approach. To eliminate false positives, we had incorporated a step in the pipeline requiring that the RE be fused to a non-repetitive sequence through a normal splicing event. However, this requirement eliminated our ability to detect many fusion events, most notably those arising from transcription that initiated within an RE and continued into the flanking non-repetitive sequence. Therefore, to detect all possible categories of RE-fusion events within the cell, we spent considerable time over the past year developing additional bioinformatics workflows.
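The trade-off introduced by the splicing requirement can be made concrete with a small sketch. It is an illustration only, not the filter used in the pipeline: it assumes that a "normal splicing event" means a gap between the RE segment and the non-repetitive segment that begins with GT and ends with AG on a forward-strand toy sequence, and the function name and the minimum intron length are hypothetical.

```python
# Hypothetical threshold; real introns are typically much longer.
MIN_INTRON_LENGTH = 20


def passes_splice_filter(re_segment_end: int, flank_segment_start: int, genome: str) -> bool:
    """Return True if the sequence between the RE segment and the flanking
    non-repetitive segment looks like a canonical GT...AG intron.

    Coordinates are 0-based and half-open on a forward-strand sequence.
    """
    intron = genome[re_segment_end:flank_segment_start]
    if len(intron) < MIN_INTRON_LENGTH:
        # No gap (or a very short one): the read runs continuously from the
        # RE into the flanking sequence, so there is no splice junction.
        return False
    return intron.startswith("GT") and intron.endswith("AG")
```

Under these assumptions, a read that runs without interruption from an RE into the adjacent sequence has no intron between its two parts and is rejected, which is the class of RE-initiated transcripts that the text describes as lost.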
We first tried simply changing the parameters of TopHat-Fusion, but we found that the computational power required to perform the analysis with this software was beyond the capacity of the computing infrastructure at the NIH. We then devised a new pipeline using Bowtie, with which we estimated that about 50% of mammalian genes are capable of expressing RE-fusion transcripts. Nevertheless, certain RE-fusion events remain that we are not able to detect. Therefore, over the next year we plan to collaborate with Dr. Gonzalo Riadi from the Pontificia Universidad Catolica de Chile and the Universidad de Talca (Chile). Dr. Riadi is a computer scientist with specific expertise in the genome-wide analysis of repetitive elements, and he is willing to work with us to develop the tools necessary to detect all RE-fusion transcripts within a cell. We will continue to use our existing bioinformatics workflow to evaluate the differential expression of RE-fusion transcripts in samples that have been treated with environmental agents, and we will incorporate new tools as they become available.