The purpose of this project is to better understand how repetitive elements (REs) may affect the biological outcome of environmental exposures. While it is known that the expression of REs changes in response to environmental agents, the mechanisms by which REs affect the biology of cells and organisms have not been explored in depth. We are specifically interested in the extent to which REs alter the expression of adjacent genes through the formation of fusion transcripts (FTs), and we chose RNAseq to study this problem. We developed a robust bioinformatics pipeline to detect FTs and have been analyzing a data set associated with cocaine exposure. We selected this data set because our collaborator, Dr Eric Nestler, previously demonstrated that the expression of REs is altered in the brains of mice treated with cocaine, and because we reasoned that identifying cocaine-responsive FTs could provide a link between FTs and environmental exposures. The pipeline uses the software package TopHat-Fusion, which was created to identify FTs originating from chromosomal translocations in cancer. We applied a series of stringent criteria for detecting FTs: (i) each end of the fusion read had to map to a single genomic locus; (ii) one end of the fusion read had to map to a repeat locus and the other end to an annotated exon; (iii) the junction site of the fusion had to be flanked by canonical splicing dinucleotides; and (iv) the predicted fusion event had to be detected in all independent replicates from either experimental group. In this study each experimental group (saline- or cocaine-exposed) comprised 15 animals, representing 3 independent replicates of 5 animals each. Using this approach we identified 466 genes that express FTs, 165 of which were modulated by cocaine exposure.
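As an illustration of how filtering criteria (i)-(iv) combine, the following minimal Python sketch applies them to a single candidate fusion. It is not the pipeline itself: the record type `FusionCandidate`, its field names, and the function `passes_filters` are hypothetical, and treating only GT-AG as the canonical splice-site pair is an assumption made for the example.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: only the GT-AG donor/acceptor pair is canonical.
CANONICAL_SPLICE_SITES = {("GT", "AG")}


@dataclass
class FusionCandidate:
    """Hypothetical summary of one candidate fusion junction."""
    left_loci: int       # number of genomic loci the left end maps to
    right_loci: int      # number of genomic loci the right end maps to
    left_feature: str    # "repeat", "exon", or "other"
    right_feature: str   # "repeat", "exon", or "other"
    donor: str           # dinucleotide flanking the junction on the donor side
    acceptor: str        # dinucleotide flanking the junction on the acceptor side
    # group name -> set of replicate IDs in which the fusion was detected
    support: dict = field(default_factory=dict)


def passes_filters(candidate, replicates_by_group):
    """Return True if the candidate satisfies criteria (i)-(iv)."""
    # (i) each end maps to exactly one genomic locus
    unique = candidate.left_loci == 1 and candidate.right_loci == 1
    # (ii) one end lies in a repeat, the other in an annotated exon
    repeat_exon = {candidate.left_feature, candidate.right_feature} == {"repeat", "exon"}
    # (iii) junction flanked by canonical splicing dinucleotides
    canonical = (candidate.donor.upper(), candidate.acceptor.upper()) in CANONICAL_SPLICE_SITES
    # (iv) detected in every replicate of at least one experimental group
    reproducible = any(
        bool(reps) and reps <= candidate.support.get(group, set())
        for group, reps in replicates_by_group.items()
    )
    return unique and repeat_exon and canonical and reproducible


# Example layout: 3 replicates per group, as in the study design described above.
replicates = {"saline": {"s1", "s2", "s3"}, "cocaine": {"c1", "c2", "c3"}}
candidate = FusionCandidate(
    1, 1, "repeat", "exon", "GT", "AG",
    support={"cocaine": {"c1", "c2", "c3"}, "saline": {"s1"}},
)
print(passes_filters(candidate, replicates))
```

In this example the candidate is retained because it is found in all three cocaine replicates, even though it appears in only one saline replicate; this mirrors the "either experimental group" wording of criterion (iv).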
About 25 independent FTs were validated using reverse-transcription PCR, and from these we selected 9 for a more in-depth analysis. We first compared the level of FT expression relative to a non-fusion isoform of the same gene using quantitative real-time PCR (qRT-PCR). This revealed that, depending upon the gene, FTs were expressed at levels that were higher than, equal to, or lower than those of the corresponding non-fusion isoform. These 9 genes expressing FTs were also analyzed for tissue-specific and developmental stage-specific expression. We found that expression of FTs is tightly regulated, with some being expressed in a tissue- and/or developmental stage-specific manner and, in some cases, with expression patterns that differ significantly from those of their non-fusion counterparts. We also focused on genes whose FTs were responsive to cocaine. One such gene, Arhgef10, which expresses two different FTs, encodes a Rho guanine nucleotide exchange factor. This gene ultimately regulates the actin cytoskeleton in a way that can influence cellular morphology, migration, and cytokinesis. The first FT, fusion A, involves an LTR of an endogenous retroviral element (ERV) located 18 kb upstream of the canonical TSS of the gene; this LTR splices to the first coding exon of the gene, which contains the annotated ATG. The second FT, fusion B, involves an LTR of an ERV located within the second intron that splices to the third exon of the gene, which is downstream of the annotated ATG. While we predict that the fusion A transcript would give rise to a normal protein, fusion B would need to recruit a downstream ATG in order to initiate translation. The first downstream ATG is located in exon 6, and translation from it would produce an N-terminally truncated protein that is 221 amino acids shorter than the normal protein. Interestingly, previously published results from other investigators demonstrated that the N-terminus of Arhgef10 encodes a negative regulatory domain, deletion of which gives rise to a hyperactive protein.
We also found that Arhgef10 lies within one of the few quantitative trait loci (QTL) associated with cocaine addiction. These data, coupled with our finding that the fusion B transcript was upregulated by 40% upon cocaine exposure, led us to test whether this fusion gene has the capacity to influence cocaine-responsive behaviors. For this purpose we prepared viral vectors expressing GFP, the wild-type Arhgef10 cDNA, or the fusion B cDNA. Virus prepared from these vectors was stereotaxically injected into the brains of 22 male animals. Two days after injection, the mice were divided into saline- or cocaine-treated groups (N=11/group) and subjected to the conditioned place preference protocol, in which animals learn to prefer a cocaine-paired environment. We found that expression of the Arhgef10 fusion B transcript significantly decreased the time animals spent in the cocaine-conditioned chamber when compared to either the wild-type or GFP controls. This result indicates that ectopic expression of the fusion B transcript specifically blunts the reward response to cocaine. Taken together, our data clearly indicate that FTs are abundantly formed in the nucleus accumbens (NAc), can be regulated in a spatially and temporally specific manner, and can respond to an environmental cue like cocaine. We are currently analyzing other publicly available RNAseq data sets as well as our own (including those from the mitochondrial manipulations described in Project 1) to better understand which genes have the capacity to express FTs and to establish the full biological impact of FT expression. In the past year we have also started working with our collaborator Dr Riadi on the development of a new bioinformatics pipeline to identify FTs more comprehensively across the genome.
For instance, the new pipeline is being designed to handle repeats that map to more than one genomic locus and to identify FTs that arise from read-through events (currently, only events that require splicing are detected). This new approach relies largely on statistical analysis of the genome-wide distributions of repeats and annotated genes, an initial step toward identifying the landscape of all potential fusion events across the genome. Preliminary analyses based on DNA configurations indicate that, in our TopHat-Fusion analysis, the distance between a repeat and an exon involved in a fusion ranges from 1 to 360 kb, which differs significantly from the average distance between any intron/exon in the genome and its nearest repeat.
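To make the distance comparison concrete, the sketch below computes the distance from each exon to its nearest repeat and averages it. It is only an illustration under stated assumptions: intervals are half-open `(start, end)` pairs in base pairs on a single chromosome, the function names are hypothetical, and a simple linear scan stands in for whatever indexing the new pipeline will use.

```python
from statistics import mean


def interval_gap(a, b):
    """Gap in bp between two half-open (start, end) intervals; 0 if they overlap or touch."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def nearest_repeat_distance(exon, repeats):
    """Distance from one exon to its closest repeat on the same chromosome."""
    return min(interval_gap(exon, repeat) for repeat in repeats)


def mean_nearest_repeat_distance(exons, repeats):
    """Average, over all exons, of the distance to the nearest repeat."""
    return mean(nearest_repeat_distance(exon, repeats) for exon in exons)


# Toy coordinates, invented purely for illustration.
repeats = [(100, 200), (1000, 1200)]
exons = [(250, 300), (900, 950)]
print([nearest_repeat_distance(e, repeats) for e in exons])
print(mean_nearest_repeat_distance(exons, repeats))
```

Distances observed for repeat-exon pairs involved in fusions could then be compared against this genome-wide background; the choice of statistical test is left open here.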