Completed an implementation of a strategy to detect the exonization of transposable elements (TEs) in human coding sequences. TEs have long been regarded as selfish or junk DNA with little or no role in the regulation or functioning of the human genome. Over the past several years, however, this view has been challenged as several studies provided anecdotal as well as global evidence for the contribution of TEs to the regulatory and coding needs of human genes. We explored the incorporation and regulation of coding sequences donated by TEs using RNA-seq, ChIP-seq, CAGE, and DNase I hypersensitivity data in two human hematopoietic cell lines characterized by the ENCODE project. We compared transcriptome assembly with and without the aid of a reference transcriptome and found that the percentage of genes that incorporate TEs in their coding sequences is significantly greater in the assembly built without a reference than in the reference-guided assemblies based on two gene models. Using a data integration approach, we demonstrated the epigenetic regulation of the TE-derived coding sequences.
--------------------------------------------------------------------------------------------------
Completed the development of procedures for the analysis of gene expression and next-generation sequencing data. We developed a novel way to analyze RNA-Seq data for the detection of alternative polyadenylation (APA). Using a Poisson hidden Markov model (PHMM) to detect high and low expression states in RNA-Seq data across the 3' untranslated region (UTR), we identified genes that have tissue-specific APA between human cortex (brain) and liver tissues. We also showed that analyzing the 3' UTR with RNA-Seq in this manner has an advantage over microarray profiling, given the variability of the probes at the 3' end of the genes.
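As an illustration only, the idea of segmenting a 3' UTR into high and low expression states with a Poisson hidden Markov model can be sketched as below. This is a minimal sketch and not the procedure used in the work reported here: the two fixed Poisson rates, the `stay` probability, the uniform starting distribution, the per-bin read counts, and the function names are all assumptions chosen for the example. A real analysis would estimate the rates and transition probabilities from the data.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_poisson(counts, rates=(2.0, 20.0), stay=0.95):
    """Most likely state path for a sequence of read counts.

    States are indexed by position in `rates` (here 0 = low, 1 = high).
    `rates` and `stay` are assumed, fixed values for illustration.
    """
    if not counts:
        return []
    n_states = len(rates)
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (n_states - 1))
    log_init = -math.log(n_states)  # uniform starting distribution

    # score[s] = best log-probability of any path ending in state s
    score = [log_init + poisson_logpmf(counts[0], r) for r in rates]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for k in counts[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            cands = [
                score[p] + (log_stay if p == s else log_switch)
                for p in range(n_states)
            ]
            best = max(range(n_states), key=lambda p: cands[p])
            pointers.append(best)
            new_score.append(cands[best] + poisson_logpmf(k, rates[s]))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical read counts in consecutive bins along a 3' UTR (5' to 3').
utr_counts = [25, 22, 19, 24, 21, 3, 1, 0, 2, 1]
print(viterbi_poisson(utr_counts))  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

In this toy input the decoded path drops from the high state to the low state after the fifth bin, which is the kind of transition that would suggest use of a proximal polyadenylation site. Comparing where that transition falls in two tissues is one way a tissue-specific difference could be flagged.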
--------------------------------------------------------------------------------------------------
Continued the collaborative support of investigators' research:
1) We employed bioinformatics strategies to predict toxicity in the rat liver from exposure to toxicants using gene expression data. We also derived a bioinformatics strategy to differentiate modes of action for these toxicants based on regulatory pathways enriched by differentially expressed genes (data from microarray and RNA-Seq platforms).
2) We used our custom analysis tool, Extracting Patterns and Identifying co-expressed Genes (EPIG), to find genes that respond differently depending on the order in which chemotherapeutic drugs are administered to rats and to identify microRNAs differentially expressed between tissues.
3) We developed an analysis workflow called PIPERS (Pipeline Informatics for Processing and Examining RNA-Seq) to detect allele-specific expression in two NIEHS/National Toxicology Program mouse strains. In addition, we developed analytical strategies to detect differentially methylated regions from bisulfite sequencing data.
4) We used statistical modeling of gene expression data from humans exposed to acetaminophen to identify early indicators of hepatotoxicity.
5) We developed an improved support vector machine (SVM) classifier suited to multiclass predictions based on gene expression data.
6) We integrated transcription factor binding information with genotype data and microarray gene expression data to identify population differences in transcript-regulator expression quantitative trait loci (TReQTLs).
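As a hedged illustration of what detecting allele-specific expression can involve, the sketch below applies an exact two-sided binomial test to the read counts of the two alleles at a heterozygous site. This is a common generic approach and is not a description of how PIPERS works; the function name, the null allelic ratio of 0.5, and the example counts are assumptions made for this example.

```python
import math


def allelic_imbalance_pvalue(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    Sums the probabilities of all outcomes that are no more likely than
    the observed reference-allele count under the null ratio.
    """
    n = ref_reads + alt_reads
    p = expected_ref_fraction

    def log_pmf(i):
        # Log-space binomial probability to avoid overflow at high depth.
        return (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1.0 - p)
        )

    pmf = [math.exp(log_pmf(i)) for i in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so ties are counted despite rounding.
    total = sum(x for x in pmf if x <= observed * (1.0 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: balanced site versus strongly skewed site.
print(allelic_imbalance_pvalue(50, 50))
print(allelic_imbalance_pvalue(90, 10))
```

A small p-value indicates that the two alleles are not expressed in the expected proportion. In practice, many sites are tested at once, so a multiple-testing correction and filters for mapping bias would normally be applied on top of a test like this.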